SlideShare a Scribd company logo
MySQL
Indexes,
Histograms,
And other ways
To Speed Up Your Queries
Dave Stokes @Stoker david.stokes @Oracle.com
Community Manager, North America
MySQL Community Team
Copyright © 2019 Oracle and/or its affiliates.
The following is intended to outline our general product direction. It is intended for information
purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in making purchasing decisions. The
development, release, timing, and pricing of any features or functionality described for Oracle’s
products may change and remains at the sole discretion of Oracle Corporation.
Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and
prospects are “forward-looking statements” and are subject to material risks and uncertainties. A
detailed discussion of these factors and other risks that affect our business is contained in Oracle’s
Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K
and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website
or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is
current as of September 2019 and Oracle undertakes no duty to update any statement in light of
new information or future events.
Safe Harbor
Copyright © 2019 Oracle and/or its affiliates.
Warning!
MySQL 5.6 End of Life is
February of 2021!! 3
WAS
Test Drive MySQL
Database Service For
Free Today
Get $300 in credits
and try MySQL Database Service
free for 30 days.
https://www.oracle.com/cloud/free/
Copyright © 2020, Oracle and/or its affiliates | Confidential: Internal/Restricted/Highly Restricted
4
What Is This Session About?
Nobody ever complains that the database is too fast!
Speeding up queries is not a ‘dark art’
But understanding how to speed up queries is often
treated as magic
So we will be looking at the proper use of indexes,
histograms, locking options, and some other ways to speed
queries up.
5
Yes, this is a very dry subject!
● How dry?
○ Very dry
● Lots of text on screen
○ Download slides and use as a reference later
■ slideshare.net/davestokes
○ Do not try to absorb all at once
■ Work on most frequent query first, then second most, …
■ And optimizations may need to change over time
● No Coverage today of
○ System configuration
■ OS
■ MySQL
○ Hardware
○ Networking/Cloud Slides are posted to https://slideshare.net/davestokes 6
Normalize Your Data (also not covered today)
● Can not build a skyscraper on a foundation of sand
● Third normal form or better
○ Use JSON for ‘stub’ table data, avoiding
repeated unneeded index/table accesses
● Think about how you will use your data
○ Do not use a fork to eat soup; How do you consume your data
Badly normalized data will hurt the performance of your queries. No matter
how much training you give it, a dachshund will not be faster than a thoroughbred
horse!
7
The Optimizer
● Consider the optimizer the brain and
nervous system of the system.
● Query optimization is a feature of
many relational database
management systems.
● The query optimizer attempts to
determine the most efficient way to
execute a given query by considering
the possible query plans. Wikipedia
8
One of the hardest problems in query optimization is to
accurately estimate the costs of alternative query plans.
Optimizers cost query plans using a mathematical model
of query execution costs that relies heavily on estimates
of the cardinality, or number of tuples, flowing through
each edge in a query plan.
Cardinality estimation in turn depends on estimates of
the selection factor of predicates in the query.
Traditionally, database systems estimate selectivities
through fairly detailed statistics on the distribution of
values in each column, such as histograms.
The query optimizer evaluates the options
● The optimizer wants to get your data the
cheapest way (least amount of very
expensive disk reads) possible.
● Like a GPS, the cost is built on historical
statistics. And these statistics can change
while the optimizer is working. So like a
traffic jam, washed out road, or other
traffic problem, the optimizer may be
making poor decisions for the present
situation.
● The final determination from the optimizer
is call the query plan.
● MySQL wants to optimize each query
every time it sees it – there is no locking
down the query plan like Oracle.
{watch for optimizer hints later in this
presentation}
You will see how to obtain a query plan
later in this presentation.
9
120!
If your query has five joins then the optimizer
may have to evaluate 120 different options
5!
(5 * 4 * 3 * 2 * 1)
10
EXPLAIN
Explaining EXPLAIN requires a great deal of explanation
11
EXPLAIN Syntax
12
Query optimization is covered in
Chapter 8 of the MySQL Manual
EXPLAIN reports how the server
would process the statement,
including information about how
tables are joined and in which
order.
But now for something completely different
The tools for looking at queries
● EXPLAIN
○ EXPLAIN FORMAT=
■ JSON
■ TREE
○ ANALYZE
● VISUAL EXPLAIN
13
EXPLAIN Example
14
EXPLAIN is used to obtain a query execution plan (an explanation of how MySQL would
execute a query) and should be considered an ESTIMATE as it does not run the query.
QUERY
QUERY PLAN
DETAILS
VISUAL EXPLAIN from MySQL Workbench
15
EXPLAIN FORMAT=TREE
EXPLAIN FORMAT=TREE SELECT * FROM City
JOIN Country ON (City.Population = Country.Population);
-> Inner hash join (city.Population = country.Population) (cost=100154.05 rows=100093)
-> Table scan on City (cost=0.30 rows=4188)
-> Hash
-> Table scan on Country (cost=29.90 rows=239)
16
EXPLAIN FORMAT=JSON
EXPLAIN FORMAT=JSON SELECT * FROM City JOIN Country ON
(City.Population = Country.Population)G
*************************** 1. row
***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "100154.05"
},
"nested_loop": [
{
"table": {
"table_name": "Country",
"access_type": "ALL",
"rows_examined_per_scan": 239,
"rows_produced_per_join": 239,
"filtered": "100.00",
"cost_info": {
"read_cost": "6.00",
"eval_cost": "23.90",
"prefix_cost": "29.90",
"data_read_per_join": "61K"
},
17
"used_columns": [
"Code",
"Name",
"Continent",
"Region",
"SurfaceArea",
"IndepYear",
"Population",
"LifeExpectancy",
"GNP",
"GNPOld",
"LocalName",
"GovernmentForm",
"HeadOfState",
"Capital",
"Code2"
]
}
},
{
"table": {
"table_name": "City",
"access_type": "ALL",
"rows_examined_per_scan": 4188,
"rows_produced_per_join": 100093,
"filtered": "10.00",
"using_join_buffer": "hash join",
"cost_info": {
"read_cost": "30.95",
"eval_cost": "10009.32",
"prefix_cost": "100154.05",
"data_read_per_join": "6M"
},
"used_columns": [
"ID",
"Name",
"CountryCode",
"District",
"Population"
],
"attached_condition": "(`world`.`city`.`Population` =
`world`.`country`.`Population`)"
}
}
]
}
}
EXPLAIN ANALYZE – MySQL 8.0.18
18
EXPLAIN ANALYZE SELECT * FROM City WHERE CountryCode = 'GBR'G
*************************** 1. row ***************************
EXPLAIN: -> Index lookup on City using CountryCode (CountryCode='GBR')
(cost=80.76 rows=81) ( actual time=0.153..0.178 rows=81 loops=1)
1 row in set (0.0008 sec)
MySQL 8.0.18 introduced EXPLAIN ANALYZE, which runs the query and produces EXPLAIN
output along with timing and additional, iterator-based, information about how the optimizer's
expectations matched the actual execution. Real numbers not an estimate!
More on using
EXPLAIN …
later 19
Indexes
20
Indexes
A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and storage space
to maintain the index data structure.
Indexes are used to quickly locate data without having to search every row in a
database table every time a database table is accessed.
Indexes can be created using one or more columns of a database table, providing
the basis for both rapid random lookups and efficient access of ordered records.
-- https://en.wikipedia.org/wiki/Database_index
21
Think of an Index as a Table
With Shortcuts to another
table!
Or a model of some of your data in it’s own table!
And the more tables you have the read to get to
the data the slower things run!
(And the more memory is taken up) 22
Many Types of Indexes
24
25
clustered index
InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving
the primary key columns.
For best performance, choose the primary key columns carefully based on the most performance-critical queries.
Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or
never updated.
In the Oracle Database product, this type of table is known as an index-organized table
Think of an index as a small, fast table pointing to a bigger table
26
Index
3
6
12
27
5001
Indexed Column column 1 column x column n
5001
3
6
27
12
Mental Image
Clustered index -- stored by order of primary key
27
PRIMARY KEY column 1 column x column n
3
6
12
27
5001
Mental Image for innodb
Secondary Index
Indexes other than the clustered index are known as secondary indexes.
In InnoDB, each record in a secondary index contains the primary key columns for
the row, as well as the columns specified for the secondary index.
InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is
advantageous to have a short primary key.
28
29
Creating a table with a PRIMARY KEY (index)
CREATE TABLE t1 (
c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c2 VARCHAR(100),
c3 VARCHAR(100) );
An index is a list of KEYs.
And you will hear ‘key’ and ‘index’ used interchangeably
PRIMARY KEY
This is an key for the index that uniquely defined for a row, should be immutable.
InnoDB needs a PRIMARY KEY (and will make one up if you do not specify)
No Null Values
Monotonically increasing
- use UUID_To_BIN() if you must use UUIDS, otherwise avoid them
30
Indexing on a prefix of a column
CREATE INDEX part_of_name ON customer (name(10));
Only the first 10 characters are indexed in this examples and
this can save space/speed
31
Multi-column index - last_name, first_name
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
32
This index will work on (last_name,first_name) and (last_name) but not (first_name)
{Use: left to right}
Put highest cardinality field first
Hashing values - an alternative
SELECT * FROM tbl_name
WHERE hash_col=MD5(CONCAT(val1,val2))
AND col1=val1 AND col2=val2;
As an alternative to a composite index, you can introduce a
column that is “hashed” based on information from other columns.
If this column is short, reasonably unique, and indexed, it
might be faster than a “wide” index on many columns.
33
Some other types of indexes
Unique Indexes - only one row per value
Full-Text Indexes - search string data
Covering index - includes all columns needed for a query
Secondary index - (see next page)
Spatial Indexes - geographical information
34
Functional indexes work with only part of the data in a column or a derived value based on a value
of the column.
There are other options such as generated columns or storing a computed value in a column and
then indexing those values.
SQL > CREATE TABLE birthdays (id int NOT NULL primary key,
name char(30) NOT NULL,
birthday date NOT NULL,
INDEX ((MONTH(birthday)))
);
SQL> INSERT INTO birthdays VALUES (1,'Mac','2020-12-15');
SQL > EXPLAIN FORMAT=tree SELECT * FROM birthdays where MONTH(birthday) = 12 G
*************************** 1. row ***************************
EXPLAIN: -> Index lookup on birthdays using functional_index (month(birthday)=12) (cost=0.35 rows=1)
35
More Functional Indexes
SQL> CREATE INDEX total_idx ON product ((cost + shipping));
SQL > explain format=tree select id,
catalog_nbr,
cost,
shipping,
(cost + shipping)
FROM product
where (cost + shipping) < 100.0G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((cost + shipping) < 100.0) (cost=1.16 rows=2)
-> Index range scan on product using total_idx (cost=1.16 rows=2)
36
More Functional Indexes -- Order matters
SQL> CREATE INDEX
total_idx ON product ((cost + shipping));
SQL > explain format=tree select id,
catalog_nbr,
cost,
shipping,
(cost + shipping)
FROM product
where (cost + shipping) < 100.0G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((cost + shipping) < 100.0) (cost=1.16
rows=2)
-> Index range scan on product using total_idx (cost=1.16
rows=2)
37
SQL > explain format=tree select
id,
catalog_nbr,
cost,
shipping,
(cost + shipping)
FROM product
where (shipping + cost) < 100.0G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((product.shipping + product.cost) <
100.0) (cost=0.55 rows=3)
-> Table scan on product (cost=0.55 rows=3)
Using Index Not using Index!
Multi-value indexes
You can now have more index pointers than index keys!
○ Very useful for JSON arrays
mysql> SELECT 3 MEMBER OF('[1, 3, 5, 7, "Moe"]');
+--------------------------------------+
| 3 MEMBER OF('[1, 3, 5, 7, "Moe"]') |
+--------------------------------------+
| 1 |
+--------------------------------------+
38
Big Rule #1
Always have a primary key
on your InnoDB tables!
Always! 39
InnoDB will pick an index if you do not!
And it will not be what you need.
40
How to find InnoDB tables without indexes
SELECT tables.table_schema , tables.table_name , tables.engine
FROM information_schema.tables LEFT JOIN (
SELECT table_schema , table_name
FROM information_schema.statistics
GROUP BY table_schema, table_name, index_name
HAVING SUM(
case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) ) puks
ON tables.table_schema = puks.table_schema
AND tables.table_name = puks.table_name
WHERE puks.table_name IS null
AND tables.table_type = 'BASE TABLE'
AND Engine="InnoDB";
41
What if you need to add an index but can’t change Code?
ALTER TABLE t1 ADD COLUMN k INT INVISIBLE PRIMARY KEY;
This was you do not need to update queries in
your code.
42
MySQL has Two Main Types of Index Structures
B-Tree is a self-balancing tree data structure
that maintains sorted data and allows searches,
sequential access, insertions, and deletions.
43
Hash are more efficient than nested loops joins,
except when the probe side of the join is very
small but joins can only be used to compute
equijoins.
Please Keep in mind ...
If there is a choice between multiple indexes, MySQL normally uses the index that
finds the smallest number of rows (the most selective index).
MySQL can use indexes on columns more efficiently if they are declared as the
same type and size.
● In this context, VARCHAR and CHAR are considered the same if they are
declared as the same size. For example, VARCHAR(10) and CHAR(10) are
the same size, but VARCHAR(10) and CHAR(15) are not.
For comparisons between nonbinary string columns, both columns should use the
same character set. For example, comparing a utf8 column with a latin1 column
precludes use of an index. 44
NULL
A few words on something that may not be there!
45
NULL
● NULL is used to designate a
LACK of data.
False 0
True 1
Don’t know NULL
46
Indexing NULL values
really drives down the
performance of
INDEXes -
Cardinal Values and then a ‘junk drawer’ of nulls -> hard to search
47
Sometimes you can
create an index that
covers just the right
columns so you do not
dive into the data!
48
Report on salesperson’s activity by customer
> desc orders;
+--------------+-----------------+------+-----+-------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------+------+-----+-------------------+-------------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| customer_id | int unsigned | YES | MUL | NULL | |
| total | decimal(10,2) | YES | | NULL | |
| sales_id | int unsigned | YES | | NULL | |
| order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
+--------------+-----------------+------+-----+-------------------+-------------------+
5 rows in set (0.0029 sec)
select customer_id,
total,
sales_id
from orders
where customer_id = 101 and sales_id = 7 and total > 1.0 ; 49
Report on salesperson’s activity by customer
> desc orders;
+--------------+-----------------+------+-----+-------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------+------+-----+-------------------+-------------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| customer_id | int unsigned | YES | MUL | NULL | |
| total | decimal(10,2) | YES | | NULL | |
| sales_id | int unsigned | YES | | NULL | |
| order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
+--------------+-----------------+------+-----+-------------------+-------------------+
5 rows in set (0.0029 sec)
select customer_id,
total,
sales_id
from orders
where customer_id = 101 and sales_id = 7 and total > 1.0 ; 50
Report on salesperson’s activity by customer
> desc orders;
+--------------+-----------------+------+-----+-------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------+------+-----+-------------------+-------------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| customer_id | int unsigned | YES | MUL | NULL | |
| total | decimal(10,2) | YES | | NULL | |
| sales_id | int unsigned | YES | | NULL | |
| order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
+--------------+-----------------+------+-----+-------------------+-------------------+
5 rows in set (0.0029 sec)
select customer_id,
total,
sales_id
from orders
where customer_id = 101 and sales_id = 7 and total > 1.0 ; 51
Indexing customer_id and sales_id
52
alter table order add index c2 (customer_id, sales_id);
Query cost drops from 1.5 to 0.7 by indexing two of the three columns, ~ 46% better
(Note this is on a very small data set, larger data take more time)
Indexing customer_id, sales_id, and total!
53
alter table order add index c3 (customer_id, sales_id, total);
Query cost drops from 0.7 to 0.66 by indexing all three of the columns in the
query, ~ 6% over previous (44% overall)
This is done by an range scan on the index only and not the table itself!
Invisible
Indexes
54
Before Invisible Indexes
1. Doubt usefulness of index
2. Check using EXPLAIN
3. Remove that Index
4. Rerun EXPLAIN
5. Get phone/text/screams from power user about slow query
6. Suddenly realize that the index in question may have had no use for you but
the rest of the planet seems to need that dang index!
7. Take seconds/minutes/hours/days/weeks rebuilding that index
55
After Invisible Indexes
1. Doubt usefulness of index
2. Check using EXPLAIN
3. Make index invisible – optimizer can not see that index!
4. Rerun EXPLAIN
5. Get phone/text/screams from power user about slow query
6. Make index visible
7. Blame problem on { network | JavaScript | GDPR | Slack | Cloud }
Sys schema will show you which indexes have not been used that may be
candidates for removal -- but be cautious!
56
How to use INVISIBLE INDEX
ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE;
ALTER TABLE t1 ALTER INDEX i_idx VISIBLE;
57
Index Extensions (more harping on secondary indexes)
InnoDB automatically extends each secondary index by appending the primary key columns to it. Consider this table definition:
CREATE TABLE t1 (
i1 INT NOT NULL DEFAULT 0,
i2 INT NOT NULL DEFAULT 0,
d DATE DEFAULT NULL,
PRIMARY KEY (i1, i2),
INDEX k_d (d)
) ENGINE = InnoDB;
This table defines the primary key on columns (i1, i2). It also defines a secondary index k_d on column (d),
but internally InnoDB extends this index and treats it as columns (d, i1, i2).
The optimizer takes into account the primary key columns of the extended secondary index when determining
how and whether to use that index. This can result in more efficient query execution plans and better
performance.
The optimizer can use extended secondary indexes for ref, range, and index_merge index access, for
Loose Index Scan access, for join and sorting optimization, and for MIN()/MAX() optimization.
58
Remove unused indexes to save memory *
59
* Make sure you
check this several
hours (or days) after
a reboot just in case
you find a rarely
used but important
index!!
Remove unused indexes to save memory *
SELECT object_schema as 'schema',
object_name,
index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE
(object_schema != 'mysql' and object_schema != 'performance_schema')
AND index_name IS NOT NULL
AND count_star = 0
ORDER BY object_schema, object_name, index_name;
60
Monitor tables with FULL SCANs -> try to reduce
May be a
legitimate FULL
SCAN or a lack of
a good index.
Yes, sometimes
you have to read
the full table!
61
Histograms
What is a histogram?
It is not a gluten free, keto friendly biscuit.
62
Histograms
63
Histograms
What is a histogram?
Wikipedia declares a histogram is an accurate representation of the distribution of
numerical data. For RDBMS, a histogram is an approximation of the data
distribution within a specific column.
So in MySQL, histograms help the optimizer to find the most efficient Query Plan
to fetch that data.
64
Histograms
What is a histogram?
A histogram is a distribution of data into logical buckets.
There are two types:
● Singleton
● Equi-Height
Maximum number of buckets is 1024
65
Histogram
Histogram statistics are useful primarily for non-indexed columns.
A histogram is created or updated only on demand, so it adds no overhead
when table data is modified. On the other hand, the statistics become
progressively more out of date when table modifications occur, until the next time
they are updated.
66
The two reasons for why you might consider a histogram instead of an index:
Maintaining an indexes have a cost. If you have an index, every
● Every INSERT/UPDATE/DELETE causes the index to be updated. This will
have an impact on your performance. A histogram on the other hand is
created once and never updated unless you explicitly ask for it. It will thus not
hurt your INSERT/UPDATE/DELETE-performance.
● The optimizer will make “index dives” to estimate the number of records in a
given range. This might become too costly if you have for instance very long
IN-lists in your query. Histogram statistics are much cheaper in this case, and
might thus be more suitable.
67
The Optimizer
Occasionally the query optimizer fails to find the most efficient plan and ends up
spending a lot more time executing the query than necessary.
The optimizer assumes that the data is evenly distributed in the column. This can
be the old ‘assume’ makes an ‘ass out of you and me’ joke brought to life.
The main reason for this is often that the optimizer doesn’t have enough
knowledge about the data it is about to query:
● How many rows are there in each table?
● How many distinct values are there in each column?
● How is the data distributed in each column?
68
Two-types of histograms
Equi-height: One bucket represents a range of values. This type of histogram will
be created when distinct values in the column are greater than the number of
buckets specified in the analyze table syntax. Think A-G H-L M-T U-Z
Singleton: One bucket represents one single value in the column and is the most
accurate and will be created when the number of distinct values in the column is
less than or equal to the number of buckets specified in the analyze table syntax.
69
Frequency
Histogram
Each distinct value has
its own bucket.
70
101
102 104
insert into freq_histogram (x)
values
(101),(101),(102),(102),(102),(104);
Frequency Histogram - an Example
select id,x from freq_histogram ;
+----+-----+
| id | x |
+----+-----+
| 1 | 101 |
| 2 | 101 |
| 3 | 102 |
| 4 | 102 |
| 5 | 102 |
| 6 | 104 |
+----+-----+
analyze table freq_histogram
update histogram on x with 3 buckets;
71
select JSON_PRETTY(HISTOGRAM->>"$") from
information_schema.column_statistics where
table_name='freq_histogram' and column_name='x'G
*************************** 1. row
***************************
JSON_PRETTY(HISTOGRAM->>"$"): {
"buckets": [
[
101,
0.3333333333333333
],
[
102,
0.8333333333333333
],
[
104,
1.0
]
],
"data-type": "int",
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2020-06-16 19:54:21.033017",
"sampling-rate": 1.0,
"histogram-type": "singleton",
"number-of-buckets-specified": 3
}
The statistics
SELECT (SUBSTRING_INDEX(v, ':', -1)) value,
concat(round(c*100,1),'%') cumulfreq,
CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq
FROM information_schema.column_statistics,
JSON_TABLE(histogram->'$.buckets','$[*]'
COLUMNS(v VARCHAR(60) PATH '$[0]',
c double PATH '$[1]')) hist
WHERE schema_name = 'demo'
and table_name = 'freq_histogram'
and column_name = 'x';
+-------+-----------+-------+
| value | cumulfreq | freq |
+-------+-----------+-------+
| 101 | 33.3% | 33.3% |
| 102 | 83.3% | 50.0% |
| 104 | 100.0% | 16.7% |
+-------+-----------+-------+
72
Creating and removing Histograms
ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c2, c3 WITH 10 BUCKETS;
ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c3 WITH 10 BUCKETS;
ANALYZE TABLE t DROP HISTOGRAM ON c2;
Note the first statement creates three different histograms on c1, c2, and c3.
73
Information about Histograms
mysql> SELECT TABLE_NAME, COLUMN_NAME,
HISTOGRAM->>'$."data-type"' AS 'data-type',
JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'bucket-count'
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;
+-----------------+-------------+-----------+--------------+
| TABLE_NAME | COLUMN_NAME | data-type | bucket-count |
+-----------------+-------------+-----------+--------------+
| country | Population | int | 226 |
| city | Population | int | 1024 |
| countrylan | Language | string | 457 |
+-----------------+-------------+-----------+--------------+
74
Where Histograms Shine
create table h1 (id int unsigned auto_increment,
x int unsigned, primary key(id));
insert into h1 (x) values (1),(1),(2),(2),(2),(3),(3),(3),(3);
select x, count(x) from h1 group by x;
+---+----------+
| x | count(x) |
+---+----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
+---+----------+
3 rows in set (0.0011 sec)
75
Without Histogram
EXPLAIN SELECT * FROM h1 WHERE x > 0G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: h1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9
filtered: 33.32999801635742 – The optimizer estimates about 1/3 of the
data
Extra: Using where will go match the ‘X > 0’ condition
1 row in set, 1 warning (0.0007 sec)
76
The filtered column indicates an estimated percentage of table rows that will
be filtered by the table condition. The maximum value is 100, which means no
filtering of rows occurred.
Values decreasing from 100 indicate increasing amounts of filtering. rows shows
the estimated number of rows examined and rows × filtered shows the
number of rows that will be joined with the following table.
For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows
to be joined with the following table is 1000 × 50% = 500.
Where Histograms Shine
ANALYZE TABLE h1 UPDATE HISTOGRAM ON x WITH 3 BUCKETSG
*************************** 1. row ***************************
Table: demox.h1
Op: histogram
Msg_type: status
Msg_text: Histogram statistics created for column 'x'.
1 row in set (0.0819 sec)
77
Where Histograms Shine
EXPLAIN SELECT * FROM h1 WHERE x > 0G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: h1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9
filtered: 100 – all rows!!!
Extra: Using where
1 row in set, 1 warning (0.0007 sec) 78
With EXPLAIN ANALYSE
EXPLAIN analyze SELECT * FROM h1 WHERE x > 0G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (h1.x > 0) (cost=1.15 rows=9) (actual time=0.027..0.034 rows=9 loops=1)
-> Table scan on h1 (cost=1.15 rows=9) (actual time=0.025..0.030 rows=9 loops=1)
79
Performance is not just Indexes and Histograms
● There are many other ‘tweaks’ that can be made to speed things up
● Use explain to see what your query is doing?
○ File sorts, full table scans, using temporary tables, etc.
○ Does the join order look right
○ Buffers and caches big enough
○ Do you have enough memory
○ Disk and I/O speeds sufficient
80
Locking Options
MySQL added two locking options to MySQL 8.0:
● NOWAIT
○ A locking read that uses NOWAIT never waits to acquire a row lock. The query executes
immediately, failing with an error if a requested row is locked.
● SKIP LOCKED
○ A locking read that uses SKIP LOCKED never waits to acquire a row lock. The query executes
immediately, removing locked rows from the result set.
81
Locking Examples – Buying concert tickets
START TRANSACTION;
SELECT seat_no, row_no, cost
FROM seats s
JOIN seat_rows sr USING ( row_no )
WHERE seat_no IN ( 3,4 ) AND sr.row_no IN ( 5,6 )
AND booked = 'NO'
FOR UPDATE OF s SKIP LOCKED;
Let’s shop for tickets in rows 5 or 6 and seats 3 & 4 but we skip any locked
records!
82
LOCK NOWAIT
START TRANSACTION;
SELECT seat_no
FROM seats JOIN seat_rows USING ( row_no )
WHERE seat_no IN (3,4) AND seat_rows.row_no IN (12)
AND booked = 'NO'
FOR UPDATE OF seats SKIP LOCKED
FOR SHARE OF seat_rows NOWAIT;
Without NOWAIT, this query would have waited for innodb_lock_wait_timeout (default: 50)
seconds while attempting to acquire the shared lock on seat_rows. With NOWAIT an error is
immediately returned ERROR 3572 (HY000): Do not wait for lock.
83
Other Fast Ways
Resource Groups
Optimizer hints
Partitioning
Multi Value Indexes
84
Resource Groups
CREATE RESOURCE GROUP Batch
TYPE = USER
VCPU = 2-3 -- assumes a system with at least 4 CPUs
THREAD_PRIORITY = 10;
For example, to manage execution of batch jobs that
need not execute with high priority, a DBA can create
a Batch resource group, and adjust its priority up or
down depending on how busy the server is.
(Perhaps batch jobs assigned to the group should run
at lower priority during the day and at higher priority
during the night.) The DBA can also adjust the set of
CPUs available to the group. Groups can be enabled
or disabled to control whether threads are assignable
to them.
85
SET RESOURCE GROUP Batch FOR thread_id;
SET RESOURCE GROUP Batch;
INSERT /*+ RESOURCE_GROUP(Batch) */ INTO t2 VALUES(2);
Optimizer Hints
A way to control the optimizer is by using optimizer
hints, which can be specified within individual
statements. Because optimizer hints apply on a
per-statement basis, they provide finer control over
statement execution plans than can be achieved using
optimizer_switch
.
INSERT /*+ RESOURCE_GROUP(Batch) */ INTO t2 VALUES(2);
86
SELECT
/*+ JOIN_PREFIX(t2, t5@subq2, t4@subq1)
JOIN_ORDER(t4@subq1, t3)
JOIN_SUFFIX(t1) */
COUNT(*) FROM t1 JOIN t2 JOIN t3
WHERE t1.f1 IN (SELECT /*+ QB_NAME(subq1) */ f1 FROM t4)
AND t2.f1 IN (SELECT /*+ QB_NAME(subq2) */ f1 FROM t5);
Partitioning
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
87
Multi-value Indexing
A multi-valued index is a secondary index defined on a column that stores an array
of values. A “normal” index has one index record for each data record (1:1). A
multi-valued index can have multiple index records for a single data record (N:1).
Multi-valued indexes are intended for indexing JSON arrays.
CREATE TABLE customers (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
custinfo JSON
);
ALTER TABLE customers ADD INDEX comp(id, modified,
(CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
88
Do s or ?
89
Test!
1. Use EXPLAIN to get information on the query BEFORE making changes
2. Know the available indexes already established (no inventing Wheel 2.0)
3. Make ONE CHANGE AT A TIME
4. Compare performance to previous tests
5. Good enough performance?!?
a. Decide when diminishing returns have been reached OR
b. Performance is acceptable OR
c. Decide to come back later
6. Document!
90
New Book
● YOU NEED THESE BOOKS!!
● Author’s others work are also
outstanding and well written
● Amazing detail!!
91
Great Book
● Getting very dated
● Make sure you get the 3rd
edition
● Can be found in used book stores
92
My Book
● A guide to using JSON and MySQL
○ Programming Examples
○ Best practices
○ Twice the length of the original
○ NoSQL Document Store how-to
93
“We have saved around 40% of our
costs and are able to reinvest that
back into the business. And we are
scaling across EMEA, and that’s
basically all because of Oracle.”
—Asser Smidt
CEO and Cofounder, BotSupply
Startups get cloud credits and a 70%
discount for 2 years, global exposure via
marketing, events, digital promotion, and
media, plus access to mentorship, capital
and Oracle’s 430,000+ customers
Customers meet vetted startups in
transformative spaces that help them stay
ahead of their competition
Oracle stays at the competitive edge
of innovation with solutions that complement
its technology stack
Oracle for Startups - enroll at oracle.com/startup
A Virtuous Cycle of Innovation, Everybody Wins.
Thank you
● David.Stokes @Oracle.com
● @Stoker
● PHP Related blog https://elephantdolphin.blogspot.com/
● MySQL Latest Blogs https://planet.mysql.com/
● Where to ask questions https://forums.mysql.com/
● Slack https://mysqlcommunity.slack.com
slides -> slideshare.net/davestokes
95

More Related Content

What's hot

Mysql Fun
Mysql FunMysql Fun
Mysql Fun
SHC
 

What's hot (20)

FOSDEM 2022 MySQL Devroom: MySQL 8.0 - Logical Backups, Snapshots and Point-...
FOSDEM 2022 MySQL Devroom:  MySQL 8.0 - Logical Backups, Snapshots and Point-...FOSDEM 2022 MySQL Devroom:  MySQL 8.0 - Logical Backups, Snapshots and Point-...
FOSDEM 2022 MySQL Devroom: MySQL 8.0 - Logical Backups, Snapshots and Point-...
 
MySQL Database Service Webinar: Installing Drupal in oci with mds
MySQL Database Service Webinar: Installing Drupal in oci with mdsMySQL Database Service Webinar: Installing Drupal in oci with mds
MySQL Database Service Webinar: Installing Drupal in oci with mds
 
cPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven FeaturescPanel now supports MySQL 8.0 - My Top Seven Features
cPanel now supports MySQL 8.0 - My Top Seven Features
 
From single MySQL instance to High Availability: the journey to MySQL InnoDB ...
From single MySQL instance to High Availability: the journey to MySQL InnoDB ...From single MySQL instance to High Availability: the journey to MySQL InnoDB ...
From single MySQL instance to High Availability: the journey to MySQL InnoDB ...
 
Oracle Developer Live: Deploying MySQL InnoDB Cluster on OCI with Terraform
Oracle Developer Live: Deploying MySQL InnoDB Cluster on OCI with TerraformOracle Developer Live: Deploying MySQL InnoDB Cluster on OCI with Terraform
Oracle Developer Live: Deploying MySQL InnoDB Cluster on OCI with Terraform
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021
 
MySQL 8.0 Features -- Oracle CodeOne 2019, All Things Open 2019
MySQL 8.0 Features -- Oracle CodeOne 2019, All Things Open 2019MySQL 8.0 Features -- Oracle CodeOne 2019, All Things Open 2019
MySQL 8.0 Features -- Oracle CodeOne 2019, All Things Open 2019
 
Deploying Magento on OCI with MDS
Deploying Magento on OCI with MDSDeploying Magento on OCI with MDS
Deploying Magento on OCI with MDS
 
MySQL Day Virtual: Best Practices Tips - Upgrading to MySQL 8.0
MySQL Day Virtual: Best Practices Tips - Upgrading to MySQL 8.0MySQL Day Virtual: Best Practices Tips - Upgrading to MySQL 8.0
MySQL Day Virtual: Best Practices Tips - Upgrading to MySQL 8.0
 
Confoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & HistogramsConfoo 2021 - MySQL Indexes & Histograms
Confoo 2021 - MySQL Indexes & Histograms
 
MySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best PracticesMySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best Practices
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
 
the State of the Dolphin - October 2020
the State of the Dolphin - October 2020the State of the Dolphin - October 2020
the State of the Dolphin - October 2020
 
MySQL Database Service Webinar: Upgrading from on-premise MySQL to MDS
MySQL Database Service Webinar: Upgrading from on-premise MySQL to MDSMySQL Database Service Webinar: Upgrading from on-premise MySQL to MDS
MySQL Database Service Webinar: Upgrading from on-premise MySQL to MDS
 
A Step by Step Introduction to the MySQL Document Store
A Step by Step Introduction to the MySQL Document StoreA Step by Step Introduction to the MySQL Document Store
A Step by Step Introduction to the MySQL Document Store
 
Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c  - New Features for Developers and DBAsOracle Database 12c  - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
 
OTN TOUR 2016 - Oracle Database 12c - The Best Oracle Database 12c Tuning Fea...
OTN TOUR 2016 - Oracle Database 12c - The Best Oracle Database 12c Tuning Fea...OTN TOUR 2016 - Oracle Database 12c - The Best Oracle Database 12c Tuning Fea...
OTN TOUR 2016 - Oracle Database 12c - The Best Oracle Database 12c Tuning Fea...
 
MySQL Tech Café #8: MySQL 8.0 for Python Developers
MySQL Tech Café #8: MySQL 8.0 for Python DevelopersMySQL Tech Café #8: MySQL 8.0 for Python Developers
MySQL Tech Café #8: MySQL 8.0 for Python Developers
 
Looking Inside the MySQL 8.0 Document Store
Looking Inside the MySQL 8.0 Document StoreLooking Inside the MySQL 8.0 Document Store
Looking Inside the MySQL 8.0 Document Store
 
Mysql Fun
Mysql FunMysql Fun
Mysql Fun
 

Similar to MySQL Indexes and Histograms - RMOUG Training Days 2022

The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
Calpont
 

Similar to MySQL Indexes and Histograms - RMOUG Training Days 2022 (20)

Dutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and HistogramsDutch PHP Conference 2021 - MySQL Indexes and Histograms
Dutch PHP Conference 2021 - MySQL Indexes and Histograms
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
 
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
PHP UK 2020 Tutorial: MySQL Indexes, Histograms And other ways To Speed Up Yo...
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query Optimizations
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really Doing
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer Guide
 
Oracle Query Optimizer - An Introduction
Oracle Query Optimizer - An IntroductionOracle Query Optimizer - An Introduction
Oracle Query Optimizer - An Introduction
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 
Randomizing Data With SQL Server
Randomizing Data With SQL ServerRandomizing Data With SQL Server
Randomizing Data With SQL Server
 
MySQL 8.0 Featured for Developers
MySQL 8.0 Featured for DevelopersMySQL 8.0 Featured for Developers
MySQL 8.0 Featured for Developers
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
PHP Detroit -- MySQL 8 A New Beginning (updated presentation)
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 

More from Dave Stokes

More from Dave Stokes (20)

Locking Down Your MySQL Database.pptx
Locking Down Your MySQL Database.pptxLocking Down Your MySQL Database.pptx
Locking Down Your MySQL Database.pptx
 
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre FeaturesLinuxfest Northwest 2022 - MySQL 8.0 Nre Features
Linuxfest Northwest 2022 - MySQL 8.0 Nre Features
 
Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019Windowing Functions - Little Rock Tech fest 2019
Windowing Functions - Little Rock Tech fest 2019
 
Develop PHP Applications with MySQL X DevAPI
Develop PHP Applications with MySQL X DevAPIDevelop PHP Applications with MySQL X DevAPI
Develop PHP Applications with MySQL X DevAPI
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
MySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHPMySQL without the SQL -- Cascadia PHP
MySQL without the SQL -- Cascadia PHP
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
MySQL Without The SQL -- Oh My! PHP[Tek] June 2018
MySQL Without The SQL -- Oh My! PHP[Tek] June 2018MySQL Without The SQL -- Oh My! PHP[Tek] June 2018
MySQL Without The SQL -- Oh My! PHP[Tek] June 2018
 
Presentation Skills for Open Source Folks
Presentation Skills for Open Source FolksPresentation Skills for Open Source Folks
Presentation Skills for Open Source Folks
 
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
MySQL Without the SQL -- Oh My!  Longhorn PHP ConferenceMySQL Without the SQL -- Oh My!  Longhorn PHP Conference
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
 
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
MySQL 8 -- A new beginning : Sunshine PHP/PHP UK (updated)
 
ConFoo MySQL Replication Evolution : From Simple to Group Replication
ConFoo  MySQL Replication Evolution : From Simple to Group ReplicationConFoo  MySQL Replication Evolution : From Simple to Group Replication
ConFoo MySQL Replication Evolution : From Simple to Group Replication
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ish
 
PHP Database Programming Basics -- Northeast PHP
PHP Database Programming Basics -- Northeast PHPPHP Database Programming Basics -- Northeast PHP
PHP Database Programming Basics -- Northeast PHP
 
MySQL 101 PHPTek 2017
MySQL 101 PHPTek 2017MySQL 101 PHPTek 2017
MySQL 101 PHPTek 2017
 
MySQL Replication Evolution -- Confoo Montreal 2017
MySQL Replication Evolution -- Confoo Montreal 2017MySQL Replication Evolution -- Confoo Montreal 2017
MySQL Replication Evolution -- Confoo Montreal 2017
 
MySQL's JSON Data Type and Document Store
MySQL's JSON Data Type and Document StoreMySQL's JSON Data Type and Document Store
MySQL's JSON Data Type and Document Store
 
Five Database Mistakes and how to fix them -- Confoo Vancouver
Five Database Mistakes and how to fix them -- Confoo VancouverFive Database Mistakes and how to fix them -- Confoo Vancouver
Five Database Mistakes and how to fix them -- Confoo Vancouver
 
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
 

Recently uploaded

一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
aagad
 
Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
abhinandnam9997
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
lolsDocherty
 

Recently uploaded (13)

一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
一比一原版UTS毕业证悉尼科技大学毕业证成绩单如何办理
 
Case study on merger of Vodafone and Idea (VI).pptx
Case study on merger of Vodafone and Idea (VI).pptxCase study on merger of Vodafone and Idea (VI).pptx
Case study on merger of Vodafone and Idea (VI).pptx
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
The AI Powered Organization-Intro to AI-LAN.pdf
The AI Powered Organization-Intro to AI-LAN.pdfThe AI Powered Organization-Intro to AI-LAN.pdf
The AI Powered Organization-Intro to AI-LAN.pdf
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
 
ER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAEER(Entity Relationship) Diagram for online shopping - TAE
ER(Entity Relationship) Diagram for online shopping - TAE
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
The Best AI Powered Software - Intellivid AI Studio
The Best AI Powered Software - Intellivid AI StudioThe Best AI Powered Software - Intellivid AI Studio
The Best AI Powered Software - Intellivid AI Studio
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 

MySQL Indexes and Histograms - RMOUG Training Days 2022

  • 1. MySQL Indexes, Histograms, And other ways To Speed Up Your Queries Dave Stokes @Stoker david.stokes @Oracle.com Community Manager, North America MySQL Community Team Copyright © 2019 Oracle and/or its affiliates.
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events. Safe Harbor Copyright © 2019 Oracle and/or its affiliates.
  • 3. Warning! MySQL 5.6 End of Life is February of 2021!! 3 WAS
  • 4. Test Drive MySQL Database Service For Free Today Get $300 in credits and try MySQL Database Service free for 30 days. https://www.oracle.com/cloud/free/ Copyright © 2020, Oracle and/or its affiliates | Confidential: Internal/Restricted/Highly Restricted 4
  • 5. What Is This Session About? Nobody ever complains that the database is too fast! Speeding up queries is not a ‘dark art’ But understanding how to speed up queries is often treated as magic So we will be looking at the proper use of indexes, histograms, locking options, and some other ways to speed queries up. 5
  • 6. Yes, this is a very dry subject! ● How dry? ○ Very dry ● Lots of text on screen ○ Download slides and use as a reference later ■ slideshare.net/davestokes ○ Do not try to absorb all at once ■ Work on most frequent query first, then second most, … ■ And optimizations may need to change over time ● No Coverage today of ○ System configuration ■ OS ■ MySQL ○ Hardware ○ Networking/Cloud Slides are posted to https://slideshare.net/davestokes 6
  • 7. Normalize Your Data (also not covered today) ● Can not build a skyscraper on a foundation of sand ● Third normal form or better ○ Use JSON for ‘stub’ table data, avoiding repeated unneeded index/table accesses ● Think about how you will use your data ○ Do not use a fork to eat soup; How do you consume your data Badly normalized data will hurt the performance of your queries. No matter how much training you give it, a dachshund will not be faster than a thoroughbred horse! 7
  • 8. The Optimizer ● Consider the optimizer the brain and nervous system of the system. ● Query optimization is a feature of many relational database management systems. ● The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. Wikipedia 8 One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms.
  • 9. The query optimizer evaluates the options ● The optimizer wants to get your data the cheapest way (least amount of very expensive disk reads) possible. ● Like a GPS, the cost is built on historical statistics. And these statistics can change while the optimizer is working. So like a traffic jam, washed out road, or other traffic problem, the optimizer may be making poor decisions for the present situation. ● The final determination from the optimizer is call the query plan. ● MySQL wants to optimize each query every time it sees it – there is no locking down the query plan like Oracle. {watch for optimizer hints later in this presentation} You will see how to obtain a query plan later in this presentation. 9
  • 10. 120! If your query has five joins then the optimizer may have to evaluate 120 different options 5! (5 * 4 * 3 * 2 * 1) 10
  • 11. EXPLAIN Explaining EXPLAIN requires a great deal of explanation 11
  • 12. EXPLAIN Syntax 12 Query optimization is covered in Chapter 8 of the MySQL Manual EXPLAIN reports how the server would process the statement, including information about how tables are joined and in which order.
  • 13. But now for something completely different The tools for looking at queries ● EXPLAIN ○ EXPLAIN FORMAT= ■ JSON ■ TREE ○ ANALYZE ● VISUAL EXPLAIN 13
  • 14. EXPLAIN Example 14 EXPLAIN is used to obtain a query execution plan (an explanation of how MySQL would execute a query) and should be considered an ESTIMATE as it does not run the query. QUERY QUERY PLAN DETAILS
  • 15. VISUAL EXPLAIN from MySQL Workbench 15
  • 16. EXPLAIN FORMAT=TREE EXPLAIN FORMAT=TREE SELECT * FROM City JOIN Country ON (City.Population = Country.Population); -> Inner hash join (city.Population = country.Population) (cost=100154.05 rows=100093) -> Table scan on City (cost=0.30 rows=4188) -> Hash -> Table scan on Country (cost=29.90 rows=239) 16
  • 17. EXPLAIN FORMAT=JSON EXPLAIN FORMAT=JSON SELECT * FROM City JOIN Country ON (City.Population = Country.Population)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "100154.05" }, "nested_loop": [ { "table": { "table_name": "Country", "access_type": "ALL", "rows_examined_per_scan": 239, "rows_produced_per_join": 239, "filtered": "100.00", "cost_info": { "read_cost": "6.00", "eval_cost": "23.90", "prefix_cost": "29.90", "data_read_per_join": "61K" }, 17 "used_columns": [ "Code", "Name", "Continent", "Region", "SurfaceArea", "IndepYear", "Population", "LifeExpectancy", "GNP", "GNPOld", "LocalName", "GovernmentForm", "HeadOfState", "Capital", "Code2" ] } }, { "table": { "table_name": "City", "access_type": "ALL", "rows_examined_per_scan": 4188, "rows_produced_per_join": 100093, "filtered": "10.00", "using_join_buffer": "hash join", "cost_info": { "read_cost": "30.95", "eval_cost": "10009.32", "prefix_cost": "100154.05", "data_read_per_join": "6M" }, "used_columns": [ "ID", "Name", "CountryCode", "District", "Population" ], "attached_condition": "(`world`.`city`.`Population` = `world`.`country`.`Population`)" } } ] } }
  • 18. EXPLAIN ANALYZE – MySQL 8.0.18 18 EXPLAIN ANALYZE SELECT * FROM City WHERE CountryCode = 'GBR'G *************************** 1. row *************************** EXPLAIN: -> Index lookup on City using CountryCode (CountryCode='GBR') (cost=80.76 rows=81) ( actual time=0.153..0.178 rows=81 loops=1) 1 row in set (0.0008 sec) MySQL 8.0.18 introduced EXPLAIN ANALYZE, which runs the query and produces EXPLAIN output along with timing and additional, iterator-based, information about how the optimizer's expectations matched the actual execution. Real numbers not an estimate!
  • 19. More on using EXPLAIN … later 19
  • 21. Indexes A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. -- https://en.wikipedia.org/wiki/Database_index 21
  • 22. Think of an Index as a Table With Shortcuts to another table! Or a model of some of your data in it’s own table! And the more tables you have the read to get to the data the slower things run! (And the more memory is taken up) 22
  • 23. Many Types of Indexes
  • 24. 24
  • 25. 25 clustered index InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or never updated. In the Oracle Database product, this type of table is known as an index-organized table
  • 26. Think of an index as a small, fast table pointing to a bigger table 26 Index 3 6 12 27 5001 Indexed Column column 1 column x column n 5001 3 6 27 12 Mental Image
  • 27. Clustered index -- stored by order of primary key 27 PRIMARY KEY column 1 column x column n 3 6 12 27 5001 Mental Image for innodb
  • 28. Secondary Index Indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key. 28
  • 29. 29 Creating a table with a PRIMARY KEY (index) CREATE TABLE t1 ( c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 VARCHAR(100), c3 VARCHAR(100) ); An index is a list of KEYs. And you will hear ‘key’ and ‘index’ used interchangeably
  • 30. PRIMARY KEY This is an key for the index that uniquely defined for a row, should be immutable. InnoDB needs a PRIMARY KEY (and will make one up if you do not specify) No Null Values Monotonically increasing - use UUID_To_BIN() if you must use UUIDS, otherwise avoid them 30
  • 31. Indexing on a prefix of a column CREATE INDEX part_of_name ON customer (name(10)); Only the first 10 characters are indexed in this examples and this can save space/speed 31
  • 32. Multi-column index - last_name, first_name CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name) ); 32 This index will work on (last_name,first_name) and (last_name) but not (first_name) {Use: left to right} Put highest cardinality field first
  • 33. Hashing values - an alternative SELECT * FROM tbl_name WHERE hash_col=MD5(CONCAT(val1,val2)) AND col1=val1 AND col2=val2; As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns. 33
  • 34. Some other types of indexes Unique Indexes - only one row per value Full-Text Indexes - search string data Covering index - includes all columns needed for a query Secondary index - (see next page) Spatial Indexes - geographical information 34
  • 35. Functional indexes work with only part of the data in a column or a derived value based on a value of the column. There are other options such as generated columns or storing a computed value in a column and then indexing those values. SQL > CREATE TABLE birthdays (id int NOT NULL primary key, name char(30) NOT NULL, birthday date NOT NULL, INDEX ((MONTH(birthday))) ); SQL> INSERT INTO birthdays VALUES (1,'Mac','2020-12-15'); SQL > EXPLAIN FORMAT=tree SELECT * FROM birthdays where MONTH(birthday) = 12 G *************************** 1. row *************************** EXPLAIN: -> Index lookup on birthdays using functional_index (month(birthday)=12) (cost=0.35 rows=1) 35
  • 36. More Functional Indexes SQL> CREATE INDEX total_idx ON product ((cost + shipping)); SQL > explain format=tree select id, catalog_nbr, cost, shipping, (cost + shipping) FROM product where (cost + shipping) < 100.0G *************************** 1. row *************************** EXPLAIN: -> Filter: ((cost + shipping) < 100.0) (cost=1.16 rows=2) -> Index range scan on product using total_idx (cost=1.16 rows=2) 36
  • 37. More Functional Indexes -- Order matters SQL> CREATE INDEX total_idx ON product ((cost + shipping)); SQL > explain format=tree select id, catalog_nbr, cost, shipping, (cost + shipping) FROM product where (cost + shipping) < 100.0G *************************** 1. row *************************** EXPLAIN: -> Filter: ((cost + shipping) < 100.0) (cost=1.16 rows=2) -> Index range scan on product using total_idx (cost=1.16 rows=2) 37 SQL > explain format=tree select id, catalog_nbr, cost, shipping, (cost + shipping) FROM product where (shipping + cost) < 100.0G *************************** 1. row *************************** EXPLAIN: -> Filter: ((product.shipping + product.cost) < 100.0) (cost=0.55 rows=3) -> Table scan on product (cost=0.55 rows=3) Using Index Not using Index!
  • 38. Multi-value indexes You can now have more index pointers than index keys! ○ Very useful for JSON arrays mysql> SELECT 3 MEMBER OF('[1, 3, 5, 7, "Moe"]'); +--------------------------------------+ | 3 MEMBER OF('[1, 3, 5, 7, "Moe"]') | +--------------------------------------+ | 1 | +--------------------------------------+ 38
  • 39. Big Rule #1 Always have a primary key on your InnoDB tables! Always! 39
  • 40. InnoDB will pick an index if you do not! And it will not be what you need. 40
  • 41. How to find InnoDB tables without indexes SELECT tables.table_schema , tables.table_name , tables.engine FROM information_schema.tables LEFT JOIN ( SELECT table_schema , table_name FROM information_schema.statistics GROUP BY table_schema, table_name, index_name HAVING SUM( case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) ) puks ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name WHERE puks.table_name IS null AND tables.table_type = 'BASE TABLE' AND Engine="InnoDB"; 41
  • 42. What if you need to add an index but can’t change Code? ALTER TABLE t1 ADD COLUMN k INT INVISIBLE PRIMARY KEY; This was you do not need to update queries in your code. 42
  • 43. MySQL has Two Main Types of Index Structures B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions. 43 Hash are more efficient than nested loops joins, except when the probe side of the join is very small but joins can only be used to compute equijoins.
  • 44. Please Keep in mind ... If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index). MySQL can use indexes on columns more efficiently if they are declared as the same type and size. ● In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not. For comparisons between nonbinary string columns, both columns should use the same character set. For example, comparing a utf8 column with a latin1 column precludes use of an index. 44
  • 45. NULL A few words on something that may not be there! 45
  • 46. NULL ● NULL is used to designate a LACK of data. False 0 True 1 Don’t know NULL 46
  • 47. Indexing NULL values really drives down the performance of INDEXes - Cardinal Values and then a ‘junk drawer’ of nulls -> hard to search 47
  • 48. Sometimes you can create an index that covers just the right columns so you do not dive into the data! 48
  • 49. Report on salesperson’s activity by customer > desc orders; +--------------+-----------------+------+-----+-------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+-----------------+------+-----+-------------------+-------------------+ | id | bigint unsigned | NO | PRI | NULL | auto_increment | | customer_id | int unsigned | YES | MUL | NULL | | | total | decimal(10,2) | YES | | NULL | | | sales_id | int unsigned | YES | | NULL | | | order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED | +--------------+-----------------+------+-----+-------------------+-------------------+ 5 rows in set (0.0029 sec) select customer_id, total, sales_id from orders where customer_id = 101 and sales_id = 7 and total > 1.0 ; 49
  • 50. Report on salesperson’s activity by customer > desc orders; +--------------+-----------------+------+-----+-------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+-----------------+------+-----+-------------------+-------------------+ | id | bigint unsigned | NO | PRI | NULL | auto_increment | | customer_id | int unsigned | YES | MUL | NULL | | | total | decimal(10,2) | YES | | NULL | | | sales_id | int unsigned | YES | | NULL | | | order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED | +--------------+-----------------+------+-----+-------------------+-------------------+ 5 rows in set (0.0029 sec) select customer_id, total, sales_id from orders where customer_id = 101 and sales_id = 7 and total > 1.0 ; 50
  • 51. Report on salesperson’s activity by customer > desc orders; +--------------+-----------------+------+-----+-------------------+-------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+-----------------+------+-----+-------------------+-------------------+ | id | bigint unsigned | NO | PRI | NULL | auto_increment | | customer_id | int unsigned | YES | MUL | NULL | | | total | decimal(10,2) | YES | | NULL | | | sales_id | int unsigned | YES | | NULL | | | order_placed | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED | +--------------+-----------------+------+-----+-------------------+-------------------+ 5 rows in set (0.0029 sec) select customer_id, total, sales_id from orders where customer_id = 101 and sales_id = 7 and total > 1.0 ; 51
  • 52. Indexing customer_id and sales_id 52 alter table order add index c2 (customer_id, sales_id); Query cost drops from 1.5 to 0.7 by indexing two of the three columns, ~ 46% better (Note this is on a very small data set, larger data take more time)
  • 53. Indexing customer_id, sales_id, and total! 53 alter table order add index c3 (customer_id, sales_id, total); Query cost drops from 0.7 to 0.66 by indexing all three of the columns in the query, ~ 6% over previous (44% overall) This is done by an range scan on the index only and not the table itself!
  • 55. Before Invisible Indexes 1. Doubt usefulness of index 2. Check using EXPLAIN 3. Remove that Index 4. Rerun EXPLAIN 5. Get phone/text/screams from power user about slow query 6. Suddenly realize that the index in question may have had no use for you but the rest of the planet seems to need that dang index! 7. Take seconds/minutes/hours/days/weeks rebuilding that index 55
  • 56. After Invisible Indexes 1. Doubt usefulness of index 2. Check using EXPLAIN 3. Make index invisible – optimizer can not see that index! 4. Rerun EXPLAIN 5. Get phone/text/screams from power user about slow query 6. Make index visible 7. Blame problem on { network | JavaScript | GDPR | Slack | Cloud } Sys schema will show you which indexes have not been used that may be candidates for removal -- but be cautious! 56
  • 57. How to use INVISIBLE INDEX ALTER TABLE t1 ALTER INDEX i_idx INVISIBLE; ALTER TABLE t1 ALTER INDEX i_idx VISIBLE; 57
  • 58. Index Extensions (more harping on secondary indexes) InnoDB automatically extends each secondary index by appending the primary key columns to it. Consider this table definition: CREATE TABLE t1 ( i1 INT NOT NULL DEFAULT 0, i2 INT NOT NULL DEFAULT 0, d DATE DEFAULT NULL, PRIMARY KEY (i1, i2), INDEX k_d (d) ) ENGINE = InnoDB; This table defines the primary key on columns (i1, i2). It also defines a secondary index k_d on column (d), but internally InnoDB extends this index and treats it as columns (d, i1, i2). The optimizer takes into account the primary key columns of the extended secondary index when determining how and whether to use that index. This can result in more efficient query execution plans and better performance. The optimizer can use extended secondary indexes for ref, range, and index_merge index access, for Loose Index Scan access, for join and sorting optimization, and for MIN()/MAX() optimization. 58
  • 59. Remove unused indexes to save memory * 59 * Make sure you check this several hours (or days) after a reboot just in case you find a rarely used but important index!!
  • 60. Remove unused indexes to save memory * SELECT object_schema as 'schema', object_name, index_name FROM performance_schema.table_io_waits_summary_by_index_usage WHERE (object_schema != 'mysql' and object_schema != 'performance_schema') AND index_name IS NOT NULL AND count_star = 0 ORDER BY object_schema, object_name, index_name; 60
  • 61. Monitor tables with FULL SCANs -> try to reduce May be a legitimate FULL SCAN or a lack of a good index. Yes, sometimes you have to read the full table! 61
  • 62. Histograms What is a histogram? It is not a gluten free, keto friendly biscuit. 62
  • 64. Histograms What is a histogram? Wikipedia declares a histogram is an accurate representation of the distribution of numerical data. For RDBMS, a histogram is an approximation of the data distribution within a specific column. So in MySQL, histograms help the optimizer to find the most efficient Query Plan to fetch that data. 64
  • 65. Histograms What is a histogram? A histogram is a distribution of data into logical buckets. There are two types: ● Singleton ● Equi-Height Maximum number of buckets is 1024 65
  • 66. Histogram Histogram statistics are useful primarily for non-indexed columns. A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progressively more out of date when table modifications occur, until the next time they are updated. 66
  • 67. The two reasons for why you might consider a histogram instead of an index: Maintaining an indexes have a cost. If you have an index, every ● Every INSERT/UPDATE/DELETE causes the index to be updated. This will have an impact on your performance. A histogram on the other hand is created once and never updated unless you explicitly ask for it. It will thus not hurt your INSERT/UPDATE/DELETE-performance. ● The optimizer will make “index dives” to estimate the number of records in a given range. This might become too costly if you have for instance very long IN-lists in your query. Histogram statistics are much cheaper in this case, and might thus be more suitable. 67
  • 68. The Optimizer Occasionally the query optimizer fails to find the most efficient plan and ends up spending a lot more time executing the query than necessary. The optimizer assumes that the data is evenly distributed in the column. This can be the old ‘assume’ makes an ‘ass out of you and me’ joke brought to life. The main reason for this is often that the optimizer doesn’t have enough knowledge about the data it is about to query: ● How many rows are there in each table? ● How many distinct values are there in each column? ● How is the data distributed in each column? 68
  • 69. Two-types of histograms Equi-height: One bucket represents a range of values. This type of histogram will be created when distinct values in the column are greater than the number of buckets specified in the analyze table syntax. Think A-G H-L M-T U-Z Singleton: One bucket represents one single value in the column and is the most accurate and will be created when the number of distinct values in the column is less than or equal to the number of buckets specified in the analyze table syntax. 69
  • 70. Frequency Histogram Each distinct value has its own bucket. 70 101 102 104 insert into freq_histogram (x) values (101),(101),(102),(102),(102),(104);
  • 71. Frequency Histogram - an Example select id,x from freq_histogram ; +----+-----+ | id | x | +----+-----+ | 1 | 101 | | 2 | 101 | | 3 | 102 | | 4 | 102 | | 5 | 102 | | 6 | 104 | +----+-----+ analyze table freq_histogram update histogram on x with 3 buckets; 71 select JSON_PRETTY(HISTOGRAM->>"$") from information_schema.column_statistics where table_name='freq_histogram' and column_name='x'G *************************** 1. row *************************** JSON_PRETTY(HISTOGRAM->>"$"): { "buckets": [ [ 101, 0.3333333333333333 ], [ 102, 0.8333333333333333 ], [ 104, 1.0 ] ], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2020-06-16 19:54:21.033017", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 }
  • 72. The statistics SELECT (SUBSTRING_INDEX(v, ':', -1)) value, concat(round(c*100,1),'%') cumulfreq, CONCAT(round((c - LAG(c, 1, 0) over()) * 100,1), '%') freq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets','$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE schema_name = 'demo' and table_name = 'freq_histogram' and column_name = 'x'; +-------+-----------+-------+ | value | cumulfreq | freq | +-------+-----------+-------+ | 101 | 33.3% | 33.3% | | 102 | 83.3% | 50.0% | | 104 | 100.0% | 16.7% | +-------+-----------+-------+ 72
  • 73. Creating and removing Histograms ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c2, c3 WITH 10 BUCKETS; ANALYZE TABLE t UPDATE HISTOGRAM ON c1, c3 WITH 10 BUCKETS; ANALYZE TABLE t DROP HISTOGRAM ON c2; Note the first statement creates three different histograms on c1, c2, and c3. 73
  • 74. Information about Histograms mysql> SELECT TABLE_NAME, COLUMN_NAME, HISTOGRAM->>'$."data-type"' AS 'data-type', JSON_LENGTH(HISTOGRAM->>'$."buckets"') AS 'bucket-count' FROM INFORMATION_SCHEMA.COLUMN_STATISTICS; +-----------------+-------------+-----------+--------------+ | TABLE_NAME | COLUMN_NAME | data-type | bucket-count | +-----------------+-------------+-----------+--------------+ | country | Population | int | 226 | | city | Population | int | 1024 | | countrylan | Language | string | 457 | +-----------------+-------------+-----------+--------------+ 74
  • 75. Where Histograms Shine create table h1 (id int unsigned auto_increment, x int unsigned, primary key(id)); insert into h1 (x) values (1),(1),(2),(2),(2),(3),(3),(3),(3); select x, count(x) from h1 group by x; +---+----------+ | x | count(x) | +---+----------+ | 1 | 2 | | 2 | 3 | | 3 | 4 | +---+----------+ 3 rows in set (0.0011 sec) 75
  • 76. Without Histogram EXPLAIN SELECT * FROM h1 WHERE x > 0G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: h1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 33.32999801635742 – The optimizer estimates about 1/3 of the data Extra: Using where will go match the ‘X > 0’ condition 1 row in set, 1 warning (0.0007 sec) 76 The filtered column indicates an estimated percentage of table rows that will be filtered by the table condition. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering. rows shows the estimated number of rows examined and rows × filtered shows the number of rows that will be joined with the following table. For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows to be joined with the following table is 1000 × 50% = 500.
  • 77. Where Histograms Shine ANALYZE TABLE h1 UPDATE HISTOGRAM ON x WITH 3 BUCKETSG *************************** 1. row *************************** Table: demox.h1 Op: histogram Msg_type: status Msg_text: Histogram statistics created for column 'x'. 1 row in set (0.0819 sec) 77
  • 78. Where Histograms Shine EXPLAIN SELECT * FROM h1 WHERE x > 0G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: h1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100 – all rows!!! Extra: Using where 1 row in set, 1 warning (0.0007 sec) 78
  • 79. With EXPLAIN ANALYSE EXPLAIN analyze SELECT * FROM h1 WHERE x > 0G *************************** 1. row *************************** EXPLAIN: -> Filter: (h1.x > 0) (cost=1.15 rows=9) (actual time=0.027..0.034 rows=9 loops=1) -> Table scan on h1 (cost=1.15 rows=9) (actual time=0.025..0.030 rows=9 loops=1) 79
  • 80. Performance is not just Indexes and Histograms ● There are many other ‘tweaks’ that can be made to speed things up ● Use explain to see what your query is doing? ○ File sorts, full table scans, using temporary tables, etc. ○ Does the join order look right ○ Buffers and caches big enough ○ Do you have enough memory ○ Disk and I/O speeds sufficient 80
  • 81. Locking Options MySQL added two locking options to MySQL 8.0: ● NOWAIT ○ A locking read that uses NOWAIT never waits to acquire a row lock. The query executes immediately, failing with an error if a requested row is locked. ● SKIP LOCKED ○ A locking read that uses SKIP LOCKED never waits to acquire a row lock. The query executes immediately, removing locked rows from the result set. 81
  • 82. Locking Examples – Buying concert tickets START TRANSACTION; SELECT seat_no, row_no, cost FROM seats s JOIN seat_rows sr USING ( row_no ) WHERE seat_no IN ( 3,4 ) AND sr.row_no IN ( 5,6 ) AND booked = 'NO' FOR UPDATE OF s SKIP LOCKED; Let’s shop for tickets in rows 5 or 6 and seats 3 & 4 but we skip any locked records! 82
  • 83. LOCK NOWAIT START TRANSACTION; SELECT seat_no FROM seats JOIN seat_rows USING ( row_no ) WHERE seat_no IN (3,4) AND seat_rows.row_no IN (12) AND booked = 'NO' FOR UPDATE OF seats SKIP LOCKED FOR SHARE OF seat_rows NOWAIT; Without NOWAIT, this query would have waited for innodb_lock_wait_timeout (default: 50) seconds while attempting to acquire the shared lock on seat_rows. With NOWAIT an error is immediately returned ERROR 3572 (HY000): Do not wait for lock. 83
  • 84. Other Fast Ways Resource Groups Optimizer hints Partitioning Multi Value Indexes 84
  • 85. Resource Groups CREATE RESOURCE GROUP Batch TYPE = USER VCPU = 2-3 -- assumes a system with at least 4 CPUs THREAD_PRIORITY = 10; For example, to manage execution of batch jobs that need not execute with high priority, a DBA can create a Batch resource group, and adjust its priority up or down depending on how busy the server is. (Perhaps batch jobs assigned to the group should run at lower priority during the day and at higher priority during the night.) The DBA can also adjust the set of CPUs available to the group. Groups can be enabled or disabled to control whether threads are assignable to them. 85 SET RESOURCE GROUP Batch FOR thread_id; SET RESOURCE GROUP Batch; INSERT /*+ RESOURCE_GROUP(Batch) */ INTO t2 VALUES(2);
  • 86. Optimizer Hints A way to control the optimizer is by using optimizer hints, which can be specified within individual statements. Because optimizer hints apply on a per-statement basis, they provide finer control over statement execution plans than can be achieved using optimizer_switch . INSERT /*+ RESOURCE_GROUP(Batch) */ INTO t2 VALUES(2); 86 SELECT /*+ JOIN_PREFIX(t2, t5@subq2, t4@subq1) JOIN_ORDER(t4@subq1, t3) JOIN_SUFFIX(t1) */ COUNT(*) FROM t1 JOIN t2 JOIN t3 WHERE t1.f1 IN (SELECT /*+ QB_NAME(subq1) */ f1 FROM t4) AND t2.f1 IN (SELECT /*+ QB_NAME(subq2) */ f1 FROM t5);
  • 87. Partitioning CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE ); 87
  • 88. Multi-value Indexing A multi-valued index is a secondary index defined on a column that stores an array of values. A “normal” index has one index record for each data record (1:1). A multi-valued index can have multiple index records for a single data record (N:1). Multi-valued indexes are intended for indexing JSON arrays. CREATE TABLE customers ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, custinfo JSON ); ALTER TABLE customers ADD INDEX comp(id, modified, (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) ); 88
  • 89. Do s or ? 89
  • 90. Test! 1. Use EXPLAIN to get information on the query BEFORE making changes 2. Know the available indexes already established (no inventing Wheel 2.0) 3. Make ONE CHANGE AT A TIME 4. Compare performance to previous tests 5. Good enough performance?!? a. Decide when diminishing returns have been reached OR b. Performance is acceptable OR c. Decide to come back later 6. Document! 90
  • 91. New Book ● YOU NEED THESE BOOKS!! ● Author’s others work are also outstanding and well written ● Amazing detail!! 91
  • 92. Great Book ● Getting very dated ● Make sure you get the 3rd edition ● Can be found in used book stores 92
  • 93. My Book ● A guide to using JSON and MySQL ○ Programming Examples ○ Best practices ○ Twice the length of the original ○ NoSQL Document Store how-to 93
  • 94. “We have saved around 40% of our costs and are able to reinvest that back into the business. And we are scaling across EMEA, and that’s basically all because of Oracle.” —Asser Smidt CEO and Cofounder, BotSupply Startups get cloud credits and a 70% discount for 2 years, global exposure via marketing, events, digital promotion, and media, plus access to mentorship, capital and Oracle’s 430,000+ customers Customers meet vetted startups in transformative spaces that help them stay ahead of their competition Oracle stays at the competitive edge of innovation with solutions that complement its technology stack Oracle for Startups - enroll at oracle.com/startup A Virtuous Cycle of Innovation, Everybody Wins.
  • 95. Thank you ● David.Stokes @Oracle.com ● @Stoker ● PHP Related blog https://elephantdolphin.blogspot.com/ ● MySQL Latest Blogs https://planet.mysql.com/ ● Where to ask questions https://forums.mysql.com/ ● Slack https://mysqlcommunity.slack.com slides -> slideshare.net/davestokes 95