SlideShare a Scribd company logo
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Histogram Support in MySQL 8.0
Øystein Grøvlen
Senior Principal Software Engineer
MySQL Optimizer Team, Oracle
February 2018
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How are histograms used?
Query example
Some advice
1
2
3
4
5
3
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How are histograms used?
Query example
Some advice
1
2
3
4
5
4
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Motivating Example
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
5
JOIN Query
id
select
type
table type possible keys key
key
len
ref rows filtered extra
1 SIMPLE orders ALL
i_o_orderdate,
i_o_custkey
NULL NULL NULL 15000000 31.19
Using
where
1 SIMPLE customer
eq_
ref
PRIMARY PRIMARY 4
dbt3.orders.
o_custkey
1 33.33
Using
where
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Motivating Example
EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
6
Reverse join order
id
select
type
table type possible keys key
key
len
ref rows filtered extra
1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 33.33
Using
where
1 SIMPLE orders ref
i_o_orderdate,
i_o_custkey
i_o_custkey 5
dbt3.
customer.
c_custkey
15 31.19
Using
where
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Comparing Join Order
0
2
4
6
8
10
12
14
16
QueryExecutionTime(seconds)
orders → customer customer → orders
Performance
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Histograms
ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS;
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
8
Create histogram to get a better plan
id
select
type
table type possible keys key
key
len
ref rows filtered extra
1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 0.00
Using
where
1 SIMPLE orders ref
i_o_orderdate,
i_o_custkey
i_o_custkey 5
dbt3.
customer.
c_custkey
15 31.19
Using
where
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How are histograms used?
Query example
Some advice
1
2
3
4
5
9
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Histograms
• Information about value distribution for a column
• Data values group in buckets
– Frequency calculated for each bucket
– Maximum 1024 buckets
• May use sampling to build histogram
– Sample rate depends on available memory
• Automatically chooses between two histogram types:
– Singleton: One value per bucket
– Equi-height: Multiple values per bucket
10
Column statistics
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Singleton Histogram
0
0,05
0,1
0,15
0,2
0,25
0 1 2 3 5 6 7 8 9 10
Frequency
• One value per bucket
• Each bucket stores:
– Value
– Cumulative frequency
• Well suited to estimate both
equality and range predicates
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Equi-Height Histogram
0
0,05
0,1
0,15
0,2
0,25
0,3
0,35
0 - 0 1 - 1 2 - 3 5 - 6 7 - 10
Frequency
• Multiple values per bucket
• Not quite equi-height
– Values are not split across buckets
⇒Frequent values in separate buckets
• Each bucket stores:
– Minimum value
– Maximum value
– Cumulative frequency
– Number of distinct values
• Best suited for range predicates
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Usage
• Create or refresh histogram(s) for column(s):
ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS;
– Note: Will only update histogram, not other statistics
• Drop histogram:
ANALYZE TABLE table DROP HISTOGRAM ON column [, column];
• Based on entire table or sampling:
– Depends on avail. memory: histogram_generation_max_mem_size (default: 20 MB)
• New storage engine API for sampling
– Default implementation: Full table scan even when sampling
– Storage engines may implement more efficient sampling
13
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Storage
• Stored in a JSON column in data dictionary
• Can be inspected in Information Schema table:
SELECT JSON_PRETTY(histogram)
FROM information_schema.column_statistics
WHERE schema_name = 'dbt3_sf1'
AND table_name ='lineitem'
AND column_name = 'l_linenumber';
14
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Histogram content
{
"buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
[3, 0.6427401784471978], [4, 0.7855470933802572],
[5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ],
"data-type": "int",
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2018-02-03 21:05:21.690872",
"sampling-rate": 0.20829115437457252,
"histogram-type": "singleton",
"number-of-buckets-specified": 1024
}
15
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Strings
• Max. 42 characters considered
• Base64 encoded
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq
FROM information_schema.column_statistics,
JSON_TABLE(histogram->'$.buckets', '$[*]'
COLUMNS(v VARCHAR(60) PATH '$[0]',
c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+
| value | cumulfreq |
+-------+--------------------+
| F | 0.4862529264385756 |
| O | 0.974029654577566 |
| P | 0.9999999999999999 |
+-------+--------------------+
16
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Calculate Bucket Frequency
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq,
c - LAG(c, 1, 0) over () freq
FROM information_schema.column_statistics,
JSON_TABLE(histogram->'$.buckets', '$[*]'
COLUMNS(v VARCHAR(60) PATH '$[0]',
c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+----------------------+
| value | cumulfreq | freq |
+-------+--------------------+----------------------+
| F | 0.4862529264385756 | 0.4862529264385756 |
| O | 0.974029654577566 | 0.48777672813899037 |
| P | 0.9999999999999999 | 0.025970345422433927 |
+-------+--------------------+----------------------+
Use window function
17
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How are histograms used?
Query example
Some advice
1
2
3
4
5
18
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
• tx JOIN tx+1
• records(tx+1) = records(tx) * condition_filter_effect * records_per_key
When are Histograms useful?
Estimate cost of join
tx tx+1
Ref
access
Number of
records read
from tx
Conditionfilter
effect
Records passing the
table conditions on tx
Cardinality statistics
for index
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Filter estimate based on what is
available:
1. Range estimate
2. Index statistics
3. Guesstimate
= 0.1
<=,<,>,>= 1/3
BETWEEN 1/9
NOT <op> 1 – SEL(<op>)
AND P(A and B) = P(A) * P(B)
OR P(A or B) = P(A) + P(B) – P(A and B)
… …
How to Calculate Condition Filter Effect, MySQL 5.7
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Filter estimate based on what is
available:
1. Range estimate
2. Index statistics
3. Histograms
4. Guesstimate
= 0.1
<=,<,>,>= 1/3
BETWEEN 1/9
NOT <op> 1 – SEL(<op>)
AND P(A and B) = P(A) * P(B)
OR P(A or B) = P(A) + P(B) – P(A and B)
… …
How to Calculate Condition Filter Effect, MySQL 5.7
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Calculating Condition Filter Effect for Tables
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.33 ≈ 0.01
Example without histograms
0.1
(guesstimate)
0.33
(guesstimate)
0.29
(range)
0.03
(index)
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
employee.name = 'John' AND age > 21 AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Calculating Condition Filter Effect for Tables
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.95 ≈ 0.03
Example with histogram
0.1
(guesstimate)
0.95
(histogram)
0.29
(range)
0.03
(index)
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Computing Selectivity From Histogram
0
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,9
1
0-7
8-16
17-24
25-31
32-38
39-46
47-53
54-61
62-70
71-104
Frequency
age
Cumulative Frequency
Example
age <= 21
0.203
Selectivity = 0.203 +
0.306
(0.306 – 0.203) * 5/8 = 0.267
age > 21 Selectivity = 1 - 0.267 = 0.733
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How are histograms used?
Query example
Some advice
1
2
3
4
5
25
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
DBT-3 Query 7
SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue
FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation,
EXTRACT(YEAR FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM supplier, lineitem, orders, customer, nation n1, nation n2
WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey
AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE')
OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA'))
AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping
GROUP BY supp_nation , cust_nation , l_year
ORDER BY supp_nation , cust_nation , l_year;
Volume Shipping Query
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
DBT-3 Query 7
Query plan without histogram
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
DBT-3 Query 7
Query plan with histogram
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
DBT-3 Query 7
0,0
0,2
0,4
0,6
0,8
1,0
1,2
1,4
1,6
1,8
QueryExecutionTime(seconds)
Without histogram With histogram
Performance
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Motivating example
Quick start guide
How is histograms used?
Query example
Some advice
1
2
3
4
5
30
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Some advice
• Histograms are useful for columns that are
– not the first column of any index, and
– used in WHERE conditions of
• JOIN queries
• Queries with IN-subqueries
• ORDER BY ... LIMIT queries
• Best fit
– Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums)
– Columns with uneven distribution (skew)
– Stable distribution (do not change much over time)
Which columns to create histograms for?
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Some more advice
• When not to create histograms:
– First column of an index
– Never used in WHERE clause
– Monotonically increasing column values (e.g. date columns)
• Histogram will need frequent updates to be accurate
• Consider to create index
• How many buckets?
– If possible, enough to get a singleton histogram
– For equi-height, 100 buckets should be enough
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
More information
• MySQL Server Team blog
– http://mysqlserverteam.com/
– https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth)
• My blog:
– http://oysteing.blogspot.com/
• MySQL forums:
– Optimizer & Parser: http://forums.mysql.com/list.php?115
– Performance: http://forums.mysql.com/list.php?24
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
34
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 35
Histogram Support in MySQL 8.0

More Related Content

What's hot

How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
oysteing
 
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZEMySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
Norvald Ryeng
 
How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0
Norvald Ryeng
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQL
Dag H. Wanvik
 
LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0
Norvald Ryeng
 
Query optimization techniques for partitioned tables.
Query optimization techniques for partitioned tables.Query optimization techniques for partitioned tables.
Query optimization techniques for partitioned tables.
Ashutosh Bapat
 
Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10
Ashutosh Bapat
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
Chris Saxon
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
Georgi Sotirov
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practice
Jano Suchal
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Connor McDonald
 
Array i imp
Array  i impArray  i imp
Array i imp
Vivek Kumar
 
Hadoop Summit EU 2014
Hadoop Summit EU   2014Hadoop Summit EU   2014
Hadoop Summit EU 2014cwensel
 
Oracle tips and tricks
Oracle tips and tricksOracle tips and tricks
Oracle tips and tricksYanli Liu
 
Trie Data Structure
Trie Data StructureTrie Data Structure
Trie Data Structure
Badiuzzaman Pranto
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
Barbara Fusinska
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
Rsquared Academy
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
Muhammad Nabi Ahmad
 

What's hot (20)

How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
 
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZEMySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
 
How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0How to Take Advantage of Optimizer Improvements in MySQL 8.0
How to Take Advantage of Optimizer Improvements in MySQL 8.0
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQL
 
LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0LATERAL Derived Tables in MySQL 8.0
LATERAL Derived Tables in MySQL 8.0
 
Query optimization techniques for partitioned tables.
Query optimization techniques for partitioned tables.Query optimization techniques for partitioned tables.
Query optimization techniques for partitioned tables.
 
Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10
 
Agile Database Development with JSON
Agile Database Development with JSONAgile Database Development with JSON
Agile Database Development with JSON
 
Api presentation
Api presentationApi presentation
Api presentation
 
New SQL features in latest MySQL releases
New SQL features in latest MySQL releasesNew SQL features in latest MySQL releases
New SQL features in latest MySQL releases
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practice
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 
Array i imp
Array  i impArray  i imp
Array i imp
 
Hadoop Summit EU 2014
Hadoop Summit EU   2014Hadoop Summit EU   2014
Hadoop Summit EU 2014
 
Explain that explain
Explain that explainExplain that explain
Explain that explain
 
Oracle tips and tricks
Oracle tips and tricksOracle tips and tricks
Oracle tips and tricks
 
Trie Data Structure
Trie Data StructureTrie Data Structure
Trie Data Structure
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
 

Similar to Histogram Support in MySQL 8.0

How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016
oysteing
 
How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15
oysteing
 
Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016
Mayank Prasad
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
Olav Sandstå
 
MySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMySQL Performance Schema : fossasia
MySQL Performance Schema : fossasia
Mayank Prasad
 
OpenWorld 2018 - SQL Tuning in 20 mins
OpenWorld 2018 - SQL Tuning in 20 minsOpenWorld 2018 - SQL Tuning in 20 mins
OpenWorld 2018 - SQL Tuning in 20 mins
Connor McDonald
 
Optimizer overviewoow2014
Optimizer overviewoow2014Optimizer overviewoow2014
Optimizer overviewoow2014
Mysql User Camp
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
Olav Sandstå
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
Morgan Tocker
 
Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle Optimizer
Edgar Alejandro Villegas
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
oysteing
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
Guy Harrison
 
MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
Manyi Lu
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
Carlos Oliveira
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
Anju Garg
 
Oracle Data Redaction
Oracle Data RedactionOracle Data Redaction
Oracle Data Redaction
Alex Zaballa
 
MySQL 8.0 Released Update
MySQL 8.0 Released UpdateMySQL 8.0 Released Update
MySQL 8.0 Released Update
Keith Hollman
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
DataStax Academy
 
Performance schema and sys schema
Performance schema and sys schemaPerformance schema and sys schema
Performance schema and sys schema
Mark Leith
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
MYXPLAIN
 

Similar to Histogram Support in MySQL 8.0 (20)

How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016
 
How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15
 
Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
MySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMySQL Performance Schema : fossasia
MySQL Performance Schema : fossasia
 
OpenWorld 2018 - SQL Tuning in 20 mins
OpenWorld 2018 - SQL Tuning in 20 minsOpenWorld 2018 - SQL Tuning in 20 mins
OpenWorld 2018 - SQL Tuning in 20 mins
 
Optimizer overviewoow2014
Optimizer overviewoow2014Optimizer overviewoow2014
Optimizer overviewoow2014
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
 
Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle Optimizer
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0MySQL Optimizer: What's New in 8.0
MySQL Optimizer: What's New in 8.0
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Oracle Data Redaction
Oracle Data RedactionOracle Data Redaction
Oracle Data Redaction
 
MySQL 8.0 Released Update
MySQL 8.0 Released UpdateMySQL 8.0 Released Update
MySQL 8.0 Released Update
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Performance schema and sys schema
Performance schema and sys schemaPerformance schema and sys schema
Performance schema and sys schema
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 

More from oysteing

POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
oysteing
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
oysteing
 
POLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel QueryPOLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel Query
oysteing
 
JSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worldsJSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worlds
oysteing
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
oysteing
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
oysteing
 

More from oysteing (6)

POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
POLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel QueryPOLARDB for MySQL - Parallel Query
POLARDB for MySQL - Parallel Query
 
JSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worldsJSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worlds
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 

Recently uploaded

Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 

Recently uploaded (20)

Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 

Histogram Support in MySQL 8.0

  • 1.
  • 2. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histogram Support in MySQL 8.0 Øystein Grøvlen Senior Principal Software Engineer MySQL Optimizer Team, Oracle February 2018
  • 3. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 3
  • 4. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 4
  • 5. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Motivating Example EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 5 JOIN Query id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE orders ALL i_o_orderdate, i_o_custkey NULL NULL NULL 15000000 31.19 Using where 1 SIMPLE customer eq_ ref PRIMARY PRIMARY 4 dbt3.orders. o_custkey 1 33.33 Using where
  • 6. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Motivating Example EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 6 Reverse join order id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 33.33 Using where 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where
  • 7. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Comparing Join Order 0 2 4 6 8 10 12 14 16 QueryExecutionTime(seconds) orders → customer customer → orders Performance
  • 8. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histograms ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS; EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000; 8 Create histogram to get a better plan id select type table type possible keys key key len ref rows filtered extra 1 SIMPLE customer ALL PRIMARY NULL NULL NULL 1500000 0.00 Using where 1 SIMPLE orders ref i_o_orderdate, i_o_custkey i_o_custkey 5 dbt3. customer. c_custkey 15 31.19 Using where
  • 9. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 9
  • 10. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histograms • Information about value distribution for a column • Data values group in buckets – Frequency calculated for each bucket – Maximum 1024 buckets • May use sampling to build histogram – Sample rate depends on available memory • Automatically chooses between two histogram types: – Singleton: One value per bucket – Equi-height: Multiple values per bucket 10 Column statistics
  • 11. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Singleton Histogram 0 0,05 0,1 0,15 0,2 0,25 0 1 2 3 5 6 7 8 9 10 Frequency • One value per bucket • Each bucket stores: – Value – Cumulative frequency • Well suited to estimate both equality and range predicates
  • 12. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Equi-Height Histogram 0 0,05 0,1 0,15 0,2 0,25 0,3 0,35 0 - 0 1 - 1 2 - 3 5 - 6 7 - 10 Frequency • Multiple values per bucket • Not quite equi-height – Values are not split across buckets ⇒Frequent values in separate buckets • Each bucket stores: – Minimum value – Maximum value – Cumulative frequency – Number of distinct values • Best suited for range predicates
  • 13. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Usage • Create or refresh histogram(s) for column(s): ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS; – Note: Will only update histogram, not other statistics • Drop histogram: ANALYZE TABLE table DROP HISTOGRAM ON column [, column]; • Based on entire table or sampling: – Depends on avail. memory: histogram_generation_max_mem_size (default: 20 MB) • New storage engine API for sampling – Default implementation: Full table scan even when sampling – Storage engines may implement more efficient sampling 13
  • 14. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Storage • Stored in a JSON column in data dictionary • Can be inspected in Information Schema table: SELECT JSON_PRETTY(histogram) FROM information_schema.column_statistics WHERE schema_name = 'dbt3_sf1' AND table_name ='lineitem' AND column_name = 'l_linenumber'; 14
  • 15. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Histogram content { "buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523], [3, 0.6427401784471978], [4, 0.7855470933802572], [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-02-03 21:05:21.690872", "sampling-rate": 0.20829115437457252, "histogram-type": "singleton", "number-of-buckets-specified": 1024 } 15
  • 16. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Strings • Max. 42 characters considered • Base64 encoded SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+ | value | cumulfreq | +-------+--------------------+ | F | 0.4862529264385756 | | O | 0.974029654577566 | | P | 0.9999999999999999 | +-------+--------------------+ 16
  • 17. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Calculate Bucket Frequency SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq, c - LAG(c, 1, 0) over () freq FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist WHERE column_name = 'o_orderstatus'; +-------+--------------------+----------------------+ | value | cumulfreq | freq | +-------+--------------------+----------------------+ | F | 0.4862529264385756 | 0.4862529264385756 | | O | 0.974029654577566 | 0.48777672813899037 | | P | 0.9999999999999999 | 0.025970345422433927 | +-------+--------------------+----------------------+ Use window function 17
  • 18. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 18
  • 19. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | • tx JOIN tx+1 • records(tx+1) = records(tx) * condition_filter_effect * records_per_key When are Histograms useful? Estimate cost of join tx tx+1 Ref access Number of records read from tx Conditionfilter effect Records passing the table conditions on tx Cardinality statistics for index
  • 20. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 – SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) – P(A and B) … … How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01';
  • 21. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Histograms 4. Guesstimate = 0.1 <=,<,>,>= 1/3 BETWEEN 1/9 NOT <op> 1 – SEL(<op>) AND P(A and B) = P(A) * P(B) OR P(A or B) = P(A) + P(B) – P(A and B) … … How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01';
  • 22. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Calculating Condition Filter Effect for Tables Condition filter effect for tables: – office: 0.03 – employee: 0.29 * 0.1 * 0.33 ≈ 0.01 Example without histograms 0.1 (guesstimate) 0.33 (guesstimate) 0.29 (range) 0.03 (index)
  • 23. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Calculating Condition Filter Effect for Tables Condition filter effect for tables: – office: 0.03 – employee: 0.29 * 0.1 * 0.95 ≈ 0.03 Example with histogram 0.1 (guesstimate) 0.95 (histogram) 0.29 (range) 0.03 (index)
  • 24. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Computing Selectivity From Histogram 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 0-7 8-16 17-24 25-31 32-38 39-46 47-53 54-61 62-70 71-104 Frequency age Cumulative Frequency Example age <= 21 0.203 Selectivity = 0.203 + 0.306 (0.306 – 0.203) * 5/8 = 0.267 age > 21 Selectivity = 1 - 0.267 = 0.733
  • 25. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How are histograms used? Query example Some advice 1 2 3 4 5 25
  • 26. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | DBT-3 Query 7 SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation, EXTRACT(YEAR FROM l_shipdate) AS l_year, l_extendedprice * (1 - l_discount) AS volume FROM supplier, lineitem, orders, customer, nation n1, nation n2 WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey AND c_nationkey = n2.n_nationkey AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE') OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA')) AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping GROUP BY supp_nation , cust_nation , l_year ORDER BY supp_nation , cust_nation , l_year; Volume Shipping Query
  • 27. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | DBT-3 Query 7 Query plan without histogram
  • 28. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | DBT-3 Query 7 Query plan with histogram
  • 29. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | DBT-3 Query 7 0,0 0,2 0,4 0,6 0,8 1,0 1,2 1,4 1,6 1,8 QueryExecutionTime(seconds) Without histogram With histogram Performance
  • 30. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Program Agenda Motivating example Quick start guide How is histograms used? Query example Some advice 1 2 3 4 5 30
  • 31. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Some advice • Histograms are useful for columns that are – not the first column of any index, and – used in WHERE conditions of • JOIN queries • Queries with IN-subqueries • ORDER BY ... LIMIT queries • Best fit – Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums) – Columns with uneven distribution (skew) – Stable distribution (do not change much over time) Which columns to create histograms for?
  • 32. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Some more advice • When not to create histograms: – First column of an index – Never used in WHERE clause – Monotonically increasing column values (e.g. date columns) • Histogram will need frequent updates to be accurate • Consider to create index • How many buckets? – If possible, enough to get a singleton histogram – For equi-height, 100 buckets should be enough
  • 33. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | More information • MySQL Server Team blog – http://mysqlserverteam.com/ – https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth) • My blog: – http://oysteing.blogspot.com/ • MySQL forums: – Optimizer & Parser: http://forums.mysql.com/list.php?115 – Performance: http://forums.mysql.com/list.php?24
  • 34. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Safe Harbor Statement The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. 34
  • 35. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 35