Last year this session’s speaker worked on several cases where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest use the rest. Her team had a hard time finding a solution for retrieving goods quickly. MySQL 8.0 has a feature that resolves such issues: optimizer histograms, which store statistics on the exact number of values in each data bucket. In real life, histograms don’t help with all queries accessing nonuniform data. How you write a statement, the number of rows in the table, the data distribution: all of these may affect the use of histograms. This presentation shows examples demonstrating how the optimizer works in each case, describes how to create histograms, and covers differences between MySQL and Oracle implementations.
Billion Goods in Few Categories: How Histograms Save a Life? - Sveta Smirnova
We store data with an intention to use it: search, retrieve, group, sort... To do it effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is uneven.
Last year I worked on several tickets where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest used the rest. We had a hard time finding a solution for retrieving goods fast. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature, histograms, would work better, cleaner, and faster. The idea of the talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss:
how index statistics are physically stored by the storage engine
which data is exchanged with the Optimizer
why this is not enough to make the correct index choice
when histograms can help and when they cannot
differences between MySQL and MariaDB histograms
Billion Goods in Few Categories: How Histograms Save a Life? - Sveta Smirnova
We store data with an intention to use it: search, retrieve, group, sort... To do it effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is uneven.
Last year I worked on several support tickets where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest used the rest. We had a hard time finding a solution for retrieving goods fast. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature, histograms, would work better, cleaner, and faster. The idea of the talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss:
- how index statistics are physically stored by the storage engine
- which data is exchanged with the Optimizer
- why this is not enough to make the correct index choice
- when histograms can help and when they cannot
- differences between MySQL and MariaDB histograms
Talk for Percona Live 2019 Austin: https://www.percona.com/live/19/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
Billion Goods in Few Categories: How Histograms Save a Life? - Sveta Smirnova
We store data with the intention to use it: search, retrieve, group, sort... To perform these actions effectively, MySQL storage engines index data and communicate statistics to the Optimizer when it compiles a query execution plan. This approach works perfectly well unless your data distribution is uneven.
Last year I worked on several tickets where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest used the rest. We had a hard time finding a solution for retrieving goods fast. Workarounds for version 5.7 were offered. However, a new MySQL 8.0 feature, histograms, would work better, cleaner, and faster. This is how the idea of the talk was born.
I will discuss:
- how index statistics are physically stored
- which data is exchanged with the Optimizer
- why this is not enough to make the correct index choice
In the end, I will explain which issues histograms resolve and why index statistics alone are insufficient for fast retrieval of unevenly distributed data.
https://www.percona.com/live/e18/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
MySQL Performance Schema in Action: the Complete Tutorial - Sveta Smirnova
Performance Schema is a powerful diagnostic instrument for:
- Query performance
- Complicated locking issues
- Memory leaks
- Resource usage
- Problematic behavior, caused by inappropriate settings
- More
It comes with hundreds of options that allow you to tune precisely what to instrument. More than 100 consumers store the collected data.
In this tutorial we will try all the important instruments out. We will provide a test environment and a few typical problems that can hardly be solved without Performance Schema. You will not only learn how to collect and use this information but also gain hands-on experience with it.
Presented at Percona Live Frankfurt, 2018: https://www.percona.com/live/e18/sessions/mysql-performance-schema-in-action-the-complete-tutorial
Optimizer Histograms: When They Help and When They Do Not? - Sveta Smirnova
Talk for the pre-FOSDEM MySQL Day on February 1, 2019.
Last year I worked on several tickets where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest used the rest. We had a hard time finding a solution for retrieving goods fast.
MySQL 8.0 has a feature which resolves such issues: optimizer histograms, storing statistics of an exact number of values in each data bucket.
However, in real life histograms do not help with all queries accessing non-uniform data. How you write a query, the number of rows in the table, the data distribution: all of these may affect the use of histograms.
In this session I show examples demonstrating how the Optimizer uses histograms.
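As a sketch of the feature described above (the table and column names are hypothetical), a MySQL 8.0 histogram is created, inspected, and dropped with `ANALYZE TABLE` and the `COLUMN_STATISTICS` view:

```sql
-- Hypothetical table `goods` with a skewed, non-indexed `category_id` column.
-- Build a histogram with up to 32 buckets (MySQL 8.0+):
ANALYZE TABLE goods UPDATE HISTOGRAM ON category_id WITH 32 BUCKETS;

-- Inspect the stored statistics (bucket boundaries, cumulative frequencies):
SELECT HISTOGRAM
  FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
 WHERE TABLE_NAME = 'goods' AND COLUMN_NAME = 'category_id';

-- Remove the histogram when it is no longer useful:
ANALYZE TABLE goods DROP HISTOGRAM ON category_id;
```

Unlike index statistics, the histogram is built once on demand and is not updated automatically, so it suits columns whose distribution is skewed but stable.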
Performance Schema is a powerful diagnostic instrument for:
- Query performance
- Complicated locking issues
- Memory leaks
- Resource usage
- Problematic behavior, caused by inappropriate settings
- More
It comes with hundreds of options that allow you to tune precisely what to instrument. More than 100 consumers store the collected data.
In this tutorial, we will try all the important instruments out. We will provide a test environment and a few typical problems that can hardly be solved without Performance Schema. You will not only learn how to collect and use this information but also gain hands-on experience with it.
Tutorial at Percona Live Austin 2019
Modern solutions for modern database load: improvements in the latest MariaDB... - Sveta Smirnova
Presented at MariaDB Server Fest 2020: https://mariadb.org/fest2020/improvements/
MariaDB is famous for working well in high-performance environments. But our view of what to call high-performance changes over time. Every year we get faster data transfer speed; more devices connected to the Internet; more users and, as a result, more data.
The challenges that developers have to solve are getting harder. This session shows what engineers do to keep the product up to date, focusing on MariaDB improvements that distinguish it from its predecessor, MySQL.
The document describes various features of MySQL Performance Schema. It discusses how Performance Schema provides visibility into SQL statements, prepared statements, stored routines and locks. It provides examples of using Performance Schema tables and views to diagnose issues such as slow queries, full table scans, and locks preventing DDL statements from completing. Hands-on exercises are suggested to practice analyzing statements, prepared statements and stored routines using Performance Schema.
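One of the diagnostics mentioned above, finding statements that run without using an index, can be sketched with a query against the standard statement-digest table (the LIMIT is arbitrary):

```sql
-- Top statement digests that performed full scans, ordered by total latency.
SELECT DIGEST_TEXT,
       COUNT_STAR        AS executions,
       SUM_NO_INDEX_USED AS no_index_used,
       SUM_TIMER_WAIT    AS total_latency_ps
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_NO_INDEX_USED > 0
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 5;
```

Timings in Performance Schema are reported in picoseconds, hence the `_ps` alias.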
Introduction into MySQL Query Tuning for Dev[Op]s - Sveta Smirnova
Percona Live Online 2021 talk: https://www.percona.com/resources/videos/introduction-mysql-query-tuning-for-devops
In this talk I will show how to get started with MySQL Query Tuning. I will give a short introduction to physical table structure and demonstrate how it may influence query execution time.
Then we will discuss basic query tuning instruments and techniques, mainly the EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.
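The "latest variations" of EXPLAIN mentioned above can be sketched as follows (the `orders` table is hypothetical):

```sql
-- Classic tabular plan:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- JSON format adds cost estimates and attached conditions (5.6+):
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;

-- TREE format shows the iterator tree the executor will run (8.0.16+):
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 42;
```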
Introduction to MySQL Query Tuning for Dev[Op]s - Sveta Smirnova
To get data, we query the database. MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place.
Smartly written queries can significantly outperform automatically generated ones. Indexes and optimizer statistics, not limited to histograms, can speed up queries a lot.
In this session, I will demonstrate with examples how MySQL query performance can be improved. I will focus on techniques accessible to Developers and DevOps engineers rather than those usually used by Database Administrators. In the end, I will present troubleshooting tools that will help you identify why your queries do not perform well. You can then use the knowledge from the beginning of the session to improve them.
A demo of Performance Schema which I performed at the DevOps Stage conference in Kiev on October 13, 2018. More at https://devopsstage.com/stranitsa-spikera/sveta-smirnova/
Talk at "Istanbul Tech Talks" in Istanbul, April 17, 2018. http://www.istanbultechtalks.com/
In this talk I will show how to get started with MySQL Query Tuning. I will give a short introduction to physical table structure and demonstrate how it may influence query execution time. Then we will discuss basic query tuning instruments and techniques, mainly the EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.
MySQL is a relational database management system. It provides tools for managing data, including creating, querying, updating and deleting data in databases. Some key features include:
- Creating, altering and dropping databases, tables, indexes, users and more.
- Inserting, selecting, updating and deleting data with SQL statements.
- Backup and restore capabilities using mysqldump to backup entire databases or tables.
- Security features including user accounts and privileges to control access.
- Performance optimization using indexes, partitioning, query tuning and more.
- Data types for different kinds of data like numbers, dates, text, JSON and more.
The document discusses new improvements to the parser and optimizer in MySQL 5.7. Key points include:
1) The parser and optimizer were refactored for improved maintainability and stability. Parsing was separated from optimization and execution.
2) The cost model was improved with better record estimation for joins, configurable cost constants, and additional explain output.
3) A new query rewrite plugin allows rewriting queries without changing application code.
This document provides information on using EXPLAIN to troubleshoot MySQL performance issues. EXPLAIN shows how MySQL executes SQL queries, including which indexes and joins it uses. The output includes information on the query type, access method, filtered rows, and extra details to help identify inefficient queries or indexes.
The document discusses understanding query execution in MySQL. It provides examples of using EXPLAIN to view and analyze the query execution plan chosen by the MySQL query optimizer. It examines the type column in particular, explaining the differences between types like ALL, index, range, and index_merge, and how indexes can affect optimization.
This document discusses new SQL syntax and query rewrite plugins in MySQL. It introduces FILTER clauses and custom optimizer hints and describes how to implement them using query rewrite plugins. Key points covered include the plugin interface, parsing and rewriting queries, managing memory, and customizing variables. The goal is to add new SQL features and control query execution through simple syntax extensions. Examples of implementing FILTER clauses and custom optimizer hints are provided to demonstrate how rewrite plugins work.
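Rewriting a statement without touching application code works roughly as follows with the Rewriter plugin that ships with MySQL (the rule text is illustrative; the plugin is assumed to be installed via the bundled install_rewriter.sql script):

```sql
-- Add a rule: cap an unbounded SELECT with a LIMIT.
-- The ? placeholders match literals in incoming statements.
INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
VALUES ('SELECT * FROM db1.t1 WHERE c1 = ?',
        'SELECT * FROM db1.t1 WHERE c1 = ? LIMIT 1000');

-- Load the new rule into the plugin:
CALL query_rewrite.flush_rewrite_rules();
```

From then on, any matching statement is rewritten before the optimizer sees it, which is useful when the application's SQL cannot be changed.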
This document provides an agenda and overview for a MySQL Query Tuning 101 presentation. The summary includes:
1. The agenda covers topics like identifying slow queries, using indexes, the EXPLAIN tool, and other optimization techniques.
2. When queries run slow, the presenter will discuss using indexes to improve performance by allowing MySQL to access data more efficiently.
3. The EXPLAIN tool is covered as a way to estimate query execution and see how MySQL utilizes indexes. Different EXPLAIN output will be demonstrated using examples from an employees database.
Optimizer features in recent releases of other databases - Sergey Petrunya
The document summarizes several recent optimizer features introduced in MySQL 8.0 and PostgreSQL versions:
- MySQL 8.0 introduced an iterator-based executor, hash joins, EXPLAIN ANALYZE, and optimizations for anti-joins, semi-joins, and subqueries.
- PostgreSQL improved query parallelism, added multi-column statistics, parallel index creation, and optimized non-recursive common table expressions.
- Both databases have focused on join algorithms, statistics gathering, and parallel query processing to improve performance. MySQL continues to adopt features from other databases in recent releases.
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
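A minimal sketch of the tool (MySQL 8.0.18+; the tables are hypothetical): EXPLAIN ANALYZE actually executes the query and annotates the TREE plan with measured timings and row counts next to the optimizer's estimates:

```sql
-- Runs the statement and reports actual time, rows, and loops per iterator:
EXPLAIN ANALYZE
SELECT c.name, COUNT(*)
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
 GROUP BY c.name;
```

Because the query really runs, large divergences between estimated and actual row counts point directly at stale or missing statistics.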
This presentation focuses on optimization of queries in MySQL from the developer’s perspective. Developers should care about the performance of the application, which includes optimizing SQL queries. It shows the execution plan in MySQL and explains its different formats: tabular, TREE, and JSON/visual explain plans. Optimizer features like optimizer hints and histograms, as well as newer features like hash joins, the TREE explain plan, and EXPLAIN ANALYZE from the latest releases, are covered. Some real examples of slow queries are included and their optimization explained.
New features in Performance Schema 5.7 include instrumentation for locks, memory usage, stored routines, and prepared statements. This provides concise insight into what is causing issues like long wait times, high memory usage, or inconsistent stored routine performance. Administrators can now quickly diagnose these types of issues using the additional visibility provided by the Performance Schema.
Using histograms to provide better query performance in MariaDB. Histograms capture the distribution of values in columns to help the query optimizer select better execution plans. The optimizer needs statistics on data distributions to estimate query costs accurately. Histograms are not enabled by default but can be collected using ANALYZE TABLE with the PERSISTENT option. Making histograms available improves the performance of queries that have selective filters or ordering on non-indexed columns.
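In MariaDB, the collection described above is driven by ANALYZE TABLE ... PERSISTENT plus a few system variables (the variable names below are real; the table is hypothetical):

```sql
-- Configure histogram collection for this session:
SET histogram_size = 254, histogram_type = 'DOUBLE_PREC_HB';

-- Collect engine-independent statistics, including a histogram on the column:
ANALYZE TABLE goods PERSISTENT FOR COLUMNS (category_id) INDEXES ();

-- Tell the optimizer to prefer these statistics and use histograms:
SET use_stat_tables = 'preferably',
    optimizer_use_condition_selectivity = 4;

-- The collected statistics live in the mysql.column_stats table:
SELECT db_name, table_name, column_name, hist_type
  FROM mysql.column_stats;
```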
The optimizer trace provides a detailed log of the actions taken by the query optimizer. It traces the major stages of query optimization including join preparation, join optimization, and join execution. During join optimization, it records steps like condition processing, determining table dependencies, estimating rows for plans, considering different execution plans, and choosing the best join order. The trace helps understand why certain query plans are chosen and catch differences in plans that may occur due to factors like database version changes.
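Capturing the trace for a single statement can be sketched like this (the SELECT is a placeholder for the query being investigated):

```sql
SET optimizer_trace = 'enabled=on';

SELECT * FROM orders WHERE customer_id = 42;  -- the traced statement

-- The JSON trace of the stages described above (preparation, optimization,
-- execution) for the most recent traced statement:
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```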
What SQL functionality was added in the past year or so. The presentation covers default expressions, functional key parts, lateral derived tables, CHECK constraints, JSON and spatial improvements. Also some other small SQL and other improvements.
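Several of the features listed above fit into a single table definition (a hedged sketch; expression defaults need MySQL 8.0.13+, enforced CHECK constraints 8.0.16+):

```sql
CREATE TABLE products (
  id      INT PRIMARY KEY,
  name    VARCHAR(100),
  price   DECIMAL(10,2),
  created DATE DEFAULT (CURRENT_DATE),      -- default expression
  CONSTRAINT chk_price CHECK (price >= 0),  -- CHECK constraint
  INDEX idx_lower_name ((LOWER(name)))      -- functional key part
);
```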
MySQL Indexing - Best practices for MySQL 5.6 - MYXPLAIN
This document provides an overview of MySQL indexing best practices. It discusses the types of indexes in MySQL, how indexes work, and how to optimize queries through proper index selection and configuration. The presentation emphasizes understanding how MySQL utilizes indexes to speed up queries through techniques like lookups, sorting, avoiding full table scans, and join optimizations. It also covers new capabilities in MySQL 5.6 like index condition pushdown that provide more flexible index usage.
Oracle Week 2015 presentation (Presented on November 15, 2015)
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Oracle 12c new rows pattern matching feature
XML and JSON handling with SQL
Regular Expressions
SQLcl – a new replacement tool for SQL*Plus from Oracle
Improving MariaDB’s Query Optimizer with better selectivity estimates - Sergey Petrunya
The document discusses improving selectivity estimates in MariaDB's query optimizer. It begins with background on selectivity estimates and how the query optimizer uses statistics like cardinalities and selectivities. It then covers computing selectivity for local and join conditions, including techniques like histograms. The document discusses different types of histograms used in various databases and ongoing work in MariaDB to improve its histograms. It concludes with discussing computing selectivity for multiple conditions.
This document provides an overview of indexing in SQL Server. It begins with the basics of index types including clustered, non-clustered, unique, included column and filtered indexes. It then discusses index design best practices for helping queries, joins and sort operations. Common index issues are also covered such as fragmentation, duplicate indexes, and having too many indexes. The document demonstrates index recommendations through examples and provides diagnostic tools for analyzing indexes.
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta... - Ontico
HighLoad++ 2017
Rio de Janeiro hall, November 7, 13:00
Abstract:
http://www.highload.ru/2017/abstracts/2923.html
Alibaba built up a data warehouse service named HybridDB in its public cloud service, based on the open-sourced Greenplum Database, and it keeps enhancing HybridDB's performance. This presentation talks about how Alibaba improves HybridDB's performance for columnar tables with data blocks' metadata (MIN/MAX values of block data) and sort keys (pre-defined keys that data will be sorted and stored by). Testing results show that block metadata can be generated on the fly without much overhead, yet can achieve even better performance than an index scan. With sort keys, a constant response time can be achieved for GROUP BY and ORDER BY queries.
Introduction into MySQL Query Tuning for Dev[Op]sSveta Smirnova
Percona Live Online 2021 talk: https://www.percona.com/resources/videos/introduction-mysql-query-tuning-for-devops
In this talk I will show how to get started with MySQL Query Tuning. I will make a short introduction into physical table structure and demonstrate how it may influence query execution time.
Then we will discuss basic query tuning instruments and techniques, mainly EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
To get data, we query the database. MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place.
Queries, written smartly, can significantly outperform automatically generated ones. Indexes and Optimizer statistics, not limited to the Histograms only, help to increase the speed of the query a lot.
In this session, I will demonstrate by examples of how MySQL query performance can be improved. I will focus on techniques, accessible by Developers and DevOps rather on those which are usually used by Database Administrators. In the end, I will present troubleshooting tools which will help you to identify why your queries do not perform. Then you could use the knowledge from the beginning of the session to improve them.
Demo on Performance Schema which I performed at DevOps Stage conference in Kiev on October 13, 2018. More at https://devopsstage.com/stranitsa-spikera/sveta-smirnova/
Talk at "Istanbul Tech Talks" in Istanbul, April, 17, 2018. http://www.istanbultechtalks.com/
In this talk I will show how to get started with MySQL Query Tuning. I will make short introduction into physical table structure and demonstrate how it may influence query execution time. Then we will discuss basic query tuning instruments and techniques, mainly EXPLAIN command with its latest variations. You will learn how to understand its output and how to rewrite query or change table structure to achieve better performance.
MySQL is a relational database management system. It provides tools for managing data, including creating, querying, updating and deleting data in databases. Some key features include:
- Creating, altering and dropping databases, tables, indexes, users and more.
- Inserting, selecting, updating and deleting data with SQL statements.
- Backup and restore capabilities using mysqldump to backup entire databases or tables.
- Security features including user accounts and privileges to control access.
- Performance optimization using indexes, partitioning, query tuning and more.
- Data types for different kinds of data like numbers, dates, text, JSON and more.
The document discusses new improvements to the parser and optimizer in MySQL 5.7. Key points include:
1) The parser and optimizer were refactored for improved maintainability and stability. Parsing was separated from optimization and execution.
2) The cost model was improved with better record estimation for joins, configurable cost constants, and additional explain output.
3) A new query rewrite plugin allows rewriting queries without changing application code.
This document provides information on using EXPLAIN to troubleshoot MySQL performance issues. EXPLAIN shows how MySQL executes SQL queries, including which indexes and joins it uses. The output includes information on the query type, access method, filtered rows, and extra details to help identify inefficient queries or indexes.
The document discusses understanding query execution in MySQL. It provides examples of using EXPLAIN to view and analyze the query execution plan chosen by the MySQL query optimizer. It examines the type column in particular, explaining the differences between types like ALL, index, range, and index_merge, and how indexes can affect optimization.
This document discusses new SQL syntax and query rewrite plugins in MySQL. It introduces FILTER clauses and custom optimizer hints and describes how to implement them using query rewrite plugins. Key points covered include the plugin interface, parsing and rewriting queries, managing memory, and customizing variables. The goal is to add new SQL features and control query execution through simple syntax extensions. Examples of implementing FILTER clauses and custom optimizer hints are provided to demonstrate how rewrite plugins work.
This document provides an agenda and overview for a MySQL Query Tuning 101 presentation. The summary includes:
1. The agenda covers topics like identifying slow queries, using indexes, the EXPLAIN tool, and other optimization techniques.
2. When queries run slow, the presenter will discuss using indexes to improve performance by allowing MySQL to access data more efficiently.
3. The EXPLAIN tool is covered as a way to estimate query execution and see how MySQL utilizes indexes. Different EXPLAIN output will be demonstrated using examples from an employees database.
Optimizer features in recent releases of other databasesSergey Petrunya
The document summarizes several recent optimizer features introduced in MySQL 8.0 and PostgreSQL versions:
- MySQL 8.0 introduced an iterator-based executor, hash joins, EXPLAIN ANALYZE, and optimizations for anti-joins, semi-joins, and subqueries.
- PostgreSQL improved query parallelism, added multi-column statistics, parallel index creation, and optimized non-recursive common table expressions.
- Both databases have focused on join algorithms, statistics gathering, and parallel query processing to improve performance. MySQL continues to adopt features from other databases in recent releases.
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
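As a hedged illustration of the feature, here is roughly what an EXPLAIN ANALYZE session looks like; the table and column names are assumptions:

```sql
-- EXPLAIN ANALYZE (MySQL 8.0.18+) actually executes the query and reports,
-- per iterator, the estimated vs. actual row counts and actual timings.
EXPLAIN ANALYZE
SELECT c.name, COUNT(*) AS goods
FROM goods g
JOIN categories c ON c.id = g.category_id
GROUP BY c.name;
```

Iterators whose actual row counts differ wildly from the estimates are the places where statistics (or missing histograms) mislead the optimizer.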
This presentation focuses on optimization of MySQL queries from a developer's perspective. Developers should care about application performance, which includes optimizing SQL queries. It shows the execution plan in MySQL and explains its different formats: tabular, TREE, and JSON/visual explain plans. Optimizer features like optimizer hints and histograms, as well as newer features like hash joins, the TREE explain plan, and EXPLAIN ANALYZE from the latest releases, are covered. Some real examples of slow queries are included and their optimization explained.
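A minimal sketch of the MySQL 8.0 histogram feature mentioned above; the `goods` table and `category_id` column are hypothetical:

```sql
-- Build a histogram on a skewed, typically non-indexed column (MySQL 8.0.3+).
ANALYZE TABLE goods UPDATE HISTOGRAM ON category_id WITH 64 BUCKETS;

-- Inspect what the server stored:
SELECT HISTOGRAM->>'$."histogram-type"'      AS hist_type,
       JSON_LENGTH(HISTOGRAM->'$.buckets')   AS buckets
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'goods' AND COLUMN_NAME = 'category_id';

-- Drop it when the data distribution changes significantly:
ANALYZE TABLE goods DROP HISTOGRAM ON category_id;
```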
New features in Performance Schema 5.7 include instrumentation for locks, memory usage, stored routines, and prepared statements. This provides concise insight into what is causing issues like long wait times, high memory usage, or inconsistent stored routine performance. Administrators can now quickly diagnose these types of issues using the additional visibility provided by the Performance Schema.
Using histograms to provide better query performance in MariaDB. Histograms capture the distribution of values in columns to help the query optimizer select better execution plans. The optimizer needs statistics on data distributions to estimate query costs accurately. Histograms are not enabled by default but can be collected using ANALYZE TABLE with the PERSISTENT option. Making histograms available improves the performance of queries that have selective filters or ordering on non-indexed columns.
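The collection step described above can be sketched as follows for MariaDB; the table and column names are illustrative:

```sql
-- MariaDB engine-independent statistics: histograms are gathered by
-- ANALYZE TABLE ... PERSISTENT and are not enabled by default.
SET SESSION histogram_size = 254;                -- histogram precision
SET SESSION histogram_type = 'DOUBLE_PREC_HB';
ANALYZE TABLE goods PERSISTENT FOR COLUMNS (category_id) INDEXES ();

-- Tell the optimizer to use the collected statistics and histograms:
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;
```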
The optimizer trace provides a detailed log of the actions taken by the query optimizer. It traces the major stages of query optimization including join preparation, join optimization, and join execution. During join optimization, it records steps like condition processing, determining table dependencies, estimating rows for plans, considering different execution plans, and choosing the best join order. The trace helps understand why certain query plans are chosen and catch differences in plans that may occur due to factors like database version changes.
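Capturing the trace for a single statement looks roughly like this (the traced SELECT is illustrative):

```sql
SET optimizer_trace = 'enabled=on';
SELECT * FROM goods WHERE category_id = 42;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
-- The JSON trace contains join_preparation, join_optimization (including
-- rows_estimation and considered_execution_plans), and join_execution.
```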
What SQL functionality was added in the past year or so. The presentation covers default expressions, functional key parts, lateral derived tables, CHECK constraints, JSON and spatial improvements. Also some other small SQL and other improvements.
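A sketch touching several of the features listed above in one place; all table, column, and constraint names are made up for illustration:

```sql
CREATE TABLE orders (
  id          BINARY(16) DEFAULT (UUID_TO_BIN(UUID())),      -- default expression (8.0.13+)
  category_id INT,
  qty         INT,
  payload     JSON,
  CONSTRAINT qty_positive CHECK (qty > 0),                   -- enforced CHECK (8.0.16+)
  INDEX idx_price ((CAST(payload->'$.price' AS UNSIGNED)))   -- functional key part
);

-- Lateral derived table (8.0.14+): the subquery may refer to c.id.
SELECT c.id, o.total
FROM categories c,
     LATERAL (SELECT SUM(qty) AS total
              FROM orders WHERE orders.category_id = c.id) o;
```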
MySQL Indexing - Best practices for MySQL 5.6 - MYXPLAIN
This document provides an overview of MySQL indexing best practices. It discusses the types of indexes in MySQL, how indexes work, and how to optimize queries through proper index selection and configuration. The presentation emphasizes understanding how MySQL utilizes indexes to speed up queries through techniques like lookups, sorting, avoiding full table scans, and join optimizations. It also covers new capabilities in MySQL 5.6 like index condition pushdown that provide more flexible index usage.
Oracle Week 2015 presentation (Presented on November 15, 2015)
Agenda:
Aggregative and advanced grouping options
Analytic functions, ranking and pagination
Hierarchical and recursive queries
Oracle 12c new rows pattern matching feature
XML and JSON handling with SQL
Regular Expressions
SQLcl – a new replacement tool for SQL*Plus from Oracle
Improving MariaDB’s Query Optimizer with better selectivity estimates - Sergey Petrunya
The document discusses improving selectivity estimates in MariaDB's query optimizer. It begins with background on selectivity estimates and how the query optimizer uses statistics like cardinalities and selectivities. It then covers computing selectivity for local and join conditions, including techniques like histograms. The document discusses different types of histograms used in various databases and ongoing work in MariaDB to improve its histograms. It concludes with discussing computing selectivity for multiple conditions.
This document provides an overview of indexing in SQL Server. It begins with the basics of index types including clustered, non-clustered, unique, included column and filtered indexes. It then discusses index design best practices for helping queries, joins and sort operations. Common index issues are also covered such as fragmentation, duplicate indexes, and having too many indexes. The document demonstrates index recommendations through examples and provides diagnostic tools for analyzing indexes.
Columnar Table Performance Enhancements Of Greenplum Database with Block Meta... - Ontico
HighLoad++ 2017
Rio de Janeiro hall, November 7, 13:00
Abstract: http://www.highload.ru/2017/abstracts/2923.html
Alibaba built a data warehouse service named HybridDB in its public cloud, based on the open-source Greenplum Database, and keeps enhancing HybridDB's performance. This presentation will talk about how Alibaba improves HybridDB's performance for columnar tables with data blocks' metadata (MIN/MAX values of block data) and sort keys (pre-defined keys that data will be sorted and stored by). Testing results show that block metadata can be generated on the fly without much overhead, yet can achieve better performance even than an index scan. With sort keys, a constant response time can be achieved for GROUP BY and ORDER BY queries.
Percona Live 2016 (https://www.percona.com/live/data-performance-conference-2016/sessions/why-use-explain-formatjson). Although EXPLAIN FORMAT=JSON was first presented a long time ago, there still aren't many resources that explain how and why to use it. The most advertised feature is visual EXPLAIN in MySQL Workbench, but this format can do more than create nice pictures. It prints additional information that can't be found in good old tabular EXPLAIN, and can help to solve many tricky performance issues. In this session, I will not only describe which additional information we can get with the new syntax, but also provide examples showing how to use it to diagnose production issues.
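As a hedged illustration of why the JSON format is worth the extra verbosity (the query is made up):

```sql
-- JSON EXPLAIN exposes details the tabular format hides.
EXPLAIN FORMAT=JSON
SELECT * FROM goods WHERE category_id = 42 AND price < 100\G
-- Fields worth reading that tabular EXPLAIN lacks:
--   "cost_info"           per-table and per-query cost estimates
--   "attached_condition"  which part of WHERE is checked at which step
--   "used_key_parts"      how much of a composite index is actually used
```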
The document provides an overview of MS SQL Server including its key features like Query Analyzer, Profiler, Service Manager, and Bulk Copy Program. It discusses instances, databases, database objects, joins, views, functions and sequences. The summary focuses on the high-level topics covered in the document.
This document provides an overview of a presentation on building better SQL Server databases. The presentation covers how SQL Server stores and retrieves data by looking under the hood at tables, data pages, and the process of requesting data. It then discusses best practices for database design such as using the right data types, avoiding page splits, and tips for writing efficient T-SQL code. The presentation aims to teach attendees how to design databases for optimal performance and scalability.
A corrected comparison between the databases by Tyler Weatherby, Spring 2017. A benchmark is run between the MySQL MyISAM engine, the MySQL Memory engine, and the MonetDB engine on TPC-H data. In this project, we added indexes/keys to the important tables.
This document discusses various techniques for optimizing queries in MySQL databases. It covers storage engines like InnoDB and MyISAM, indexing strategies including different index types and usage examples, using explain plans to analyze query performance, and rewriting queries to improve efficiency by leveraging indexes and removing unnecessary functions. The goal of these optimization techniques is to reduce load on database servers and improve query response times as data volumes increase.
MariaDB 10 Tutorial - 13.11.11 - Percona Live London - Ivan Zoratti
This document provides an overview and summary of MariaDB 10 features presented by Ivan Zoratti. It discusses new features in MariaDB 10 like storage engines, administration improvements, and replication capabilities. The document also summarizes optimization enhancements in MariaDB 10 like the new optimizer, improved indexing techniques, and subquery optimizations. Various agenda topics are outlined for the MariaDB 10 tutorial.
This document provides an overview of in-memory databases, summarizing different types including row stores, column stores, compressed column stores, and how specific databases like SQLite, Excel, Tableau, Qlik, MonetDB, SQL Server, Oracle, SAP Hana, MemSQL, and others approach in-memory storage. It also discusses hardware considerations like GPUs, FPGAs, and new memory technologies that could enhance in-memory database performance.
How many times did we have to spend countless hours looking for a T-SQL solution for the fancy requests of our users, to later discover our code doesn’t perform acceptably?
What can we do to improve the performance of our code?
Is there a methodology to follow in order to deliver better performance?
What are the mistakes to avoid?
This document describes 10 steps to achieve a 10x performance improvement for a MySQL database for a social media website. The steps include monitoring the database, identifying slow SQL queries, analyzing problem queries, improving indexes, offloading read load to slaves, improving SQL, using optimal storage engines, and implementing caching. Key actions include installing monitoring tools, using mk-query-digest to analyze SQL, adding indexes, configuring InnoDB, converting tables to InnoDB, and caching query results and content with Memcache. The goal is to optimize the database infrastructure and queries to handle the load of a growing web 2.0 application.
This document outlines 10 steps to achieve a 10x performance improvement for a MySQL database. It begins by emphasizing the importance of monitoring the database. It then identifies problematic SQL statements and analyzes them to determine issues. Other steps include improving indexes, offloading read load to slaves, improving SQL queries, using appropriate storage engines, caching, sharding, and database management techniques. Front-end improvements are also suggested. The case study shows how these techniques helped a social media site achieve consistent load times and support greater growth.
Dr. D. Sugumar discusses using Microsoft Excel to analyze measurement data from an experiment. Key points covered include using Excel to calculate statistics like mean, median, mode, and standard deviation. Students will take measurements, input the data into Excel, and use functions and charts to analyze the results. Formatting, sorting, filtering and other Excel skills are reviewed to facilitate the data analysis task.
Hundreds of queries in the time of one - Gianmario Spacagna, Spark Summit
The document describes an Insights Engine that generates business insights for small businesses by combining hundreds of queries into a single optimized execution plan. It takes transaction and market data for businesses and calculates key performance indicators, comparing each business to similar competitors at different granularities of time and location. The engine uses composable "monoids" to allow efficient aggregation at multiple levels and a domain-specific language to define insights concisely. It ensures results are privacy-safe and relevant by filtering and ranking insights. The engine was able to run hundreds of queries for over 275,000 UK businesses in under 30 minutes on a small cluster.
The document discusses MySQL query optimization. It covers the query optimizer, principles of optimization like using EXPLAIN and profiling, indexes, JOIN optimization, and ORDER BY/GROUP BY optimization. The key points are to identify bottlenecks, use indexes on frequently filtered fields, avoid indexes on fields that change often or contain many duplicates, and consider composite indexes to cover multiple queries.
Everything You Need to Know About Oracle 12c Indexes - SolarWinds
Indexes are important to consider for optimal performance in every Oracle database. However with each new release, there is an incredible amount of new features and/or changes which can impact how Oracle indexes function and are maintained. This often results in many applications running inefficiently. In this presentation, we will review current Oracle index structures/options and discuss how they work, when they should be used and how they should be maintained.
Title: Sista: Improving Cog’s JIT performance
Speaker: Clément Béra
Thu, August 21, 9:45am – 10:30am
Video Part1
https://www.youtube.com/watch?v=X4E_FoLysJg
Video Part2
https://www.youtube.com/watch?v=gZOk3qojoVE
Description
Abstract: Although recent improvements to the Cog VM's performance have made it one of the fastest available Smalltalk virtual machines, the overhead compared to optimized C code remains significant. Efficient industrial object-oriented virtual machines, such as Google Chrome's JavaScript V8 engine and Oracle's Java HotSpot, can reach the performance of optimized C code on many benchmarks thanks to adaptive optimizations performed by their JIT compilers. The VM becomes cleverer: after executing the same portion of code numerous times, it stops execution, looks at what it is doing, and recompiles critical portions into faster code based on the current environment and previous executions.
Bio: Clément Béra and Eliot Miranda have been working together on Cog's JIT performance for the last year. Clément Béra is a young engineer who has been working in the Pharo team for the past two years. Eliot Miranda is a Smalltalk VM expert who, among other things, implemented Cog's JIT and the Spur memory manager for Cog.
Similar to A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t (20)
MySQL 2024: Why switch to MySQL 8 if 5.x works fine for you? - Sveta Smirnova
On October 25, 2023, Oracle ended active support for MySQL 5.7.
This means it is worth taking a closer look at the improvements in version 8:
- The new data dictionary
- Modern SQL
- Support for JSON, NoSQL, MySQL Shell, and the ability to work with MySQL as with MongoDB
- Improvements in the query optimizer and diagnostics
This is my talk for developers of applications on top of MySQL. I will not explain how to configure the server and will focus on using it.
Database in Kubernetes: Diagnostics and Monitoring - Sveta Smirnova
Kubernetes is the new cool in 2023. Many database installations are on Kubernetes now. And this creates challenges for Support engineers because traditional monitoring and diagnostic tools work differently on bare hardware and Kubernetes. In this session, I will focus on differences in methods we use to collect metrics, describe challenges that Percona Support hits when working with database installations on Kubernetes, and discuss how we resolve them. This talk will cover all database technologies we support: MySQL, MongoDB, and PostgreSQL.
Presented at Percona Live 2023
MySQL Database Monitoring: Must, Good and Nice to Have - Sveta Smirnova
It is very easy to find if a database installation is having issues. You only need to enable Operating System monitoring. A disk, memory, or CPU usage change will alert you about the problems. But they would not show *why* the trouble happens. You need the help of database-specific monitoring tools.
As a Support Engineer, I am always very upset when handling complaints about the database behavior lacking specific database monitoring data because I cannot help!
There are two reasons database and system administrators do not enable the necessary instrumentation. The first is the natural, or expected, performance impact. The second is the lack of knowledge about which instrumentation needs to be enabled to resolve a particular issue.
In this talk, I will cover both concerns.
I will show which monitoring instruments will give information on what causes disk, memory, or CPU problems.
I will teach you how to use them.
I will uncover which performance impact these instruments have.
I will use both MySQL command-line client and open-source graphical instrument Percona Monitoring and Management (PMM) for the examples.
This document provides an overview of the MySQL Cookbook by O'Reilly. It discusses the intended audience of database administrators and developers. It also demonstrates different ways of interacting with MySQL, including through the command line interface, MySQL Shell, and X DevAPI. Examples are provided for common tasks like reading, writing, and updating data in both standard SQL and the object-oriented X DevAPI.
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally this was an area of responsibility for three different roles: Development, DBA, and System Administration. Now DevOps handles all of these. But there is a gap: knowledge gained by MySQL DBAs after years of focusing on a single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show a minimal but highly effective set of options to improve MySQL performance. For illustrations, I will use real user stories gained from my Support experience and the Percona Kubernetes operators for PXC and MySQL.
MySQL Test Framework for customer support and bug verification - Sveta Smirnova
Talk for TestDriven Conf: https://tdconf.ru/2022/abstracts/8763
MySQL Test Framework (MTR) is the regression test framework for MySQL. Its tests are written by MySQL developers and are run while preparing new releases.
MTR can also be used differently. I use it to test problems reported by customers and to verify bug reports on several MySQL versions at once.
With MTR you can:
* script complex deployments;
* test a problem on several versions of MySQL/Percona/MariaDB servers with a single command;
* test several concurrent connections;
* check errors and return values;
* work with query results, stored procedures, and external commands.
A test can be run on any machine with a MySQL, Percona, or MariaDB server.
I will show how I work with the MySQL Test Framework, and I hope you will come to love this tool too.
This document provides an overview of different ways to work with MySQL using standard SQL, X DevAPI, and MySQL Shell utilities. It discusses querying, updating, and exporting/importing data using these different approaches. It also covers topics like character encoding, generating summaries, storing errors, and retrieving metadata. Examples are provided to illustrate concepts like selecting, grouping, joining, changing data, common table expressions, and more using SQL and X DevAPI. MySQL Shell utilities for exporting/importing CSV, JSON, and working with collections are also demonstrated.
Talk for the DevOps Pro Moscow 2021: https://www.devopspro.ru/Sveta-Smirnova/
MySQL performance can be improved by optimizing queries, MySQL server settings, and hardware. Traditionally these tasks were split between three roles: Developer, Database Administrator, and System Administrator. Now DevOps handles all of them, which is not easy for one person. In this talk I will cover the key optimizations that resolve most MySQL performance problems. For illustrations I will use real user stories and the Percona Kubernetes Operator.
This document provides an overview of optimizing MySQL performance for DevOps. It discusses hardware configuration including memory, disk, CPU and network. It covers important MySQL configuration options like InnoDB settings. It also discusses query tuning techniques like using indexes to improve query performance.
How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster - Sveta Smirnova
Percona XtraDB Cluster (PXC) is a 100% synchronized cluster in regards to DML operations. It is ensured by the optimistic locking model and ability to rollback transaction which cannot be applied on all nodes. However, DDL operations are not transactional in MySQL. This adds complexity when you need to change the schema of the database.
Changes made by DDL may affect the results of queries. Therefore all modifications must replicate to all nodes prior to the next data access. For operations that run momentarily this is easily achieved, but schema changes may take hours to apply. Therefore, in addition to the safest synchronous blocking schema upgrade method, TOI, PXC supports a more relaxed, though not safe, method: RSU.
RSU, the Rolling Schema Upgrade, is advertised as non-blocking. But you still need to take care of updates running while you perform such an upgrade. Surprisingly, even updates on unrelated tables and schemas can cause an RSU operation to fail.
In this talk, I will uncover nuances of PXC schema upgrades and point to details you need to take special care about.
Further Information
Schema change is a frequent task, and many do not expect any surprises with it. However, the necessity to replay the changes on all synchronized nodes adds complexity. I gave a webinar on a similar topic which was recorded and is available for replay. I have since found that I share a link to that webinar with my Support customers approximately once per week. Not having a good understanding of how schema change works in the cluster leads to lockups and operation failures. This talk will provide a checklist that will help to choose the best schema change method.
Presented at Percona Live Online: https://perconaliveonline2020.sched.com/event/ePm2/how-to-avoid-pitfalls-in-schema-upgrade-with-percona-xtradb-cluster
How to migrate from MySQL to MariaDB without tears - Sveta Smirnova
Presented at MariaDB Server Fest 2020: https://mariadb.org/fest2020/migrate-mysql/
MariaDB is a drop-in replacement for MySQL. Initial migration is simple: start MariaDB over the old MySQL datadir.
Later your application may notice that some features work differently than in MySQL. These are MariaDB improvements, so this is good and likely the reason you migrated.
In this session, I will focus on the differences affecting application performance and behavior. In particular, features sharing the same name, but working differently.
How Safe is Asynchronous Master-Master Setup? - Sveta Smirnova
Presented at Percona MySQL Tech Day on September 10, 2020: https://www.percona.com/tech-days#mysql
It is common knowledge that built-in asynchronous active-active replication is not safe. I remember times when the official MySQL User Reference Manual stated that such an installation is not recommended for production use. Some experts repeat this claim even now.
While this statement is generally true, I worked with thousands of shops that successfully avoided asynchronous replication limitations in active-active setups.
In this talk, I will show how they did it, demonstrate situations when asynchronous source-source replication is the best possible high availability option and beats such solutions as Galera or InnoDB Clusters. I will also cover common mistakes, leading to disasters.
Modern solutions for modern high load: MySQL 8.0 and Percona improvements - Sveta Smirnova
MySQL has always been used under high load; it is no accident that it was and remains the most popular backend for the web. But our notion of high load expands every year. Higher data transfer speeds -> more devices connected to the internet -> more users -> more data.
The tasks facing MySQL developers get harder every year.
In this talk I will describe how MySQL usage scenarios have changed over [almost] 25 years of its history and what engineers did to keep MySQL relevant. We will touch on topics such as handling large numbers of active connections and high data volumes. I will show how much better modern versions cope with the increased load.
I hope that after my talk those listeners who use old versions will want to upgrade, and those who have already upgraded will learn how to use modern MySQL to its full capacity.
Presented at the OST 2020 conference: https://ostconf.com/materials/2857#2857
How to Avoid Pitfalls in Schema Upgrade with Galera - Sveta Smirnova
This document discusses different methods for performing schema upgrades in a Galera cluster, including Total Order Isolation (TOI), Rolling Schema Upgrade (RSU), and the pt-online-schema-change tool. TOI blocks the entire cluster during DDL but ensures consistency, while RSU allows upgrades without blocking the cluster but requires stopping writes and carries risks of inconsistency. Pt-online-schema-change performs non-blocking upgrades using triggers to copy rows to a new table in chunks.
How Safe is Asynchronous Master-Master Setup? - Sveta Smirnova
This document discusses the risks of using asynchronous master-master replication for MySQL databases and provides strategies for setting it up safely. It explains that having two nodes actively accepting writes can lead to conflicts like duplicate key errors. It recommends dividing writes across nodes by database, table, or row to avoid conflicts. The document also discusses using synchronous replication tools like Galera to ensure consistency across nodes at the cost of reduced performance.
What you need to know about MySQL's three top features - Sveta Smirnova
MySQL firmly holds second place in popularity, after Oracle, in the DB-Engines ranking: https://db-engines.com/en/ranking_trend Replication, storage engines, and NoSQL support have kept MySQL from losing ground since 2012, the year the ranking was founded. What is special about these features? What do you need to know to use them to their full capacity?
I will talk about design. It is what determines whether your application hits a performance ceiling. Understanding the architecture will help when designing a new application that will later be easy to scale.
The talk is aimed at beginning MySQL users, but it will also help more experienced ones refresh their knowledge.
My talk for "MySQL, MariaDB and Friends" devroom at Fosdem on February 2, 2019
Born in 2010 in MySQL 5.5.3 as "a feature for monitoring server execution at a low level," grown in 5.6 with performance fixes and DBA-facing features, in MySQL 5.7 Performance Schema is a mature tool, used by humans and by more and more monitoring products. It has become more popular over the years. In this talk I will give an overview of Performance Schema, focusing on its tuning, performance, and usability.
Performance Schema helps to troubleshoot query performance, complicated locking issues, memory leaks, resource usage, problematic behavior, caused by inappropriate settings and much more. It comes with hundreds of options which allow precisely tune what to instrument. More than 100 consumers store collected data.
Performance Schema is a potent tool, and a very complicated one at the same time. It does not affect performance in most cases, but it can slow down the server dramatically if configured without care. It collects a lot of data, and sometimes this data is hard to read.
This talk will start with an introduction to how Performance Schema is designed, so you will understand why it slows down the server in some cases and does not affect your queries in others. Then we will discuss which information you can retrieve from Performance Schema and how to do it effectively.
I will cover its companion sys schema and graphical monitoring tools.
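As a hedged taste of the sys schema companion mentioned above, two views that ship with MySQL 5.7+ and summarize Performance Schema data:

```sql
-- Statements that consumed the most total latency, digested:
SELECT query, exec_count, total_latency, rows_examined
FROM sys.statement_analysis
ORDER BY total_latency DESC
LIMIT 5;

-- Current memory usage by allocation event, from memory instrumentation:
SELECT event_name, current_alloc
FROM sys.memory_global_by_current_bytes
LIMIT 5;
```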
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Fundamentals of Programming and Language Processors
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t
1. A Billion Goods in a Few Categories
When Optimizer Histograms Help and When They Don’t
September 18, 2019
Sveta Smirnova
2. •Introduction
•The Use Case
The Cardinality: Two Levels
Example
•Why the Difference?
•Even Worse Use Case
ANALYZE TABLE Limitations
Example
•How Histograms Work?
•Left Overs
Table of Contents
2
3. The column_statistics data dictionary table stores histogram statistics about
column values, for use by the optimizer in constructing query execution plans
MySQL User Reference Manual
Optimizer Statistics aka Histograms
3
4. • MySQL Support engineer
• Author of
• MySQL Troubleshooting
• JSON UDF functions
• FILTER clause for MySQL
• Speaker
• Percona Live, OOW, Fosdem,
DevConf, HighLoad...
Sveta Smirnova
4
6. • Hardware
• Wise options
• Optimized queries
• Brain
Everything can Be Resolved!
6
8. • This talk is about
• How I spent the last three years
• Resolving the same issue
• For different customers
• Task was to speed up the query
Not Everything
7
11. • Specific data distribution
• Access on different fields
• ON goods.shop_id = shop.id
• WHERE shop.location IN (...)
• GROUP BY goods.category, shop.profile
• ORDER BY shop.distance, goods.quantity
• Index cannot be used effectively
Not All the Queries Can be Optimized
8
12. • Data distribution varies
• Big difference between number of values
Red 1,000,000
Green 2
Blue 100,000
Latest Support Tickets
9
13. • Data distribution varies
• Constantly changing
Red 100,000
Green 1,000,000
Blue 10
Latest Support Tickets
9
14. • Data distribution varies
• Constantly changing
Red 1,000
Green 2,000
Blue 50,000
Latest Support Tickets
9
17. • Data distribution varies
• Cardinality is not correct
• Was not updated in time
• Updates too often
• Calculated wrongly
• Index maintenance is expensive
• Hardware resources
• Slow updates
• Window to run CREATE INDEX
• Optimizer does not work as we wish
Examples in my talk @Percona Live Frankfurt
Latest Support Tickets
9
21. • Topic based on real Support cases
• A couple of them are still in progress
• All examples are 100% fake
• They are created so that no customer can be identified
• Everything is generated
Table names
Column names
Data
• The use case itself is fictional
• All examples are simplified
• Only columns required to show the issue
• Everything extra removed
• Real tables usually store much more data
• All disasters happened with version 5.7
Disclaimer
10
24. • categories
• Less than 20 rows
• goods
• More than 1M rows
• 20 unique cat_id values
• Many other fields
Price
Date: added, last updated, etc.
Characteristics
Store
...
Two Tables
12
29. • Select from the small table
• For each cat_id select from the large table
• Filter result on date_added[ and price[...]]
• Slow with many items in the category
Option 1: Select from the Small Table First
14
41. • Filter rows by date_added[ and price[...]]
• Get cat_id values
• Retrieve rows from the small table
• Slow if the number of rows filtered by
date_added is larger than the number of goods in
the selected categories
Option 2: Select From the Large Table First
16
48. • CREATE INDEX index_everything
(cat_id, date_added[, price[, ...]])
• It resolves the issue
• But not in all cases
What if We use Combined Indexes?
18
51. • Maintenance cost
• Slower INSERT/UPDATE/DELETE
• Disk space
• Index not useful for selecting rows
JOIN categories ON (categories.id=goods.cat_id)
JOIN shops ON (shops.id=goods.shop_id)
[ JOIN ... ]
WHERE
date_added between ’2018-07-01’ and ’2018-08-01’
AND
cat_id in (16,11) AND price >= 1000 AND price <=10000 [ AND ... ]
GROUP BY product_type
ORDER BY date_updated DESC
LIMIT 50,100
• Tables may have wrong cardinality
The Problem
19
56. • Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
• ID: 1,2,3,4,5
• Number of rows: 5
• Cardinality: 5
Cardinality
23
57. • Number of unique values in the index
• Optimizer uses for the query execution plan
• Example
• Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f
• Number of rows: 17
• Cardinality: 2
Cardinality
23
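The two examples above boil down to counting distinct values in the indexed column. A minimal sketch in Python (my own illustration, not MySQL code):

```python
# Cardinality = number of unique values in the indexed column.
def cardinality(values):
    return len(set(values))

ids = [1, 2, 3, 4, 5]
gender = list("mffffmmmmmmffmfmf")  # 17 rows, only 2 distinct values

print(cardinality(ids))     # 5
print(cardinality(gender))  # 2
```

A high-cardinality column (like `ids`) filters rows well through an index; a low-cardinality one (like `gender`) does not, which is exactly where histograms become interesting.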
62. • Stores statistics on disk
• mysql.innodb_table_stats
• mysql.innodb_index_stats
• Returns statistics to Optimizer
• In ha_innobase::info
• handler/ha_innodb.cc
• When it opens the table
• flag = HA_STATUS_CONST
• Reads data from disk
• Stores it in memory
• Subsequent table accesses
• flag = HA_STATUS_VARIABLE
• Statistics from memory
• Up-to-date Primary Key data
InnoDB: Overview
24
63. • Table created with option STATS_AUTO_RECALC = 0
• Before ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 64
...
InnoDB: Flow
25
64. • Table created with option STATS_AUTO_RECALC = 0
• After ANALYZE TABLE
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
65. • Table created with option STATS_AUTO_RECALC = 0
• After inserting rows
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 16
...
InnoDB: Flow
25
66. • Table created with option STATS_AUTO_RECALC = 0
• After restart
mysql> show index from test\G
...
*************************** 2. row ***************************
Table: test
Non_unique: 1
Key_name: f1
Seq_in_index: 1
Column_name: f1
Collation: A
Cardinality: 2
...
InnoDB: Flow
25
70. • Takes data from the engine
• Class ha_statistics
• sql/handler.h
• Does not have a Cardinality field at all
• Uses a formula to calculate Cardinality
Optimizer: Overview
26
73. • n_rows: number of rows in the table
• Naturally up to date
• Constantly changing!
• rec_per_key: number of duplicates per key
• Calculated by InnoDB at ANALYZE time
• rec_per_key = n_rows / unique values
• Does not change!
• Cardinality = n_rows / rec_per_key
Optimizer: Formula
27
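The consequence of this split can be sketched numerically. In the toy model below (my own illustration of the arithmetic above, not MySQL source code), rec_per_key is frozen at ANALYZE time while n_rows tracks the live table, so the Cardinality the optimizer derives drifts as the table grows:

```python
# rec_per_key is computed once, at ANALYZE time:
#   rec_per_key = n_rows_at_analyze / unique_values
# Cardinality is recomputed from the *current* row count:
#   cardinality = current_n_rows / rec_per_key

def rec_per_key(n_rows_at_analyze, unique_values):
    return n_rows_at_analyze / unique_values

def cardinality(current_n_rows, rpk):
    return current_n_rows / rpk

rpk = rec_per_key(1_000_000, 20)    # ANALYZE ran when the table had 1M rows
print(cardinality(1_000_000, rpk))  # 20.0 -- correct right after ANALYZE
print(cardinality(4_000_000, rpk))  # 80.0 -- stale: there are still only 20 categories
```

Until the next ANALYZE TABLE refreshes rec_per_key, the optimizer believes the index is four times more selective than it really is.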
76. • Engine stores persistent statistics
InnoDB
Storage Tables
Statistics As Calculated
Row Count Only in Memory
• Optimizer calculates Cardinality every time
it accesses engine statistics
• Weak user control
Persistent Statistics Are Not Persistent
28
78. • EXPLAIN without histograms
mysql> explain select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range
-> order by goods.cat_id
-> limit 10\G -- We ask for 10 rows only!
Example
30
79. • EXPLAIN without histograms
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table first
partitions: NULL
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 70.00
Extra: Using where; Using index;
Using temporary; Using filesort
Example
30
80. • EXPLAIN without histograms
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table
partitions: NULL
type: ref
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: orig.categories.id
rows: 51827
filtered: 11.11 -- Default value
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
Example
30
81. • Execution time without histograms
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)
Example
30
82. • Engine statistics without histograms
mysql> show status like ’Handler%’;
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
...
| Handler_read_next | 964718 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_next | 951671 |
...
| Handler_write | 951670 |
+----------------------------+--------+
18 rows in set (0.01 sec)
Example
30
83. • Now let’s add the histogram
mysql> analyze table goods update histogram on date_added;
+------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------+-----------+----------+------------------------------+
| orig.goods | histogram | status | Histogram statistics created
for column ’date_added’. |
+------------+-----------+----------+------------------------------+
1 row in set (2.01 sec)
Example
30
84. • EXPLAIN with the histogram
mysql> explain select goods.* from goods
-> join categories
-> on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10\G
Example
30
85. • EXPLAIN with the histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: goods -- Large table first
partitions: NULL
type: index
possible_keys: cat_id_2
key: cat_id_2
key_len: 5
ref: NULL
rows: 10 -- Same as we asked
filtered: 98.70 -- True numbers
Extra: Using where
Example
30
86. • EXPLAIN with the histogram
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: categories -- Small table
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: orig.goods.cat_id
rows: 1
filtered: 100.00
Extra: Using index
2 rows in set, 1 warning (0.01 sec)
Example
30
87. • Execution time with the histogram
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
mysql> select goods.* from goods
-> join categories on (categories.id=goods.cat_id)
-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
-> and
-> date_added between ’2000-01-01’ and ’2001-01-01’
-> order by goods.cat_id
-> limit 10;
eeb005fae0dd3441c5c380e1d87fee84 -
10 rows in set (0.00 sec) -- down from 56.47 sec!
Example
30
96. • ANALYZE TABLE often
• Use a large number of STATS_SAMPLE_PAGES
Solutions in 5.7-
38
100. • Counts number of pages in the table
• Takes STATS_SAMPLE_PAGES sample pages
• Counts number of unique values in the secondary
index in these pages
• Divides number of pages in the table by
number of sample pages and multiplies the result
by number of unique values
How ANALYZE TABLE Works with InnoDB?
39
102. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 20 (default)
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000
Example
40
103. • Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 5,000
• Unique values in the secondary index:
• In sample pages: 10
• In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40
Example 2
41
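Both examples follow from the same sampling arithmetic. A sketch that reproduces the two estimates (the real InnoDB sampling code is more elaborate; this shows only the formula from the previous slide):

```python
# Estimated cardinality =
#   pages_in_table * unique_values_in_sample / sample_pages
def estimated_cardinality(table_pages, sample_pages, unique_in_sample):
    return table_pages * unique_in_sample // sample_pages

# Example 1: default STATS_SAMPLE_PAGES = 20; true cardinality is 11
print(estimated_cardinality(20_000, 20, 10))     # 10000 -- wildly overestimated

# Example 2: STATS_SAMPLE_PAGES = 5,000
print(estimated_cardinality(20_000, 5_000, 10))  # 40 -- much closer to 11
```

The smaller the sample relative to the table, the more each sampled unique value is scaled up, which is why skewed data with a tiny default sample produces such wrong cardinality.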
104. • Time consuming
mysql> select count(*) from goods;
+----------+
| count(*) |
+----------+
| 80303000 |
+----------+
1 row in set (35.95 sec)
Use Larger STATS_SAMPLE_PAGES?
42
105. • Time consuming
• With default STATS_SAMPLE_PAGES
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.32 sec)
Use Larger STATS_SAMPLE_PAGES?
42
106. • Time consuming
• With bigger number
mysql> alter table goods STATS_SAMPLE_PAGES=5000;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (27.13 sec)
Use Larger STATS_SAMPLE_PAGES?
42
108. • Time consuming
• With bigger number
• 27.13/0.32 = 85 times slower!
• Not always a solution
Use Larger STATS_SAMPLE_PAGES?
42
114. • Data Distribution: goods characteristics
mysql> select count(*) num_rows, good_id, manufacturer
-> from goods_characteristics group by good_id, manufacturer order by num_rows desc;
+----------+---------+--------------+
| num_rows | good_id | manufacturer |
+----------+---------+--------------+
|    65536 | laptop  | Noname       |
|     8191 | laptop  | Samsung      |
|     8191 | laptop  | Acer         |
|     8189 | laptop  | Toshiba      |
|     8189 | laptop  | Apple        |
|     8189 | laptop  | Asus         |
|     8189 | laptop  | Dell         |
|     8189 | laptop  | HP           |
|     8189 | laptop  | Lenovo       |
|       10 | laptop  | Sony         |
|       10 | laptop  | Casper       |
+----------+---------+--------------+
Two Similar Tables
44
115. • Data Distribution: goods shops
mysql> select count(*) num_rows, good_id, location
-> from goods_shops group by good_id, location order by num_rows desc;
+----------+---------+---------------+
| num_rows | good_id | location      |
+----------+---------+---------------+
|     8191 | laptop  | New York      |
|     8191 | laptop  | San Francisco |
|     8189 | laptop  | Tokio         |
|     8189 | laptop  | Istanbul      |
|     8189 | laptop  | Paris         |
|     8189 | laptop  | London        |
|     8189 | laptop  | Berlin        |
|     8189 | laptop  | Brussels      |
|       10 | laptop  | Moscow        |
|       10 | laptop  | Kiev          |
+----------+---------+---------------+
Two Similar Tables
44
116. • Data Distribution: goods shops
mysql> select count(*) num_rows, good_id, delivery_options
-> from goods_shops group by good_id, delivery_options order by num_rows desc;
+----------+---------+------------------+
| num_rows | good_id | delivery_options |
+----------+---------+------------------+
|     8192 | laptop  | DHL              |
|     8191 | laptop  | PTT              |
|     8190 | laptop  | Normal Post      |
|     8190 | laptop  | Tracked          |
|     8189 | laptop  | Fedex            |
|     8189 | laptop  | Gruzovichkof     |
|     8188 | laptop  | Courier          |
|     8187 | laptop  | No delivery      |
|       10 | laptop  | Premium          |
|       10 | laptop  | Urgent           |
+----------+---------+------------------+
Two Similar Tables
44
117. Histogram statistics are useful primarily for nonindexed columns. Adding an
index to a column for which histogram statistics are applicable might also help
the optimizer make row estimates. The tradeoffs are:
An index must be updated when table data is modified.
A histogram is created or updated only on demand, so it adds no overhead
when table data is modified. On the other hand, the statistics become
progressively more out of date when table modifications occur, until the next
time they are updated.
MySQL User Reference Manual
Optimizer Statistics aka Histograms
45
118. mysql> alter table goods_characteristics stats_sample_pages=5000;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table goods_shops stats_sample_pages=5000;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table goods_characteristics, goods_shops;
+----------------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| test.goods_characteristics | analyze | status | OK |
| test.goods_shops | analyze | status | OK |
+----------------------------+---------+----------+----------+
2 rows in set (0.35 sec)
Index Statistics is More than Good
46
119. • The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
121. • Table order
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... |
| 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
47
122. • Table order matters
mysql> explain select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id) where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+---------------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+---------------+
| 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... |
+----+-----------------------+-------+---------+--------+----------+---------------+
2 rows in set, 1 warning (0.00 sec)
Performance
47
123. • Table order matters
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.11 sec)
Performance
47
124. • Table order matters
mysql> show status like ’Handler_read_next’;
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,416 |
+-------------------+-----------+
1 row in set (0.00 sec)
Performance
47
125. • Not for all data
mysql> select count(*) from goods_shops straight_join goods_characteristics
-> using (good_id)
-> where (size > 15 or manufacturer in (’Sony’, ’Casper’))
-> and location in
-> (’New York’, ’San Francisco’, ’Paris’, ’Berlin’, ’Brussels’, ’London’)
-> and delivery_options in
-> (’DHL’,’Normal Post’, ’Tracked’, ’Fedex’, ’No delivery’);
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
Performance
47
126. • Not for all data
mysql> show status like ’Handler%’;
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| Handler_commit | 10 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 28 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 143 |
| Handler_read_last | 0 |
| Handler_read_next | 16,950,265 |
Performance
47
127. mysql> analyze table goods_shops update histogram
-> on location, delivery_options;
+-------------+-----------+----------+--------------------------------+
| Table | Op | Msg_type | Msg_text |
+-------------+-----------+----------+--------------------------------+
| goods_shops | histogram | status | Histogram statistics created
for column ’delivery_options’. |
| goods_shops | histogram | status | Histogram statistics created
for column ’location’. |
+-------------+-----------+----------+--------------------------------+
2 rows in set (0.18 sec)
Histograms to The Rescue
48
128. mysql> analyze table goods_characteristics update histogram
-> on size, manufacturer ;
+-----------------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+-----------+----------+------------------------------+
| goods_characteristics | histogram | status | Histogram statistics created
for column ’manufacturer’. |
| goods_characteristics | histogram | status | Histogram statistics created
for column ’size’. |
+-----------------------+-----------+----------+------------------------------+
2 rows in set (0.23 sec)
Histograms to The Rescue
48
129. • The query
mysql> select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----------+
| count(*) |
+----------+
| 816640 |
+----------+
1 row in set (2.16 sec)
Histograms to The Rescue
48
130. • The query
mysql> show status like ’Handler_read_next’;
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Handler_read_next | 5,308,418 |
+-------------------+-----------+
1 row in set (0.00 sec)
Histograms to The Rescue
48
131. • Filtering effect
mysql> explain select count(*) from goods_shops join goods_characteristics
-> using (good_id)
-> where size < 12 and
-> manufacturer in (’Lenovo’, ’Dell’, ’Toshiba’, ’Samsung’, ’Acer’)
-> and (location in (’Moscow’, ’Kiev’) or
-> delivery_options in (’Premium’, ’Urgent’));
+----+-----------------------+-------+---------+--------+----------+----------+
| id | table | type | key | rows | filtered | Extra |
+----+-----------------------+-------+---------+--------+----------+----------+
| 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... |
| 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... |
+----+-----------------------+-------+---------+--------+----------+----------+
2 rows in set, 1 warning (0.00 sec)
Histograms to The Rescue
48
137. ↓ sql/sql_planner.cc
↓ calculate_condition_filter
↓ Item_func_*::get_filtering_effect
• get_histogram_selectivity
• Seen as a percent of filtered rows in EXPLAIN
Low Level
50
138. • Example data
mysql> create table example(f1 int) engine=innodb;
mysql> insert into example values(1),(1),(1),(2),(3);
mysql> select f1, count(f1) from example group by f1;
+------+-----------+
| f1 | count(f1) |
+------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+------+-----------+
3 rows in set (0.00 sec)
Filtered Rows
51
139. • Without a histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
140. • Without a histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
141. • Without a histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
142. • Without a histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Filtered Rows
51
143. • With the histogram
mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+-----------+----------+------------------------------+
| hist_ex.example | histogram | status | Histogram statistics created
for column ’f1’. |
+-----------------+-----------+----------+------------------------------+
1 row in set (0.03 sec)
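The arithmetic behind the singleton histogram the server builds here is simple enough to model in a few lines of Python (a sketch of the idea, not MySQL internals): one bucket per distinct value, each carrying the cumulative fraction of rows up to and including that value. For the five example rows shown earlier (f1 = 1 three times, 2 once, 3 once) this reproduces the buckets [[1, 0.6], [2, 0.8], [3, 1.0]] that information_schema.column_statistics reports:

```python
from collections import Counter

def singleton_buckets(values):
    """Model of a singleton histogram: one bucket per distinct value,
    each holding the cumulative fraction of rows <= that value."""
    counts = Counter(values)
    total = len(values)
    buckets = []
    cum = 0
    for v in sorted(counts):
        cum += counts[v]
        buckets.append((v, cum / total))
    return buckets

# The 5-row example table: f1 = 1 appears 3 times, 2 once, 3 once
print(singleton_buckets([1, 1, 1, 2, 3]))
# → [(1, 0.6), (2, 0.8), (3, 1.0)]
```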
144. • With the histogram
mysql> select * from information_schema.column_statistics
-> where table_name='example'\G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
TABLE_NAME: example
COLUMN_NAME: f1
HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
"data-type": "int", "null-values": 0.0, "collation-id": 8,
"last-updated": "2018-11-07 09:07:19.791470",
"sampling-rate": 1.0, "histogram-type": "singleton",
"number-of-buckets-specified": 3}
1 row in set (0.00 sec)
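The filtered percentages in the EXPLAIN outputs that follow fall out of those cumulative frequencies. A small Python model of the estimate for `f1 > c` (just the arithmetic, not the server's actual code): take the cumulative frequency of the largest bucket value not exceeding c and subtract it from 1. This matches the 100.00, 40.00 and 20.00 figures below; for f1 > 3 the naive model gives 0.00%, whereas the server reports 20.00, so it evidently applies a lower bound rather than estimating zero rows:

```python
def selectivity_gt(buckets, c):
    """Estimated fraction of rows with value > c, given singleton
    histogram buckets as (value, cumulative_frequency) pairs."""
    cum = 0.0
    for value, cum_freq in buckets:
        if value <= c:
            cum = cum_freq
    return 1.0 - cum

# Buckets from information_schema.column_statistics above
buckets = [(1, 0.6), (2, 0.8), (3, 1.0)]
for c in (0, 1, 2, 3):
    print(f"f1 > {c}: filtered ~ {selectivity_gt(buckets, c) * 100:.2f}")
# → 100.00, 40.00, 20.00 and 0.00 respectively
```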
145. • With the histogram
mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 100.00 -- all rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
146. • With the histogram
mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 40.00 -- 2 rows
Extra: Using where
1 row in set, 1 warning (0.00 sec)
147. • With the histogram
mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
148. • With the histogram
mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: example
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
filtered: 20.00 -- one row
Extra: Using where
1 row in set, 1 warning (0.00 sec)
153. • CREATE INDEX
• Metadata lock
• Can be blocked by any query
• UPDATE HISTOGRAM
• Backup lock
• Can be blocked only by a backup
• Can be created at any time without fear
Maintenance: Locking
55
154. • CREATE INDEX
• Locks writes
• Locks reads ∗
∗ Upstream, and Percona Server before 5.6.38-83.0/5.7.20-18 (PS-2503)
• Every DML updates the index
Maintenance: Load
56
155. • CREATE INDEX
• Locks writes
• Locks reads ∗
• Every DML updates the index
• UPDATE HISTOGRAM
• Uses up to histogram_generation_max_mem_size
• Persistent after creation
• DML does not touch it
Maintenance: Load
56
156. • Helps if the query plan can be changed
• Not a replacement for an index:
• GROUP BY
• ORDER BY
• Query on a single table ∗
Histograms
57
157. • Data distribution is uniform
• Range optimization can be used
• Full table scan is fast
When Are Histograms Not Helpful?
58
158. • Index statistics are collected by the storage engine
• The Optimizer recalculates Cardinality each time it accesses statistics
• Indexes don’t always improve performance
• Histograms can help
• Still a new feature
• Histograms do not replace other optimizations!
Conclusion
59
159. MySQL User Reference Manual
Blog by Erik Frøseth
Blog by Frédéric Descamps
Talk by Øystein Grøvlen @FOSDEM
Talk by Sergei Petrunia @PerconaLive
WL #8707
More information
60