The document discusses various query optimization techniques used in database management systems including MariaDB, MySQL, PostgreSQL, and SQL Server. Specifically, it covers the use of histograms to estimate query selectivity, derived table merging, condition pushdown including through window functions, and split grouping optimizations. Histograms help query planners estimate the number of rows filtered by query conditions. Derived table merging and condition pushdown help push conditions earlier in query execution. Split grouping allows computing groupings for a subset of rows instead of all rows.
Improving MariaDB’s Query Optimizer with better selectivity estimates (Sergey Petrunya)
The document discusses improving selectivity estimates in MariaDB's query optimizer. It begins with background on selectivity estimates and how the query optimizer uses statistics like cardinalities and selectivities. It then covers computing selectivity for local and join conditions, including techniques like histograms. The document discusses different types of histograms used in various databases and ongoing work in MariaDB to improve its histograms. It concludes with discussing computing selectivity for multiple conditions.
JSON Support in MariaDB: News, non-news and the bigger picture (Sergey Petrunya)
This document summarizes JSON support features in MariaDB, including JSON Path and JSON_TABLE. It discusses MariaDB and MySQL's implementation of the SQL:2016 JSON Path language, noting limitations compared to other databases. JSON_TABLE is explained as a way to convert JSON data to tabular form using column definitions. Examples are provided and features like handling nested paths and errors are covered. JSON support in MariaDB is still being developed to implement more of the standard and address current limitations.
- The document discusses histograms used for data statistics in MariaDB, MySQL, and PostgreSQL. Histograms provide compact summaries of column value distributions to help query optimizers estimate condition selectivities.
- MariaDB stores histograms in the mysql.column_stats table and collects them via full table scans. PostgreSQL collects histograms using random sampling and stores statistics in pg_stats including histograms and most common values lists.
- While both use height-balanced histograms, PostgreSQL additionally tracks most common values to improve selectivity estimates for frequent values.
This document discusses optimizations for views, derived tables, and common table expressions (CTEs) in MariaDB. It covers condition pushdown, which allows pushing conditions into derived tables and CTEs for more efficient execution. Later versions introduced additional optimizations like condition pushdown through window functions and splitting groupings for derived tables.
Optimizer features in recent releases of other databases (Sergey Petrunya)
The document summarizes several recent optimizer features introduced in MySQL 8.0 and PostgreSQL versions:
- MySQL 8.0 introduced an iterator-based executor, hash joins, EXPLAIN ANALYZE, and optimizations for anti-joins, semi-joins, and subqueries.
- PostgreSQL improved query parallelism, added multi-column statistics, parallel index creation, and optimized non-recursive common table expressions.
- Both databases have focused on join algorithms, statistics gathering, and parallel query processing to improve performance. MySQL continues to adopt features from other databases in recent releases.
The optimizer trace provides a detailed log of the actions taken by the query optimizer. It traces the major stages of query optimization including join preparation, join optimization, and join execution. During join optimization, it records steps like condition processing, determining table dependencies, estimating rows for plans, considering different execution plans, and choosing the best join order. The trace helps understand why certain query plans are chosen and catch differences in plans that may occur due to factors like database version changes.
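As a sketch, the trace described here is switched on per session and read back from an `information_schema` table (the table `t1` and its columns are hypothetical; the statements follow the MariaDB/MySQL optimizer-trace interface):

```sql
-- Enable tracing for the current session
SET optimizer_trace = 'enabled=on';

-- Run the statement whose plan you want to understand
SELECT * FROM t1 WHERE a < 10 ORDER BY b;

-- The trace of the last traced statement is exposed here as JSON
SELECT trace FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```

The JSON output walks through the same stages the abstract lists: condition processing, row estimates for each candidate plan, and the chosen join order.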
The document summarizes new features in the query optimizer in MariaDB 10.4, including:
1) An optimizer trace that provides insight into the query planning process.
2) Using sampling for histogram collection during ANALYZE TABLE to improve performance.
3) Rowid filtering that pushes qualifying conditions into joins to filter out non-matching rows earlier.
4) Updated default settings that make better use of statistics and condition selectivity.
Using histograms to provide better query performance in MariaDB. Histograms capture the distribution of values in columns to help the query optimizer select better execution plans. The optimizer needs statistics on data distributions to estimate query costs accurately. Histograms are not enabled by default but can be collected using ANALYZE TABLE with the PERSISTENT option. Making histograms available improves the performance of queries that have selective filters or ordering on non-indexed columns.
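A minimal sketch of collecting and enabling histograms in MariaDB (the `orders` table is hypothetical; the variables are MariaDB's documented statistics knobs):

```sql
-- Collect engine-independent statistics, including histograms,
-- into mysql.column_stats
SET SESSION histogram_size = 254;              -- bytes per histogram; 0 disables
SET SESSION histogram_type = 'DOUBLE_PREC_HB';
ANALYZE TABLE orders PERSISTENT FOR ALL;

-- Tell the optimizer to prefer the collected statistics
SET SESSION use_stat_tables = 'preferably';
SET SESSION optimizer_use_condition_selectivity = 4;  -- 4 = also use histograms
```

With these settings, filters on non-indexed columns of `orders` get selectivity estimates from the histogram instead of a fixed guess.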
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
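Usage is as simple as prefixing a statement (the `customers`/`orders` tables are hypothetical); MySQL executes the query and annotates each plan node with actual row counts and timings:

```sql
EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
```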
The document discusses Cassandra Storage Engine (Cassandra SE) in MariaDB, which allows MariaDB to access Cassandra data. Cassandra SE provides a SQL view of Cassandra data by mapping Cassandra concepts like column families to MariaDB tables. It supports basic operations like SELECT, INSERT, and UPDATE between the two systems. The document outlines use cases, benchmarks Cassandra SE performance, and discusses future directions like supporting additional Cassandra features.
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016) (Sergey Petrunya)
- Common Table Expressions (CTEs) allow temporary named results to be stored and reused within a single SQL statement.
- There are two types of CTEs: non-recursive and recursive. Non-recursive CTEs can refer to other CTEs but not recursively, while recursive CTEs define a recursive relationship between rows.
- Optimizations for non-recursive CTEs in MariaDB include merging CTEs into joins and pushing conditions down into CTEs to filter groups. Recursive CTEs are used to query hierarchical or graph-like recursive relationships in a table.
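A typical hierarchical query of the kind described above, sketched against a hypothetical `employees(id, manager_id, name)` table:

```sql
-- Walk up the management chain starting from employee 42
WITH RECURSIVE chain AS (
    SELECT id, manager_id, name
    FROM employees
    WHERE id = 42
  UNION ALL
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN chain c ON e.id = c.manager_id
)
SELECT * FROM chain;
```

The anchor part seeds the result with one row; the recursive part keeps joining to `chain` until no manager remains, i.e. until a fixed point is reached.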
MariaDB Optimizer - further down the rabbit hole (Sergey Petrunya)
The document summarizes new features in the MariaDB 10.4 query optimizer including:
1) New default optimizer settings that take more factors into account for condition selectivity and use histograms by default.
2) Faster histogram collection using Bernoulli sampling rather than analyzing the whole data set.
3) Two new types of condition pushdown - from HAVING clauses into WHERE clauses, and into materialized IN subqueries.
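For example, in a query like the following (the `payments` table is hypothetical), the first HAVING conjunct refers only to the GROUP BY column, so the optimizer can move it into WHERE and filter rows before grouping:

```sql
SELECT customer_id, SUM(amount) AS total
FROM payments
GROUP BY customer_id
HAVING customer_id < 1000      -- pushable: depends only on the grouping column
   AND SUM(amount) > 500;      -- not pushable: needs the computed aggregate
```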
Using Optimizer Hints to Improve MySQL Query Performance (oysteing)
The document discusses using optimizer hints in MySQL to improve query performance. It covers index hints to influence which indexes the optimizer uses, join order hints to control join order, and subquery hints. New optimizer hints introduced in MySQL 5.7 and 8.0 are also presented, including hints for join strategies, materialized intermediate results, and query block naming. Examples are provided to illustrate how hints can be used and their behavior.
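A sketch of the two hint styles (`t1`, `t2`, and `idx_a` are hypothetical names): index hints appear after the table reference, while the optimizer hints introduced in MySQL 5.7/8.0 go in a `/*+ ... */` comment right after SELECT:

```sql
-- Old-style index hint: restrict which indexes may be considered
SELECT * FROM t1 USE INDEX (idx_a) WHERE a = 5;

-- New-style optimizer hints: name the query block and fix the join order
SELECT /*+ QB_NAME(qb1) JOIN_ORDER(t2, t1) */ *
FROM t1 JOIN t2 ON t1.a = t2.a;
```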
The document discusses window functions in MariaDB. It begins with an overview and plan, then covers basic window functions like row_number(), rank(), dense_rank(), and ntile(). It discusses frames for window functions, including examples using RANGE frames. It provides examples of problems that can be solved using window functions, such as smoothing noisy data, generating account balance statements, and finding sequences with no missing numbers ("islands").
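The account-balance example mentioned above can be sketched as follows (the `transactions` table is hypothetical):

```sql
SELECT account_id, txn_date, amount,
       ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY txn_date) AS txn_no,
       SUM(amount)  OVER (PARTITION BY account_id ORDER BY txn_date
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                    AS running_balance
FROM transactions;
```

Each partition is one account; the explicit ROWS frame makes the SUM a running total rather than a per-partition grand total.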
This document summarizes an introduction to advanced MySQL query and schema tuning techniques presented by Alexander Rubin. It discusses how to identify and address slow queries through better indexing, temporary tables, and query optimization. Specific techniques covered include using indexes to optimize equality and range queries, ordering fields in composite indexes, and avoiding disk-based temporary tables for GROUP BY and other complex queries.
A Billion Goods in a Few Categories: When Optimizer Histograms Help and When ... (Sveta Smirnova)
Last year this session’s speaker worked on several cases where data followed the same pattern: millions of popular products fit into a couple of categories, and the rest were spread across the remaining ones. Her team had a hard time finding a solution for retrieving goods quickly. MySQL 8.0 has a feature that resolves such issues: optimizer histograms, which store statistics on the number of values in each data bucket. In real life, histograms don’t help with all queries accessing non-uniform data. How you write a statement, the number of rows in the table, the data distribution: all of these may affect the use of histograms. This presentation shows examples demonstrating how the optimizer works in each case, describes how to create histograms, and covers differences between the MySQL and Oracle implementations.
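In MySQL 8.0 these histograms are created per column with ANALYZE TABLE (the `products` table and `category_id` column are hypothetical):

```sql
-- Build a histogram with up to 64 buckets for one skewed column
ANALYZE TABLE products UPDATE HISTOGRAM ON category_id WITH 64 BUCKETS;

-- Inspect the stored histogram (kept as JSON)
SELECT schema_name, table_name, column_name
FROM information_schema.column_statistics;

-- Remove it again if it does not help
ANALYZE TABLE products DROP HISTOGRAM ON category_id;
```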
Billion Goods in Few Categories: How Histograms Save a Life? (Sveta Smirnova)
We store data with an intention to use it: search, retrieve, group, sort... To do this effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is uneven.
Last year I worked on several support tickets where data followed the same pattern: millions of popular products fit into a couple of categories and the rest were spread across the remaining ones. We had a hard time finding a solution for retrieving goods quickly. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature - histograms - would work better, cleaner and faster. That is how the idea of this talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss:
- how index statistics are physically stored by the storage engine
- which data is exchanged with the Optimizer
- why that alone is not enough to make the correct index choice
- when histograms can help and when they cannot
- the differences between MySQL and MariaDB histograms
Talk for Percona Live 2019 Austin: https://www.percona.com/live/19/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
- Common Table Expressions (CTEs) allow for temporary results to be stored and reused within the same SQL statement, similar to derived tables or views.
- CTEs can be non-recursive or recursive. Non-recursive CTEs are optimized by merging into joins or pushing conditions down, while recursive CTEs compute results through iterative steps until a fixed point is reached.
- The document discusses optimizations for non-recursive CTEs in MariaDB and provides examples of using CTEs for common queries involving things like hierarchical or network data.
Grouped data frames allow dplyr functions to manipulate each group separately. The group_by() function creates a grouped data frame, while ungroup() removes grouping. Summarise() applies summary functions to columns to create a new table, such as mean() or count(). Join functions combine tables by matching values. Left, right, inner, and full joins retain different combinations of values from the tables.
What's New in MariaDB Server 10.2 and Big Data Analytics with Maria... (Kangaroot)
Anders Karlsson, Principal Sales Engineer at MariaDB Corporation Ab
Join this session to learn more about all the new product features included in MariaDB Server 10.2.
After running over these new features, the presentation will cover MariaDB ColumnStore. MariaDB ColumnStore is a powerful open source columnar storage engine that supports a wide variety of analytical use cases with ANSI SQL in highly scalable distributed environments. It unifies OLTP and analytics workloads with a single ANSI SQL interface.
This document provides information on importing and working with different data types in R. It introduces packages for importing files like SPSS, Stata, SAS, Excel, databases, JSON, XML, and APIs. It also covers functions for reading and writing common file types like CSV, TSV, and RDS. Finally, it discusses parsing data and handling missing values when reading files.
The document discusses how the PostgreSQL query planner works. It explains that a query goes through several stages including parsing, rewriting, planning/optimizing, and execution. The optimizer or planner has to estimate things like the number of rows and cost to determine the most efficient query plan. Statistics collected by ANALYZE are used for these estimates but can sometimes be inaccurate, especially for n_distinct values. Increasing the default_statistics_target or overriding statistics on columns can help address underestimation issues. The document also discusses different plan types like joins, scans, and aggregates that the planner may choose between.
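The per-column statistics override mentioned here looks like this in PostgreSQL (the `orders` table and `status` column are hypothetical):

```sql
-- Raise the statistics detail for one skewed column above the default target
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- Check what the planner now believes about the column
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';
```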
Window functions enable calculations across partitions of rows in a result set. This document discusses window function syntax, types of window functions available in MySQL 8.0 like RANK(), DENSE_RANK(), ROW_NUMBER(), and provides examples of queries using window functions to analyze and summarize data in partitions.
The document discusses PostgreSQL rules. It provides examples of creating rules to modify the behavior of INSERT, UPDATE, and SELECT statements. This includes using rules to log changes to a table to an audit table, and to upsert counts to increment a count column when a row already exists.
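The audit-logging case described above can be sketched as a rule (the `products` and `products_audit` tables are hypothetical):

```sql
-- Every price change on products also writes an audit row
CREATE RULE log_price_change AS
    ON UPDATE TO products
    WHERE OLD.price IS DISTINCT FROM NEW.price
    DO ALSO
    INSERT INTO products_audit (product_id, old_price, new_price, changed_at)
    VALUES (OLD.id, OLD.price, NEW.price, now());
```

Because rules rewrite the query rather than fire per row, the extra INSERT is planned together with the original UPDATE.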
MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013 (Sergey Petrunya)
The document discusses techniques for identifying and addressing problems with a database query optimizer. It describes old and new tools for catching slow queries, such as the slow query log, SHOW PROCESSLIST, and the Performance Schema. It also provides examples of using these tools to analyze query plans, identify inefficient plans, and determine if optimizer settings or query structure need to be modified to address performance issues.
Abstract: A histogram represents the frequency distribution of data and can help the optimizer choose a better execution plan. Both MariaDB and MySQL support histograms; although they serve a common purpose, they are implemented differently and each has its own control knobs. Our consultants (Monu Mahto and Madhavan) explain how histograms can be used in production MySQL and MariaDB databases and how they help bring down execution times.
Statistics, Row Counts, Execution Plans and Query Tuning (Grant Fritchey)
It’s fairly well known that the query optimizer is what creates execution plans, and lots of people are aware that execution plans are what make queries run fast or slow. What seems to be less well known is that the number of rows the optimizer thinks a query may return is the primary factor driving the choices the optimizer makes. This session focuses on how those row counts are arrived at and how they impact the choices made by the optimizer and, ultimately, the performance of your system. With the knowledge you gain from this session, you will make superior choices when writing T-SQL, creating indexes, and maintaining your statistics. This leads to a better performing system, all thanks to counting the number of rows.
With the introduction of SQL Server 2012, data developers have new ways to interact with their databases. This session will review the powerful new analytic window functions, new ways to generate numeric sequences, and new ways to page the results of our queries. Other features discussed include improvements in error handling and new parsing and concatenation features.
QBIC is an experimental project that aims to speed up queries to databases using machine learning. It uses techniques like predicting data dependencies and pre-generating optimized query results ("pregens") to return data without needing to send the full query to the database. QBIC represents queries and their relationships as a lattice structure and applies algorithms like knapsack problems to determine the optimal pregens to maintain based on factors like space usage and query popularity. It also aims to predict metrics and cardinalities for queries to further improve performance through techniques like "sharpening" that can return approximated results without querying the database.
Learn best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your data warehouse performance.
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
To get data, we query the database. MySQL does its best to return requested bytes as fast as possible. However, it needs human help to identify what is important and should be accessed in the first place.
Queries, written smartly, can significantly outperform automatically generated ones. Indexes and Optimizer statistics, not limited to the Histograms only, help to increase the speed of the query a lot.
In this session, I will demonstrate by examples of how MySQL query performance can be improved. I will focus on techniques, accessible by Developers and DevOps rather on those which are usually used by Database Administrators. In the end, I will present troubleshooting tools which will help you to identify why your queries do not perform. Then you could use the knowledge from the beginning of the session to improve them.
MySQL: Know more about open Source DatabaseMahesh Salaria
- As a developer, it is important to understand MySQL's storage engines, data types, indexing, and normalization to build high-performing applications.
- MySQL has several storage engines that handle different table types differently in terms of transactions, locking, storage, and memory usage. Choosing the right engine depends on data usage.
- Properly normalizing data, using optimal data types, and adding indexes improves performance by reducing storage needs, memory usage, and speeding up queries.
Microsoft SQL Server Performance Query Tuning focuses on execution plans and indexes. Execution plans detail how queries will be processed, including index usage and join methods. Common elements include scans, seeks, lookups, nested loops, hash and merge joins, and aggregations. Indexes provide efficient access paths between users and data. Clustered indexes store data in sorted order while nonclustered indexes reference data locations. Tips include limiting indexes, avoiding updates in indexes, and creating indexes for query predicates.
SQL Server consists of several features including Query Analyzer, Profiler, and Service Manager. Profiler is a monitoring tool used for performance tuning that uses traces. Service Manager helps manage SQL Server instances. Each instance is hidden from others and has its own users, databases, and settings. BCP is a utility for bulk data transfer. Query Analyzer allows writing and executing queries. SQL Server databases contain objects like tables, views, stored procedures. System databases include master, model, msdb, and tempdb. Databases are created in master and contain data and log files. Select statements retrieve data using conditions and options. Data is inserted, updated, and deleted using statements. Joins combine data from multiple tables. Views store
MS SQL Server is a database server product of Microsoft that enables users to write and execute SQL queries and statements. It consists of several features like Query Analyzer, Profiler, and Service Manager. Profiler is a monitoring tool used for performance tuning. Service Manager helps manage SQL Server instances. Multiple instances can run on a single machine with each having independent users, databases, and settings. BCP is a command line utility that bulk copies data. Query Analyzer allows writing and executing SQL queries.
MS SQL Server is a database server product of Microsoft that allows users to write and execute SQL queries and statements. It consists of tools like Query Analyzer, Profiler, and Service Manager. Profiler is used for performance tuning. Service Manager helps manage SQL Server instances and databases can be created using the master database. SQL Server supports various data types, operators, and functions. Joins, indexes, views and other database objects are also supported to optimize queries and manage data.
In these slides we introduce Column-Oriented Stores. We deeply analyze Google BigTable. We discuss about features, data model, architecture, components and its implementation. In the second part we discuss all the major open source implementation for column-oriented databases.
This document provides an overview of performance tuning and indexing. It discusses indexing concepts like clustering factor and index data structures like B-trees. It also covers indexing strategies like reverse key indexes and the different types of histograms that can be created, including frequency, height-balanced, top frequency and hybrid histograms in 12c. The document concludes with discussing the basic statistics that are automatically collected on tables, columns and indexes to help with query optimization.
The document discusses SQL query performance analysis. It covers topics like the query optimizer, execution plans, statistics analysis, and different types of queries and scanning. The query optimizer is cost-based and determines the most efficient execution plan using cardinality estimates and cost models. Addhoc queries are non-parameterized queries that SQL Server treats differently than prepared queries. Execution plans show the steps and methods used to retrieve and process data. Statistics help the optimizer generate accurate cardinality estimates to pick high-performing plans.
Presentación que acompaña a la ponencia dada por Matt Savage, de PQ Systems, durante la conferencia LEAN Six Sigma Online 2014, organizada por Blackberry&Cross.
Más información: http://lssc.blackberrycross.com
The document provides an overview of MS SQL Server including its key features like Query Analyzer, Profiler, Service Manager, and Bulk Copy Program. It discusses instances, databases, database objects, joins, views, functions and sequences. The summary focuses on the high-level topics covered in the document.
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and tune query and database performance.
Learning Objectives:
Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
This document provides an overview of optimizing MySQL queries. It discusses optimization at the database and hardware levels, understanding query execution plans, using EXPLAIN to analyze queries, optimizing specific query types like counts and groups, indexing strategies like covering indexes, and partitioning tables for performance. The goal is to help readers write efficient queries and properly structure databases and indexes for high performance.
Part2 Best Practices for Managing Optimizer StatisticsMaria Colgan
Part 2 of the SQL Tuning workshop focuses on Optimizer Statistics and the best practices for managing them, including when and how to gather statistics. It also covers what additional information you may need to give the Optimizer and provides guidance on when not to gather statistics. Finally we look at all of the techniques you can use to speed up statistics gathering including taking advantage of Incremental statistics, parallelism and concurrency.
Similar to MariaDB 10.3 Optimizer - where does it stand (20)
New optimizer features in MariaDB releases before 10.12Sergey Petrunya
The document discusses new optimizer features in recent and upcoming MariaDB releases. MariaDB 10.8 introduced JSON histograms and support for reverse-ordered indexes. JSON produced by the optimizer is now valid and processible. MariaDB 10.9 added SHOW EXPLAIN FORMAT=JSON and SHOW ANALYZE can return partial results. MariaDB 10.10 enabled table elimination for derived tables and improved optimization of many-table joins. Future releases will optimize queries on stored procedures and show optimizer timing in EXPLAIN FORMAT=JSON.
MariaDB's join optimizer: how it works and current fixesSergey Petrunya
The document discusses improvements to MariaDB's join optimizer. It describes how the optimizer currently works, including join order search, pruning techniques, and greedy search. It then outlines several patches and improvements made to better prune join order search spaces and find optimal plans more quickly. This includes handling "edge tables", improving heuristics for key dependencies and model tables, pre-sorting tables during search, and exploring eq_ref chaining to further reduce search space for attribute tables.
This document discusses improvements to histograms in MariaDB. It provides background on how query optimizers use histograms to estimate condition selectivity. It describes the basic equi-width and improved equi-height histograms. It outlines how MariaDB 10.8 introduces a new JSON-based histogram type that stores exact bucket endpoints to improve accuracy, especially for popular values. The new type fixes issues the previous approaches had with inaccurate selectivity estimates for certain conditions. Overall, the document presents histograms as an important tool for query optimization and how MariaDB is enhancing their implementation.
This document discusses MyRocks, a storage engine for MariaDB that uses RocksDB as its backend. It begins by explaining the limitations of InnoDB that MyRocks aims to address, such as high write and space amplification. It then describes how RocksDB uses log-structured merge trees to reduce these issues. The document outlines how MyRocks implements the MySQL storage engine interface on top of RocksDB. It concludes by covering best practices for using MyRocks, including installation, migration, tuning for replication and backups.
This document discusses new query optimization features in MariaDB 10.3. It describes how MariaDB 10.3 improves on condition pushdown from 10.2 by allowing conditions to be pushed through window functions. It also explains a new "split grouping" optimization where grouping is done separately for each relevant group, rather than computing all groups at once, allowing indexes to be leveraged more efficiently. These optimizations can improve performance by filtering out unnecessary rows earlier in query execution.
The document discusses MyRocks being included in MariaDB. Some key points:
- MyRocks is a storage engine that combines RocksDB with MySQL/MariaDB for better performance.
- MyRocks is now included in MariaDB 10.2 as an alpha plugin, with binaries/packages available. Many features work but some like binlog/replication are still in progress.
- MariaDB will continue merging updates from the MyRocks upstream project and work to increase the plugin's maturity level.
- Future plans include finishing core features like binlog/replication support, packaging backup tools, and ensuring compatibility with MariaDB features like global variables and GTID replication.
This document provides an overview of MyRocks, a storage engine for MySQL/MariaDB that uses the RocksDB key-value store. It discusses the write amplification issues with InnoDB, how LSM trees in RocksDB address these issues through log-structured merging, and benchmarks showing the size, write amplification, and performance improvements MyRocks provides over InnoDB. It also outlines the process of integrating MyRocks into MariaDB, current status as an alpha plugin, and plans to improve support and testing.
This document discusses porting MyRocks, a storage engine that combines RocksDB with MySQL, to MariaDB. It provides an overview of MyRocks, the tasks involved in porting it to MariaDB, the current status, and future plans. Key points include porting MyRocks from Facebook's MySQL to MariaDB, building packages, releasing it as part of a MariaDB version, addressing failing tests and missing features, and improving integration with MariaDB capabilities like binlogging. The goal is to get MyRocks adopted more broadly by adding it to MariaDB and expanding the community around it.
MyRocks: табличный движок для MySQL на основе RocksDBSergey Petrunya
MyRocks: табличный движок для MySQL на основе RocksDB.
Презентация с HighLoad++ 2015.
Рассказывается о принципах работы LSM-Trees, их реализации в RocksDB, зачем и как был сделан MyRocks, с какими проблемами столкнулись и как их решили.
MariaDB: Engine Independent Table Statistics, including histogramsSergey Petrunya
EITS (Engine Independent Table Statistics) provides new statistics for MariaDB query optimization, including histograms for column value distributions. These statistics must be manually collected with ANALYZE TABLE and enabled to improve estimates for queries using non-indexed columns and ranges. With EITS, the optimizer can better estimate row counts and filtering for joins and WHERE clauses.
MariaDB: ANALYZE for statements (lightning talk)Sergey Petrunya
The document describes a new ANALYZE statement in MariaDB 10.1 that provides execution statistics for a SQL statement. ANALYZE runs the statement and collects statistics, similar to EXPLAIN ANALYZE in PostgreSQL. It produces an EXPLAIN plan with additional columns showing real rows, filtering percentages, and time spent. The FORMAT=JSON option outputs the results as a JSON document containing detailed timing and resource usage statistics for each step. This allows more complete analysis of how a query plan was executed versus global counters.
This document summarizes new features in the MariaDB 10.0 query optimizer, including:
1. Engine-independent statistics like histograms that are collected via ANALYZE TABLE instead of random sampling.
2. New subquery optimizations that convert EXISTS subqueries to inner joins and trivially correlated EXISTS to IN.
3. EXPLAIN improvements like SHOW EXPLAIN to see EXPLAIN plans for running queries, and logging EXPLAIN output in the slow query log.
How mysql handles ORDER BY, GROUP BY, and DISTINCTSergey Petrunya
This document discusses how MySQL handles ORDER BY, GROUP BY, and DISTINCT clauses. It describes the different strategies MySQL uses to produce ordered or grouped result sets, including using an index, filesort, or temporary table. It also covers some special cases and limitations. The goal is for the query optimizer to better account for sorting and grouping costs when choosing a query execution plan.
MariaDB 10.3 Optimizer - where does it stand
1. Santa Clara, California | April 23rd – 25th, 2018
Sergey Petrunia, MariaDB Project
Vicențiu Ciorbaru, MariaDB Foundation
MariaDB Optimizer in 10.3,
where does it stand?
2. 2
Agenda
● New releases of MySQL and MariaDB
– MariaDB 10.2 and 10.3
– MySQL 8.0
● Optimizer related features
– Histograms
– Non-recursive CTEs
● Derived table optimizations
– Window Functions
● Let’s look and compare
– Also look at PostgreSQL and SQL Server
7. 7
Condition Selectivity
The query optimizer needs to decide on a plan to execute the query
Goal is to get the shortest running time
• Choose the access method
- Index Access, Hash Join, BKA, etc.
• Choose the join order that minimizes the cost of reading rows
- Usually, minimizing the rows read minimizes execution time
- Sometimes reading more rows is advantageous, if the table / index is all in memory
Use a cost model to estimate how long an execution plan would take
For each condition in the WHERE clause (and HAVING) we compute
• Condition selectivity
- How many rows of the table will this condition accept? 10%, 20%, 90%?
Getting the estimates right is important!
Condition Selectivity
Suppose we have query with 10 tables: T1, T2, T3, … T10
Query optimizer will:
• Estimate the number of rows that it will read from each table
• Based on the conditions in the where (and having) clauses
Assume estimates have an average error coefficient e
• Total number of estimated rows read is:
- (e * #T1) * (e * #T2) * (e * #T3) * … * (e * #T10)
• Where #T1..#T10 is the actual number of rows read for each table
The estimation error is amplified, the more tables there are in a join
• If we under/over estimate by a factor of 2 final error factor is 1024!
• If error is only 1.5 (off by 50%), final error factor is ~60
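The compounding effect above is easy to check numerically. A minimal Python sketch (the function name is ours, purely illustrative):

```python
# How a small per-table estimation error compounds across a join.
# An average error coefficient e on each of n per-table row estimates
# multiplies into an overall error of e**n on the join size estimate.

def error_factor(e: float, n_tables: int) -> float:
    """Overall multiplicative error on the estimated join cardinality."""
    return e ** n_tables

# With 10 joined tables:
print(error_factor(2.0, 10))   # 1024.0  (off by 2x per table)
print(error_factor(1.5, 10))   # 57.6650390625  (off by only 50% per table)
```

Even a modest 50% per-table error blows up to a ~60x error on the final join estimate, which is why good selectivity estimates matter so much.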
12. 12
Condition Selectivity
How does the optimizer produce estimates?
• Condition analysis:
- Is it possible to satisfy the conditions at all? (e.g. t1.a > 10 AND t1.a < 5 never is)
- Is there an equality condition on a distinct column?
• Index dives to get the number of rows in a range
• Guesstimates (MySQL)
• Histograms for non-indexed columns
18. 18
Histograms
Histograms estimate a distribution
Multiple types of histograms
• Equi-Width Histograms
- Bucket information is not uniform
- Many values can land in one bucket (5)
- Other buckets take few values (1)
• Equi-Height Histograms
- All bins have the same #values
- More bins where there are more values
• Most Common Values Histograms
- Useful for ENUM columns
- One bin per value
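To make the bucket shapes concrete, here is a toy Python sketch of the two basic types. This is illustrative only, not how any of the engines actually build or store histograms:

```python
# Equi-width: split the value range into buckets of equal width.
# Equi-height: put (roughly) the same number of rows into each bucket.

def equi_width(values, n_buckets):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1  # avoid zero width for constant columns
    counts = [0] * n_buckets
    for v in values:
        i = min(int((v - lo) / width), n_buckets - 1)
        counts[i] += 1
    return counts  # rows per bucket vary with the data skew

def equi_height(values, n_buckets):
    s = sorted(values)
    per = len(s) // n_buckets
    # Bucket endpoints: the value where each equal-sized slice of rows ends
    return [s[min((i + 1) * per, len(s)) - 1] for i in range(n_buckets)]

# A skewed column: many small values, a few large ones
data = [1] * 50 + [2] * 30 + [10] * 15 + [100] * 5

print(equi_width(data, 4))   # [95, 0, 0, 5]  -> almost everything in bucket 0
print(equi_height(data, 4))  # [1, 1, 2, 100] -> endpoints crowd where data is dense
```

The skewed example shows why equi-height buckets are the more informative summary: the equi-width histogram wastes most of its buckets on empty ranges.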
19. 19
Histograms in MariaDB
MariaDB histograms are collected by doing a full table scan
• Collection must be triggered manually with ANALYZE TABLE … PERSISTENT
Stored inside
• mysql.table_stats, mysql.column_stats, mysql.index_stats
• As a binary value (max 255 bytes), with single- or double-precision buckets
• A special function, DECODE_HISTOGRAM(), decodes the binary value
Can be manually updated
• One can run data collection on a slave, then propagate the results
Not enabled by default; a few switches need to be turned on for it to work
20. 20
Histograms in MySQL
MySQL histograms are collected by doing a full table scan
• Needs to be done manually using ANALYZE TABLE … UPDATE HISTOGRAM
• Can collect all data, or sample by skipping rows based on a maximum memory allocation
Stored inside the data dictionary
• Can be viewed through INFORMATION_SCHEMA.column_statistics
• Stored as Equi-Width (Singleton) or Equi-Height
• Visible as JSON
Cannot be manually updated
• No obvious, easy way to share statistics between instances
Enabled by default; histograms will be used when available
21. 21
Histograms in PostgreSQL
PostgreSQL histograms are collected by sampling rows at random (no full scan needed)
• Can be collected manually with ANALYZE
• Also collected automatically when VACUUM runs
Stores an equi-height histogram and most common values at the same time
• The equi-height histogram doesn’t cover the MCVs
Can be manually updated
• One could import histograms from slave instances
• VACUUM auto-collection seems to cover the use case
22. 22
Using Histograms
Histograms are useful for range conditions
• Equi-width or equi-height:
- COLUMN > constant
• Most Common Values (Singleton):
- COLUMN = constant
Problematic when multiple columns are involved:
• t1.COL1 > 100 AND t1.COL2 > 1000
Most optimizers assume column values are independent
• P(A ∩ B) = P(A) * P(B) vs P(A ∩ B) = P(A) * P(B | A)
PostgreSQL 10 has added support for multi-variable distributions.
MySQL assumes independent values.
MariaDB doesn’t handle multi-variable case well either.
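The cost of the independence assumption is easy to show numerically. The sketch below uses the data from the example two slides ahead: `t1.a` always equals `t1.b`, 10 distinct values, each occurring with 10% probability.

```python
# 1000 rows where column a always equals column b
rows = [(v, v) for v in range(10) for _ in range(100)]

p_a = sum(1 for a, b in rows if a == 5) / len(rows)                # P(a=5) = 0.1
p_b = sum(1 for a, b in rows if b == 5) / len(rows)                # P(b=5) = 0.1
actual = sum(1 for a, b in rows if a == 5 and b == 5) / len(rows)  # = 0.1

independent_est = p_a * p_b  # 0.01 - the optimizer's guess under independence
# P(b=5 | a=5) = 1 here, so the correct answer is P(a=5) * 1 = 0.1
print(independent_est, actual)
```

The independence estimate is off by a factor of 10, which matches the 1% vs 10% numbers in the table that follows.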
23. 23
Using Histograms
Sample database: world
select city.name
from city
where (city.population > 10000000 or
       city.population < 10000)

                         MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered  1.95%     1.09%   1.05%
Actual Rows Filtered     1.05%
24. 24
Using Histograms
Table with 2 columns A and B
• t1.a always equals t1.b
• 10 distinct values, each value occurs with 10% probability
select t1.A, t1.B
from t1
where t1.A = t1.B and t1.A = 5
                         MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered  1.03%     1%      10%
Actual Rows Filtered     10%
25. 25
Conclusions
MariaDB
• Slightly less precise than MySQL, but smaller in size
• Same problem with correlated data as MySQL
• Performs full-table-scan, no sampling support
• Easy to share between instances
MySQL
• Histograms provide good estimates for real world data
• Poor performance with highly correlated data
• Performs full-table-scan, supports sampling
PostgreSQL
• Estimates on par with MySQL and MariaDB
• Support for multi-variable distributions!
• True sampling
27. 27
A set of related optimizations
Some are new, some are old:
● Derived table merge
● Condition pushdown
– Condition pushdown through window functions
● GROUP BY splitting
28. 28
Background – derived table merge
● “VIP customers and their big orders from October”
select *
from
vip_customer,
(select *
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id = vip_customer.customer_id
29. 29
Naive execution
select *
from
vip_customer,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id =
vip_customer.customer_id
Plan: 1 – scan orders and compute OCT_ORDERS (filter by order_date);
2 – join OCT_ORDERS (amount > 1M) with vip_customer
30. 30
Derived table merge
select *
from
vip_customer,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id =
vip_customer.customer_id
select *
from
vip_customer,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1M and
orders.customer_id =
vip_customer.customer_id
31. 31
Execution after merge
(Plan: a direct join between vip_customer and orders)
select *
from
vip_customer,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1M and
orders.customer_id =
vip_customer.customer_id
● Both filters (made in October, amount > 1M) are applied while scanning orders
● Allows the optimizer to join customer→orders or orders→customer
● Good for optimization
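The merge is a pure rewrite, so both forms must return the same rows. A small sqlite3 sketch (table contents invented for illustration, 1M written out as 1000000) checks that:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table vip_customer (customer_id int, name text);
create table orders (customer_id int, order_date text, amount int);
insert into vip_customer values (1,'Alice'),(2,'Bob');
insert into orders values
  (1,'2017-10-05',2000000),(1,'2017-09-01',5000000),
  (2,'2017-10-20',500000),(3,'2017-10-11',3000000);
""")

# Original form: derived table OCT_ORDERS, then join.
nested = con.execute("""
  select * from vip_customer,
    (select * from orders
     where order_date between '2017-10-01' and '2017-10-31') as OCT_ORDERS
  where OCT_ORDERS.amount > 1000000
    and OCT_ORDERS.customer_id = vip_customer.customer_id
""").fetchall()

# Merged form: all conditions in one WHERE clause.
merged = con.execute("""
  select * from vip_customer, orders
  where order_date between '2017-10-01' and '2017-10-31'
    and orders.amount > 1000000
    and orders.customer_id = vip_customer.customer_id
""").fetchall()

print(sorted(nested) == sorted(merged))
```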
32. 32
What if the subquery has a GROUP BY ?
● Merging is only possible when the “final” operation of the subquery is a join
● Can’t merge if it’s a GROUP BY/DISTINCT/ORDER BY LIMIT/etc
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
33. 33
Execution is inefficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
Plan: 1 – scan orders and compute totals for ALL customers (OCT_TOTALS);
2 – pick the customer_id=1 row from the result
34. 34
Condition pushdown optimization
select *
from OCT_TOTALS
where customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
● Can push down conditions on GROUP BY columns
● … to filter out rows that go into groups we don’t care about
35. 35
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
Plan: customer_id=1 is pushed into the subquery – only the matching rows
are read from orders, and only that group’s SUM is computed
● Looking only at groups you’re interested in is much more efficient
– Pushing into HAVING clause is useful, too.
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
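Because customer_id is a GROUP BY column, filtering before or after grouping gives the same answer, only cheaper. A sqlite3 sketch (invented data) of both plans:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table orders (customer_id int, order_date text, amount int);
insert into orders values
  (1,'2017-10-01',100),(1,'2017-10-02',200),
  (2,'2017-10-03',300),(2,'2017-11-01',999);
""")

# Naive plan: aggregate ALL customers, then filter the grouped result.
naive = con.execute("""
  select * from (select customer_id, sum(amount) as total_amt
                 from orders
                 where order_date between '2017-10-01' and '2017-10-31'
                 group by customer_id)
  where customer_id = 1
""").fetchall()

# Pushed-down plan: filter first, so only customer 1's rows are aggregated.
pushed = con.execute("""
  select customer_id, sum(amount) as total_amt
  from orders
  where order_date between '2017-10-01' and '2017-10-31'
    and customer_id = 1
  group by customer_id
""").fetchall()

print(naive == pushed)
```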
36. 36
Pushdown for inferred conditions (in MariaDB)
select
customer.customer_name,
TOTAL_AMT
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
Inferred and pushed down: OCT_TOTALS.customer_id=1
37. 37
Condition Pushdown through Window Functions
● “Customer’s biggest orders”
create view top_three_orders as
select *
from
(
select
customer_id,
amount,
rank() over (partition by customer_id
order by amount desc
) as order_rank
from orders
) as ordered_orders
where order_rank <= 3
select * from top_three_orders where customer_id=1
+-------------+--------+------------+
| customer_id | amount | order_rank |
+-------------+--------+------------+
| 1 | 10000 | 1 |
| 1 | 9500 | 2 |
| 1 | 400 | 3 |
| 2 | 3200 | 1 |
| 2 | 1000 | 2 |
| 2 | 400 | 3 |
...
38. 38
Condition pushdown through Window Functions
Without condition pushdown
● Compute top_three_orders
for all customers
● select rows with
customer_id=1
select * from top_three_orders where customer_id=1
With condition pushdown
● Only compute top_three_orders
for customer_id=1
– This is much faster
– Can take advantage of
index on customer_id
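Pushing through the window function is legal here because customer_id is the PARTITION BY column: partitions are independent, so dropping a whole partition before ranking cannot change the surviving rows. A pure-Python sketch (row_number-style ranking, ties ignored for brevity) of both orders of evaluation:

```python
from itertools import groupby

# (customer_id, amount) rows matching the slide's output
orders = [(1, 10000), (1, 9500), (1, 400),
          (2, 3200), (2, 1000), (2, 400)]

def ranked(rows, max_rank):
    """row_number-style ranking per customer (ties ignored for brevity)."""
    out = []
    for cust, grp in groupby(sorted(rows), key=lambda r: r[0]):
        amounts = sorted((a for _, a in grp), reverse=True)
        out += [(cust, a, r) for r, a in enumerate(amounts, 1) if r <= max_rank]
    return out

# Without pushdown: rank every customer's orders, then filter.
all_then_filter = [r for r in ranked(orders, 3) if r[0] == 1]

# With pushdown: only customer 1's rows ever reach the window computation.
filter_then_rank = ranked([r for r in orders if r[0] == 1], 3)

print(all_then_filter == filter_then_rank)
```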
39. 39
Summary so far
● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
– Limitations:
● MariaDB doesn’t push from HAVING into WHERE (MDEV-7486)
● PostgreSQL doesn’t push inferred conditions
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3
41. 41
Split grouping use case
select *
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1', 'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
● Compute a table of groups (OCT_TOTALS)
● Join the groups to another table (customer)
● The other table has a selective restriction (we only need two customers)
● But condition pushdown can’t be used – the restriction is on
customer_name, not on the GROUP BY column
42. 42
Execution, the old way
Sum
orders
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
(Diagram: OCT_TOTALS is materialized with totals for Customer 1 … Customer 100,
then joined to the two customers selected from the customer table)
● Inefficient: OCT_TOTALS is computed for *all* customers
43. 43
Split grouping execution (1)
● Similar to “LATERAL DERIVED”
● Pick Customer 1, compute only that customer’s part of the OCT_TOTALS table
44. 44
Split grouping execution (2)
● Similar to “LATERAL DERIVED”
● Pick Customer 1, compute only that customer’s part of the OCT_TOTALS table
● Pick Customer 2, compute only that customer’s part of the OCT_TOTALS table
45. 45
Split grouping execution (3)
● Similar to “LATERAL DERIVED”
● Pick Customer 1, compute only that customer’s part of the OCT_TOTALS table
● Pick Customer 2, compute only that customer’s part of the OCT_TOTALS table
● ...
46. 46
Split Grouping prerequisites
● There is a join condition that “selects” one GROUP BY group:
– OCT_TOTALS.customer_id = customer.customer_id
● The join order allows making “lookups” into the grouped temp table:
– customer → OCT_TOTALS
● There is an index that allows reading only one GROUP BY group:
– INDEX(orders.customer_id)
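The effect of the optimization can be modeled in a few lines of Python (data invented; the dict of row lists stands in for the required INDEX(orders.customer_id)):

```python
from collections import defaultdict

# (customer_id, amount) rows
orders = [(1, 100), (1, 200), (2, 300), (3, 400), (3, 50)]

# Prerequisite: an index on orders.customer_id, modeled as a dict of row lists.
index = defaultdict(list)
for cust, amount in orders:
    index[cust].append(amount)

wanted = [1, 2]  # customers surviving the selective restriction

# Old way: materialize totals for ALL customers, then join.
all_totals = {c: sum(a) for c, a in index.items()}
old = {c: all_totals[c] for c in wanted}

# Split grouping: for each selected customer, aggregate only that group's
# rows via an index lookup (the "LATERAL DERIVED" step).
split = {c: sum(index[c]) for c in wanted}

print(old == split)  # same answer, but customer 3's rows were never aggregated
```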
47. 47
Split grouping execution
● Available since MariaDB 10.3
● The optimizer makes a criteria- and cost-based choice whether to use the optimization
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialization (ON by default)
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | |
| 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
48. 48
Summary so far
● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3
● Split grouping optimization
– MariaDB 10.3 only
50. 50
CTE name
CTE Body
CTE Usage
with engineers as (
select *
from employees
where dept='Engineering'
)
select *
from engineers
where ...
WITH
CTE syntax
Similar to DERIVED
tables
“Query-local VIEWs”
51. 51
select *
from
(
select *
from employees
where
dept='Engineering'
) as engineers
where
...
with engineers as (
select *
from employees
where dept='Engineering'
)
select *
from engineers
where
...
CTEs are like derived tables
52. 52
with engineers as (
select * from employees
where dept in ('Development','Support')
),
eu_engineers as (
select * from engineers where country IN ('NL',...)
)
select
...
from
eu_engineers;
Use case #1: CTEs refer to CTEs
More readable than nested FROM(SELECT …)
53. 53
with engineers as (
select * from employees
where dept in ('Development','Support')
)
select *
from
engineers E1
where not exists (select 1
from engineers E2
where E2.country=E1.country
and E2.name <> E1.name);
Use case #2: Multiple uses of CTE
Anti-self-join
54. 54
with sales_product_year as (
select
product,
year(ship_date) as year,
sum(price) as total_amt
from
item_sales
group by
product, year
)
select *
from
sales_product_year CUR,
sales_product_year PREV
where
CUR.product=PREV.product and
CUR.year=PREV.year + 1 and
CUR.total_amt > PREV.total_amt
Use case #2: example 2
Year-over-year comparisons
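The year-over-year query above runs almost as-is on SQLite, with strftime standing in for MySQL's year() and invented sales data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table item_sales (product text, ship_date text, price int);
insert into item_sales values
  ('widget','2016-03-01',100),('widget','2016-07-01',100),
  ('widget','2017-05-01',300),
  ('gizmo','2016-04-01',500),('gizmo','2017-06-01',400);
""")

# The CTE is referenced twice (CUR and PREV) - use case #2.
rows = con.execute("""
  with sales_product_year as (
    select product,
           cast(strftime('%Y', ship_date) as int) as year,
           sum(price) as total_amt
    from item_sales
    group by product, year
  )
  select CUR.product, CUR.year
  from sales_product_year CUR, sales_product_year PREV
  where CUR.product = PREV.product
    and CUR.year = PREV.year + 1
    and CUR.total_amt > PREV.total_amt
""").fetchall()
print(rows)  # products whose yearly total grew
```

Only widget grew year over year (200 → 300); gizmo shrank (500 → 400).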
55. 55
Optimizations for non-recursive CTEs
1. The same set as for derived tables
– Merge
– Condition pushdown
● through window functions
– Lateral derived
2. Compute CTE once if it is used multiple times
56. 56
CTE Optimizations
                 Merge   Condition   Lateral   CTE
                         pushdown    derived   reuse
MariaDB 10.3      ✔         ✔           ✔        ✘
MS SQL Server     ✔         ✔           ?        ✘
PostgreSQL        ✘         ✘           ✘        ✔
MySQL 8.0         ✔         ✘           ✘        ✔
Merge and Condition Pushdown are the most important
MariaDB supports them, like MS SQL Server does
PostgreSQL’s approach is *weird*: “CTEs are optimization barriers”
MySQL 8.0: “try merging, otherwise reuse”
58. 58
Window functions optimizations
● Window functions introduced in
– MariaDB 10.2
– MySQL 8.0
● Optimizations for window functions
– Condition pushdown
– Reduce the number of sorting passes
– Streamed computation
– ORDER BY-like optimizations
59. 59
Reduce the number of sorting passes
select
rank() over (order by col1),
ntile(4) over (order by col2),
rank() over (order by ...)
from
tbl1
join tbl2 on ...
● Each window function requires a sort
● Identical PARTITION BY / ORDER BY clauses must share the sort step
● Compatible ones may share the sort step
● Supported by all: MariaDB, MySQL 8, PostgreSQL, ...
60. 60
Streamed computation
win_func() over (partition by ...
                 order by ...
                 rows between N1 preceding
                          and N2 following)
● A window function is computed from the rows in its window frame
– Naive cost: O(n_rows * frame_size)
● The frame moves down together with the current row
● For most functions, the value can be updated incrementally after the
frame has moved – this is streamed computation
– SUM, COUNT, AVG
● For some functions this doesn’t hold (e.g. MAX)
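A sketch of the difference for a moving SUM, contrasting the naive re-sum of every frame with the incremental update:

```python
# SUM(x) OVER (ORDER BY ... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
data = [3, 1, 4, 1, 5, 9, 2, 6]
N_PREC = 2

def naive(xs, n_prec):
    """O(n_rows * frame_size): re-sum the whole frame for every row."""
    return [sum(xs[max(0, i - n_prec):i + 1]) for i in range(len(xs))]

def streamed(xs, n_prec):
    """O(n_rows): add the row entering the frame, subtract the row leaving it."""
    out, acc = [], 0
    for i, x in enumerate(xs):
        acc += x
        if i > n_prec:
            acc -= xs[i - n_prec - 1]
        out.append(acc)
    return out

print(naive(data, N_PREC) == streamed(data, N_PREC))
# MAX can't be maintained this way: when the maximum leaves the frame,
# the remaining frame rows must be rescanned.
```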
61. 61
ORDER BY [LIMIT] like optimizations
● Skip sorting if the rows already come sorted
● ORDER BY … LIMIT over a window function result
select
row_number() over (...) as RN
from
...
order by RN limit 10
● Restriction on ROW_NUMBER
select *
from (select row_number() over (...) as RN
from ...
) as T
where RN < 10
62. 62
Window functions optimization summary
                Reuse        Streamed      Condition   ORDER BY LIMIT-
                compatible   computation   pushdown    like optimizations
                sorts
MariaDB 10.3      ✔             ~✔            ✔             ✘
MS SQL Server     ✔             ~✔            ✔             ✔
PostgreSQL        ✔             ~✔            ✔             ✘
MySQL 8.0         ✔             ~✔            ✘             ✘
● Reuse of compatible sorts: everyone has this, since it’s mandatory for identical sorts
● Streamed computation: essential, otherwise an O(N) computation becomes O(N^2)
● Condition pushdown: very nice to have for analytic queries
● ORDER BY LIMIT-like: sometimes used for TOP-n queries by those with a “big database” background
63. 63
Summary
● Both MariaDB and MySQL now have histograms
– MySQL’s are larger and more precise
– Both still lag behind PostgreSQL, though
● Derived tables: MariaDB got condition pushdown
– MariaDB 10.3: Pushdown for window functions, Split grouping
– Caught up with PostgreSQL and exceeded it.
● Non-recursive CTEs
– See derived tables
– PostgreSQL and MySQL 8 have made weird choices
● Window functions
– Similar optimizations in all three
– MySQL lacks condition pushdown (careful with VIEWs).