In this presentation I illustrate how and why InnoDB performs page Merge and Split operations. I will also show what can be done to reduce their impact.
2. ABOUT ME
• Open source enthusiast
• MySQL tech lead; principal architect
• Working in the DB world for over 33 years
• Open source developer and community contributor
3. WHY WE SHOULD CARE?
• MySQL/InnoDB constantly performs SPLIT and MERGE operations.
• We have very limited visibility of them.
• The sad story is that there is also very little we can do to optimize this on the server side using parameters or some other magic.
• But the good news is there is A LOT that can be done at design time:
• Use a proper Primary Key and design secondary indexes keeping in mind that you shouldn't abuse them.
• Plan proper maintenance windows on the tables that you know will have very high levels of inserts/deletes/updates.
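A note on visibility: the index_page counters used later in this deck live in INFORMATION_SCHEMA.INNODB_METRICS but are disabled by default. A minimal sketch to switch them on (MySQL 5.7+, where innodb_monitor_enable accepts the % wildcard):
SET GLOBAL innodb_monitor_enable = 'index_page%'; -- enables the split/merge/reorg counters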
5. THE MATRYOSHKA DOLL
[split]>select id, uuid, processid, points, date, active, time, substring(short,1,4) short, substring(longc,1,4) longc from tbinc limit 5;
+----+--------------------------------------+-----------+--------+------------+--------+---------------------+-------+-------+
| id | uuid | processid | points | date | active | time | short | longc |
+----+--------------------------------------+-----------+--------+------------+--------+---------------------+-------+-------+
| 1 | 1c0bc1cc-ef69-11ea-ad30-9eca1c2a7efa | 127 | 2707 | 2020-04-01 | 1 | 2020-09-05 13:15:30 | That | Whic |
| 2 | 1c0bc1b8-ef69-11ea-ad30-9eca1c2a7efa | 136 | 1810 | 2020-04-01 | 1 | 2020-09-05 13:15:30 | That | All |
| 3 | 1c0bc172-ef69-11ea-ad30-9eca1c2a7efa | 163 | 4752 | 2020-04-01 | 0 | 2020-09-05 13:15:30 | That | When |
| 4 | 1c0bc17c-ef69-11ea-ad30-9eca1c2a7efa | 71 | 3746 | 2020-04-01 | 1 | 2020-09-05 13:15:30 | That | 87F |
| 5 | 1c5b0002-ef69-11ea-ad30-9eca1c2a7efa | 14 | 2339 | 2020-04-01 | 0 | 2020-09-05 13:15:31 | That | adve |
+----+--------------------------------------+-----------+--------+------------+--------+---------------------+-------+-------+
6. THE MATRYOSHKA DOLL
A well-known statement:
"Inside InnoDB all is an INDEX"
What does it mean?
In the examples we will use:
Schema: split
Tables:
tbinc
tbpcid
tbUUID
data/split/
|-- db.opt
|-- tbinc.frm
|-- tbinc.ibd
|-- tbpcid.frm
|-- tbpcid.ibd
|-- tbUUID.frm
`-- tbUUID.ibd
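With innodb_file_per_table enabled, each .ibd file above is its own tablespace. A hedged way to confirm this from SQL (view name per MySQL 5.7, where it is INNODB_SYS_TABLESPACES; in 8.0 it was renamed INNODB_TABLESPACES):
SELECT name, row_format, page_size
FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES
WHERE name LIKE 'split/%';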
7. THE MATRYOSHKA DOLL
➤ Segment: data is in one segment and each associated index is in its own segment. Segments grow and shrink as data is inserted and deleted. When a segment needs more room, it is extended by one extent (1 megabyte) at a time.
➤ Extent: a group of pages within a tablespace. Extent size is always 1MB.
➤ Page: a page can contain one or more rows, depending on how much data is in each row. If a row does not fit entirely into a single page, InnoDB sets up additional pointer-style data structures so that the information about the row can be stored in one page.
➤ Maximum row size for the default innodb_page_size of 16KB is about 8000 bytes.
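Both sizes are easy to verify on a running server:
SHOW VARIABLES LIKE 'innodb_page_size'; -- 16384 bytes by default
SHOW VARIABLES LIKE 'innodb_file_per_table'; -- ON: one tablespace per table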
8. INNODB B-TREE
➤ InnoDB uses B-trees to organize your data inside pages, across extents, within segments.
➤ Roots, Branches, and Leaves.
➤ Each page (leaf) contains 2-N rows organized by the primary key. The tree has special pages to manage the different branches. These are known as internal nodes (INodes).
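The size of each tree can be estimated from the persistent statistics tables. A sketch, assuming the default persistent stats are enabled (stat_value is expressed in pages):
SELECT index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'split' AND table_name = 'tbinc'
AND stat_name IN ('n_leaf_pages', 'size');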
9. LEAVES AS LINKED LIST
➤ Pages are also referenced as a linked list.
➤ The linked list follows the logical order of the index (or Primary Key).
➤ The linked list order may not follow the physical order of the pages in the extents.
[Diagram: leaf pages in a doubly-linked list: Page #6 ↔ Page #7 ↔ Page #8 ↔ Page #120 ↔ … ↔ Page #N, each page holding Previous/Next page pointers]
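For pages currently cached in the buffer pool, the page number, record count, and fill can be inspected directly. A hedged example (scanning INNODB_BUFFER_PAGE is expensive on large buffer pools, so avoid it on busy production servers):
SELECT page_number, number_records, data_size
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
WHERE table_name LIKE '%tbinc%' AND index_name = 'PRIMARY'
ORDER BY page_number;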
10. A RECORD IN A PAGE - TBINC
ROOT NODE #3: 169 records, 2873 bytes
NODE POINTER RECORD ≥ (id=2) → #6
LEAF NODE #6: 12 records, 7560 bytes
RECORD: (id=2) →
(uuid="a993ee0c-e7a4-11ea-bde9-08002734ed50",
processid=14,
points=3531,
date="2020-04-01",
short="Art.. have",
longc="France.. To-mo",
active=1,
time="2020-08-26 14:01:39")
RECORD: (id=5) → …
NODE POINTER RECORD ≥ (id=38) → #7
LEAF NODE #7: 24 records, 15120 bytes
RECORD: (id=38) →
CREATE TABLE `tbinc` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`uuid` char(36) NOT NULL,
`processid` smallint(6) NOT NULL,
`points` int(11) NOT NULL,
`date` date NOT NULL,
`short` varchar(50) NOT NULL,
`longc` varchar(500) NOT NULL,
`active` tinyint(2) NOT NULL DEFAULT '1',
`time` timestamp NOT NULL DEFAULT
CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `IDX_processid` (`processid`,`active`),
KEY `IDX_active` (`id`,`active`)
) ENGINE=InnoDB
11. A RECORD IN A PAGE - TBPCID
ROOT NODE #3: 270 records, 5130 bytes
NODE POINTER RECORD ≥ (processid=7, id=17) → #6
LEAF NODE #6: 24 records, 14901 bytes
RECORD: (processid=0, id=137) →
(uuid="ad1353d3-e7a4-11ea-bde9-08002734ed50",
points=3729,
date="2020-04-01",
short="Nor ... the",
longc="From ...",
active=1,
time="2020-08-26 14:01:45")
NODE POINTER RECORD ≥ (processid=1, id=3827) → #209
LEAF NODE #209: 15 records, 9450 bytes
RECORD: (processid=1, id=3827) →
Create Table: CREATE TABLE `tbpcid` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`uuid` char(36) NOT NULL,
`processid` smallint(6) NOT NULL,
`points` int(11) NOT NULL,
`date` date NOT NULL,
`short` varchar(50) NOT NULL,
`longc` varchar(500) NOT NULL,
`active` tinyint(2) NOT NULL DEFAULT '1',
`time` timestamp NOT NULL DEFAULT
CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`processid`,`id`),
KEY `IDX_id` (`id`),
KEY `IDX_mid` (`processid`,`active`)
) ENGINE=InnoDB
12. A RECORD IN A PAGE - TBUUID
ROOT NODE #3: 235 records, 11280 bytes
NODE POINTER RECORD ≥ (uuid="05de43720080-9edb-ae11-268e-0d216866", processid=12) → #6
LEAF NODE #6: 24 records, 15120 bytes
RECORD: (uuid="05de43720080-9edb-ae11-268e-00252077", processid=73) →
(id=632,
points=3740,
date="2020-01-10",
short="So ... p",
longc="The s.. w",
active=0,
time="2020-08-27 12:40:18")
NODE POINTER RECORD ≥ (uuid="05de43720080-9edb-
ae11-268e-03b393c6", processid=192) → #136
LEAF NODE #136: 4 records, 2520 bytes
RECORD: (uuid="05de43720080-9edb-ae11-268e-
03b393c6", processid=192) →
Create Table: CREATE TABLE `tbUUID` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`uuid` char(36) NOT NULL,
`processid` smallint(6) NOT NULL,
`points` int(11) NOT NULL,
`date` date NOT NULL,
`short` varchar(50) NOT NULL,
`longc` varchar(500) NOT NULL,
`active` tinyint(2) NOT NULL DEFAULT '1',
`time` timestamp NOT NULL DEFAULT
CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`uuid`,`processid`),
UNIQUE KEY `IDX_id` (`id`),
KEY `IDX_mid` (`processid`,`active`)
) ENGINE=InnoDB
13. SEGMENTS, EXTENTS AND PAGES
Same number of rows, different physical distribution:
PK tbinc: id autoinc
PK tbpcid: processid, id
PK tbUUID: reverse UUID, processid
15. REMINDER
The concept here is that while you organize your data in
tables and rows, InnoDB organizes it in branches, pages,
and records.
It is very important to keep in mind that InnoDB does not
work on a single-row basis.
InnoDB always operates on pages.
Once a page is loaded, InnoDB scans it for the requested
row/record.
16. PAGE INTERNALS
A page can be empty or fully filled (100%).
The row-records will be organized by PK. For example, if your table is using
an AUTO_INCREMENT, you will have the sequence ID = 1, 2, 3, 4, etc.
A page also has another important attribute: MERGE_THRESHOLD. The default value of
this parameter is 50% of the page, and it plays a very important role in InnoDB merge
activity.
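Since MySQL 5.7 the threshold can be lowered per table, or per index, through a COMMENT clause. A minimal sketch on the example table (40 is an arbitrary illustrative value):
-- lower MERGE_THRESHOLD from the default 50 to 40 for the whole table
alter table tbinc comment = 'MERGE_THRESHOLD=40';
-- per-index variant (hypothetical index, for illustration only):
-- create index IDX_points on tbinc (points) comment 'MERGE_THRESHOLD=40';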
17. PAGE INTERNALS
While you insert data, the page is filled up sequentially if the incoming record can be
accommodated inside the page.
18. PAGE INTERNALS
When a page is full, the next record is inserted into the NEXT page (and the SPLIT counter is
incremented):
Browsing is not only top-down but also via the linked list: page #6 refers to #5 and #7.
But what happens if I start to delete values?
19. PAGE MERGES
When you delete a record, it is not physically removed. Instead, InnoDB flags the record as
deleted, and the space it used becomes reclaimable.
When a page has received enough deletes to fall below the MERGE_THRESHOLD (50% of the
page size by default), InnoDB starts to look at the closest pages (NEXT and PREVIOUS) to
see whether it can optimize space utilization by merging the two pages.
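To see this in action, snapshot the merge counters around a bulk delete. A sketch (the modulo delete is just a hypothetical way to leave every page roughly half empty):
-- before: snapshot the merge counters
select name, count from INFORMATION_SCHEMA.INNODB_METRICS where name like 'index_page_merge%';
-- hypothetical workload: delete every other row
delete from tbinc where id % 2 = 0;
-- after: re-run the snapshot; the delta shows attempts vs. successful merges
select name, count from INFORMATION_SCHEMA.INNODB_METRICS where name like 'index_page_merge%';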
20. PAGE INTERNALS
In this example, Page #6 is utilizing less than half of its space. Page #5 received many
deletes and is also now less than 50% used. From InnoDB’s perspective, they are
mergeable:
The merge operation results in Page #5 containing its previous data plus the data from Page #6.
Page #6 will be empty after the merge.
21. THE TAKEOUT
The rule is: merges happen on delete and update operations involving pages that are adjacent
in the linked list.
If a merge operation is successful, the index_page_merge_successful metric in
INFORMATION_SCHEMA.INNODB_METRICS is incremented.
select name,count from INFORMATION_SCHEMA.INNODB_METRICS where name like 'index_page%';
+-----------------------------+--------+
| name | count |
+-----------------------------+--------+
| index_page_splits | 25391 |
| index_page_merge_attempts | 135259 |
| index_page_merge_successful | 15534 |
| index_page_reorg_attempts | 83949 |
| index_page_reorg_successful | 83949 |
| index_page_discards | 29 |
+-----------------------------+--------+
22. PAGE SPLIT
As mentioned above, a page can be filled up to 100%. When this happens, the next page takes
the new records.
23. PAGE SPLIT
When the next page is also full, a split happens.
Records from page #10 are moved to the new split page #12, up to the merge threshold.
24. PAGE SPLIT
What InnoDB will do is (simplifying):
➤ Create a new page
➤ Identify where the original page (Page #10) can be split (at the record level)
➤ Move records
➤ Redefine the page relationships
Page #11 stays as it is.
The thing that changes is the relationship
between the pages:
• Page #10 will have Prev=9 and Next=12
• Page #12 Prev=10 and Next=11
• Page #11 Prev=12 and Next=13
25. THE TAKEOUT
The rule is: page splits happen on insert or update operations, and cause page
dislocation (in many cases onto different extents).
When a split happens, the index_page_splits and the
index_page_reorg_attempts / index_page_reorg_successful metrics in
INFORMATION_SCHEMA.INNODB_METRICS are incremented.
select name,count from INFORMATION_SCHEMA.INNODB_METRICS where name like 'index_page%';
+-----------------------------+--------+
| name | count |
+-----------------------------+--------+
| index_page_splits | 25391 |
| index_page_merge_attempts | 135259 |
| index_page_merge_successful | 15534 |
| index_page_reorg_attempts | 83949 |
| index_page_reorg_successful | 83949 |
| index_page_discards | 29 |
+-----------------------------+--------+
30. SOME GRAPHS AND NUMBERS – SPLITS
The first case (auto-increment PK) has the most "compact" data distribution, and therefore the best space utilization, while the processid-autoinc PK and the semi-random nature of the UUID cause a significantly "sparse" page distribution, resulting in a higher number of pages and related split operations.
[Chart: page splits per PK design]
           AutoInc   processid-AutoInc   Reverse UUID-processid
Splits        2472                7881                     6569
31. SOME GRAPHS AND NUMBERS – MERGES
The auto-increment PK has fewer page merge attempts, and a lower success ratio, than the other two designs. The processid-autoinc PK, on the other side of the spectrum, has a much higher number of merge attempts, but at the same time a significantly higher success ratio (16.42%), given that its "sparse" distribution leaves many pages partially empty.
[Chart: merges per PK design, Ins-Del-Up workload]
                                 AutoInc   processid-AutoInc   Reverse UUID-processid
index_page_merge_attempts           3697               32974                    34521
index_page_merge_successful          295                5414                     3454
Merge_success_ratio (%)             7.98               16.42                    10.01
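The success ratio in the chart is simply successful merges over attempts, and can be recomputed live from the server counters. A minimal sketch:
select 100.0 *
       (select count from INFORMATION_SCHEMA.INNODB_METRICS
         where name = 'index_page_merge_successful') /
       (select count from INFORMATION_SCHEMA.INNODB_METRICS
         where name = 'index_page_merge_attempts') as merge_success_ratio_pct;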
32. SO WHAT?
During merge and split operations, InnoDB acquires an X-latch on the
index tree.
On a busy system, this can easily become a source of index latch contention.
A write that touches only a single page, causing no merge or split, is called an
"optimistic" update in InnoDB, and the latch is taken only in S (shared) mode.
Merges and splits require "pessimistic" updates, which take the latch in X (exclusive) mode.
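With performance_schema enabled (and the rw-lock instruments switched on), waits on the index tree latch surface under the index_tree_rw_lock instrument. A hedged sketch; the exact instrument name can vary by version:
select event_name, count_star, sum_timer_wait
from performance_schema.events_waits_summary_global_by_event_name
where event_name like '%index_tree_rw_lock%';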
33. CONCLUSIONS
➤ MySQL/InnoDB constantly performs these operations, and you have very limited visibility of them.
But they can bite you, and bite hard, especially on spinning disks vs. SSDs (which have
different issues, by the way).
➤ The sad story is that there is very little we can do to optimize this on the server side with
parameters or other magic. But the good news is there is A LOT that can be done at design
time.
➤ Use a proper primary key and design secondary indexes keeping in mind that you shouldn't
abuse them. Plan proper maintenance windows for the tables that you know will see very high
levels of inserts/deletes/updates.
➤ This is an important point to keep in mind. In InnoDB you cannot have fragmented records, but you
can have a nightmare at the page/extent level. Ignoring table maintenance will cause more work at
the I/O level, in memory, and in the InnoDB buffer pool.
➤ You must rebuild some tables at regular intervals. Use whatever tricks it requires, including
partitioning and external tools (pt-osc); see the sketch after this list. Do not let a table become
gigantic and fully fragmented.
➤ Wasting disk space? Needing to load three pages instead of one to retrieve the record set you
want? Each search causing significantly more reads?
That's your fault; there is no excuse for being sloppy!
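A minimal rebuild sketch on one of the example tables (for InnoDB, OPTIMIZE TABLE maps to a full table rebuild; scheduling, partitioning, and online tooling are up to you):
-- rebuild the clustered index and all secondary indexes
optimize table tbUUID;
-- equivalent "null ALTER" rebuild:
alter table tbUUID engine = InnoDB;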