Today I would like to discuss the main points a DBA has to consider when addressing one of the most important topics around service availability: performance. In particular, this will be a walkthrough of the most important aspects and configuration parameters to check, what should be analyzed when our server is «slow», and how to understand whether we may be stressing our system more than needed; the perception that better hardware implies better performance is not always true.
You have surely heard the quote «Premature optimization is the root of all evil». And this is actually true: at the moment we deploy our service, we can’t know what will happen in the future, what levels of concurrency we will observe, or what the first bottlenecks to appear will be. Nevertheless, it’s good to know which parameters we can tune from the very beginning, so as to forecast future impacts and prevent them from degrading our service quality.
Let’s have a quick look at the content for this session. After introducing the tools we’ll use for the Hands On Lab, we’ll show some working examples of configurations to scale connections; this topic will be tackled by addressing the connections threshold and the threading model. After that, I’ll show how I/O is impacted by the two main actors in a MySQL Server: REDO logging and the InnoDB buffer pool. Finally, I’ll show how to improve the execution plan of a query with a concrete example. Let’s go!
In order to perform these simple exercises on Linux, I have installed sysstat, available from the distribution repository. I have also provisioned a MySQL Server with the employees DB, a reference open source database useful for a wide range of benchmarks.
Here are all the details for the employees DB. Besides that, I will be using the mysqlslap utility, an emulation client that simulates concurrency and benchmarks a server. The following exercises and topics are based on InnoDB. Server configuration is the default, so I did not edit the my.cnf configuration file. I also recommend restarting the server before trying these exercises, so as to reset all counters and statistics.
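If you want to follow along, a minimal setup sketch could look like this (assuming a yum-based distribution and the datacharmer/test_db repository as the source of the employees DB; adjust the package manager and credentials to your environment):

    # install sysstat, which provides iostat
    $ sudo yum install -y sysstat

    # fetch and load the employees sample database
    $ git clone https://github.com/datacharmer/test_db.git
    $ cd test_db
    $ mysql -u root -p < employees.sql

    # a first mysqlslap run to verify the setup
    $ mysqlslap -u root -p --concurrency=50 --iterations=10 --auto-generate-sql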
Let’s have a quick look at the connection model in a MySQL Server. In the default configuration, each connection is served by its own thread. The thread is in charge of authentication and, after that, performs all the tasks needed to execute queries and return results to the client. It is easy to understand where the bottleneck may appear if the number of connections grows quickly, as every thread needs resources to execute.
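You can observe this one-to-one mapping yourself through the Performance Schema; as a quick sketch, each row returned here is a foreground thread serving a client connection:

    -- every foreground thread corresponds to one client connection
    mysql> SELECT THREAD_ID, PROCESSLIST_ID, PROCESSLIST_USER
           FROM performance_schema.threads
           WHERE TYPE = 'FOREGROUND';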
Remember to make configuration changes persistent in /etc/my.cnf.
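As a sketch, checking and raising the connection limit might look like this (151 is the compiled-in default; 500 is just an example value, not a recommendation):

    mysql> SHOW VARIABLES LIKE 'max_connections';
    mysql> SET GLOBAL max_connections = 500;   -- effective immediately, lost on restart

    # to make the change persistent, add it under [mysqld] in /etc/my.cnf:
    [mysqld]
    max_connections = 500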
We have seen how to increase the maximum number of connections allowed: MySQL Server won’t reject connections as long as concurrent connections stay within the configured limit. But this alone is not a good practice, as creating an arbitrary number of connections will sooner or later exhaust resources and provoke rejections. A client-side solution for intelligent scalability is connection pooling, a technique of creating and managing a pool of connections that are ready for use by any thread that needs them. Connection pooling can increase the performance of your application: pulling an existing connection from the pool removes the overhead of creating and disposing of connections and allows more concurrent clients to access the database.
Now that we know that a connection needs a thread created on the server to manage authentication and process all the tasks related to query execution, let’s dig a little deeper to understand how to improve thread management. One tool for improving concurrency and reducing the overhead of creating and disposing of threads is the thread cache. Manager threads create a new thread when necessary, but try to avoid doing so by consulting the thread cache first. But… how many threads can fit into the thread cache?
The thread cache is configurable via thread_cache_size, and there is also a set of status variables to monitor how the thread cache is doing and understand whether it is big enough. From the previous execution, we see that almost 3,000 threads have been created; this seems high, given the little uptime and the number of operations I executed after the restart.
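These are the checks in practice; a quick sketch of the monitoring queries:

    mysql> SHOW VARIABLES LIKE 'thread_cache_size';

    -- Threads_created grows every time a thread could not be reused from the cache;
    -- a rough cache miss rate is Threads_created / Connections
    mysql> SHOW GLOBAL STATUS
           WHERE Variable_name IN ('Threads_created', 'Threads_cached', 'Connections');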
Let’s observe how threads are created when the mysqlslap utility is launched again. Now Threads_created has increased by almost 1,000!
Let’s fix this situation and choose a bigger size for our thread cache. With high concurrency, I expect all the threads in the cache to be used; therefore, the higher the concurrency, the bigger the cache size I should choose. After setting a bigger cache size, let’s run our test again. This time I notice just a small increase of around 100 threads: these are the new threads created to populate the cache, and they will be kept for future operations. The purpose of sizing the thread cache is to avoid creating new threads. So, as a rule of thumb, remember to monitor Threads_created when choosing the right size for the thread cache.
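Concretely, the tuning and the re-test could look like this (100 is an example size; match it to your peak concurrency):

    mysql> SET GLOBAL thread_cache_size = 100;
    mysql> SHOW GLOBAL STATUS LIKE 'Threads_created';

    # re-run the load…
    $ mysqlslap -u root -p --concurrency=100 --iterations=10 --auto-generate-sql

    mysql> SHOW GLOBAL STATUS LIKE 'Threads_created';   -- should now grow only marginally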
So far we have discovered how to increase the connections limit and how to cache threads. These tunings can help expand our server capabilities and scale connections beyond what was originally planned. Nevertheless, this is not efficient, and a bottleneck will be hit once a certain number of threads are executing. The solution to scalability goes through connection pools on the client side, as mentioned, and an improved thread model on the server side. MySQL Enterprise Thread Pool replaces the typical «one thread per connection» pattern with an improved thread model that reduces context switching and contention.
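The thread pool ships as a plugin with MySQL Enterprise Edition; as a configuration sketch (the thread_pool_size value is an example, usually set close to the number of CPU cores):

    # /etc/my.cnf
    [mysqld]
    plugin-load-add = thread_pool.so
    thread_pool_size = 16

    -- after a restart, verify the plugin is active
    mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS
           FROM INFORMATION_SCHEMA.PLUGINS
           WHERE PLUGIN_NAME = 'thread_pool';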
Let’s talk now about the REDO log, a disk-based data structure used during crash recovery. In practical terms, it’s a set of files, 2 by default, written in circular fashion before a transaction is committed. This represents the «D» in ACID, that is, durability: every change done to the database must be logged so the transaction can be recovered in case the database crashes before changes have been applied to the tablespace.
Let’s take a look at the most important REDO logging parameters. When talking about REDO, the main concern should be having the right size, so as not to run into checkpointing issues. A too-small REDO log means a high rotation rate and forced buffer pool checkpoints: we cannot allow transactions logged in the REDO to be overwritten when it rotates, so InnoDB forces a buffer pool checkpoint, during which all pages not yet synchronized to disk are flushed. This forced flush can become a bottleneck, as transactions are stalled until the flush completes. The recommendation is to increase the REDO log file size accordingly, to avoid this mechanism being triggered. Apart from this impact, I would like to show the effect REDO logging has on I/O. When commit-related I/O operations are rearranged and done in batches against the REDO log, performance improves. innodb_flush_log_at_trx_commit controls this behavior: when it is set to 1, every transaction commit triggers a REDO log flush; when set to 0, transactions are written and flushed once per second. This reduces overhead dramatically, as it avoids a flush for every single transaction commit. Let’s see the different impacts this tuning has in terms of I/O pressure.
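As a sketch, here is how the two knobs look in practice (sizes are examples; on MySQL 8.0.30 and later the innodb_redo_log_capacity variable supersedes the file-size settings shown below):

    # /etc/my.cnf — REDO sizing, requires a server restart
    [mysqld]
    innodb_log_file_size      = 1G   # per-file size, chosen so rotation is infrequent
    innodb_log_files_in_group = 2    # the default circular set of files

    -- the flushing policy can be changed at runtime
    mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- default: flush at every commit, full durability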
To show the effects on I/O, let’s keep two console sessions at hand: on the first, we will execute mysqlslap as usual, and on the second, we will use iostat to monitor I/O usage. In particular, the parameters chosen will produce a report every 2 seconds for the storage device where the REDO logs are stored.
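The two sessions could look like this (sda is an assumption; substitute the device that actually holds your REDO logs):

    # console 1: generate commit-heavy load
    $ mysqlslap -u root -p --concurrency=50 --iterations=20 --auto-generate-sql

    # console 2: extended device statistics, in megabytes, every 2 seconds
    $ iostat -dmx sda 2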
Here’s the output; notice the number of write requests and the bandwidth utilization of the device in the last column.
Now let’s use a mysql client session to set innodb_flush_log_at_trx_commit=0. Please keep in mind that this exercise helps understand how REDO logging keeps I/O busy, but flushing transactions once per second means up to one second of transactions can be lost in case of a crash, so be careful about using it; typical useful scenarios are replication slaves or test instances. Let’s run mysqlslap again in the first console, and observe the iostat output again in the second console.
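The change is a one-liner at runtime:

    -- write and flush the log once per second instead of at every commit;
    -- up to ~1 second of committed transactions can be lost on a crash
    mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 0;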
You can notice how the bandwidth, and therefore I/O usage, has dramatically decreased. When you are binlogging, REDO logging, UNDO logging and checkpointing the buffer pool to disk, it is important to monitor I/O so that it does not become our bottleneck.
Just a couple of further details: you can monitor I/O by checking the related status variables and also by using the sys schema, not only for REDO, but for all files written by InnoDB.
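For instance, two quick sketches of those checks:

    -- bytes written to the REDO log since startup
    mysql> SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';

    -- per-file I/O, as collected by the Performance Schema and exposed through sys
    mysql> SELECT file, count_write, total_written
           FROM sys.io_global_by_file_by_bytes
           WHERE file LIKE '%ib_logfile%';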
We have seen how important it is to keep I/O usage under control. The InnoDB buffer pool is the key layer to increase performance by reducing accesses to disk whenever data is requested by a statement. This storage area resides in memory and acts as a cache, ensuring fast access to data and avoiding unnecessary I/O reads.
The bigger the buffer pool, the better. So, as a rule of thumb, the buffer pool should be as big as possible, to hold all frequently accessed data in memory. There are different rules around, like making the buffer pool use 80% of your system memory, but in reality the better option is to size the buffer pool by monitoring how it is effectively used. For this purpose there are several parameters, the statistics from information_schema being the most important; we will refer to them in the following example.
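Before tuning anything, it is worth checking the current size (the InnoDB default is 128MB):

    mysql> SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;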
Let’s run our mysqlslap benchmark tool once again in one console session, and consult what is reported by information_schema in another session. I can tell mysqlslap to stress my database instance with a custom query, so this time I chose a bad one: SELECT * FROM employees performs a full scan and requests the whole table content, therefore it will read as much data as possible from the buffer pool and return it to the client.
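The two sessions could look like this (concurrency and iterations are arbitrary example values):

    # console 1: hammer the server with the full-scan query
    $ mysqlslap -u root -p --create-schema=employees \
        --query="SELECT * FROM employees" --concurrency=20 --iterations=10

    -- console 2: buffer pool statistics
    mysql> SELECT POOL_ID, POOL_SIZE, HIT_RATE
           FROM information_schema.INNODB_BUFFER_POOL_STATS;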
Let’s focus on HIT_RATE: as you can see, a value of 1000 means that all data is successfully retrieved from the buffer pool (the counter is expressed per thousand page requests). So far, so good: our buffer pool, with the default configuration and this traffic model, is good enough, and no disk I/O is needed to fetch records from pages.
To show the role of the buffer pool, let’s now shrink it to the minimum size: by acting on innodb_buffer_pool_chunk_size and innodb_buffer_pool_size I can resize it at will. Note that the buffer pool size can be changed online, without any restart, but when the chunk size is reconfigured a restart is needed. After restarting the server, let’s run our benchmarking query again and check information_schema.
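A sketch of the shrunken configuration (these are the documented minimum values; this is for the demo only, never do it in production):

    # /etc/my.cnf — requires a restart because the chunk size changes
    [mysqld]
    innodb_buffer_pool_chunk_size = 1048576   # 1MB
    innodb_buffer_pool_size       = 5242880   # 5MB, the minimum allowed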
Here you can see that POOL_SIZE is smaller, and HIT_RATE is not 1000 anymore. With a smaller buffer pool, more I/O accesses are needed.
So, the typical question we hear at Support is «My query is slow, why?». This problem may be addressed by redesigning the query and by using the right set of indexes, to suggest to the MySQL optimizer the fastest execution path to retrieve the resulting records. In this case, using the right index for a query translates into fewer I/O accesses and better performance, and the execution plan is key to understanding how a query performs. Let’s see it with an example. I have chosen a query slightly different from the typical one, based on a function of the column in the WHERE filter clause. Indeed, the query could be rewritten for better efficiency, but let’s keep it for educational purposes. By checking the explain plan, a full table scan is performed: almost 300,000 rows are read. The optimizer needs an index to filter only those records we need. So why not add an index on hire_date? We can certainly do that, but it won’t improve our execution plan. What’s happening? Let’s see in the next slide.
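The exact query from the slide isn’t reproduced here; as an illustration, assume a filter built on a function of hire_date:

    -- a function wrapped around the column defeats a plain index on it
    mysql> EXPLAIN SELECT * FROM employees WHERE YEAR(hire_date) = 2000;
    -- type: ALL  => full table scan over the whole employees table

    -- adding a plain index does not change the plan
    mysql> ALTER TABLE employees ADD INDEX idx_hire_date (hire_date);
    mysql> EXPLAIN SELECT * FROM employees WHERE YEAR(hire_date) = 2000;  -- still a full scan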
When the filter is on a function of the column, the optimizer can’t use a plain index, so we need an index on a generated column. This translates into: «let’s precalculate the function and add an index on it». After this change, the optimizer will finally be able to fetch only the rows we need. Look at the difference: 300,000 rows versus 111!
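Continuing the illustrative example above (the column and index names are mine, not from the slide):

    -- materialize the function into a generated column and index it
    mysql> ALTER TABLE employees
               ADD COLUMN hire_year SMALLINT GENERATED ALWAYS AS (YEAR(hire_date)) VIRTUAL,
               ADD INDEX idx_hire_year (hire_year);

    -- filtering on the generated column now uses the index; from MySQL 5.7 the
    -- optimizer can even match the original YEAR(hire_date) expression to it
    mysql> EXPLAIN SELECT * FROM employees WHERE hire_year = 2000;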
We have gone through some basic tuning to address the most immediate performance problems. To wrap up: