XtraDB 5.7: Key Performance Algorithms
Laurynas Biveinis
Alexey Stroganov
Percona
firstname.lastname@percona.com
XtraDB 5.7 Key Performance Algorithms
• Focus on the buffer pool, flushing, the doublewrite
buffer
• Talk assumes familiarity, but feel free to interrupt
• What we learned
• What we did
• How we did it
InnoDB buffer pool
• Memory cache of disk data pages
• In-memory data pages accessible through several
data structures
• 1) Page hash for lookup
[Diagram: (space_id; page_id) → fold → hash array → data page lists]
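A minimal sketch of the page hash lookup, in Python for illustration only. The class name `PageHash`, the cell count, and the fold formula are assumptions made for this example, not InnoDB's actual code.

```python
from typing import List, Optional, Tuple

PageKey = Tuple[int, int]  # (space_id, page_id)


class PageHash:
    """Toy page hash: an array of cells, each holding a short page list."""

    def __init__(self, n_cells: int = 8) -> None:
        self.cells: List[List[Tuple[PageKey, str]]] = [[] for _ in range(n_cells)]

    def fold(self, space_id: int, page_id: int) -> int:
        # Illustrative fold only: mixes both ids into one cell index.
        return ((space_id << 20) + space_id + page_id) % len(self.cells)

    def insert(self, space_id: int, page_id: int, frame: str) -> None:
        self.cells[self.fold(space_id, page_id)].append(((space_id, page_id), frame))

    def lookup(self, space_id: int, page_id: int) -> Optional[str]:
        # Walk the page list of the cell the key folds to.
        for key, frame in self.cells[self.fold(space_id, page_id)]:
            if key == (space_id, page_id):
                return frame
        return None  # page is not in the buffer pool


page_hash = PageHash(n_cells=4)
page_hash.insert(1, 7, "frame-A")
print(page_hash.lookup(1, 7), page_hash.lookup(1, 8))
```

A miss (`None`) is the case where the page has to be read from disk into a free page.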
InnoDB buffer pool
• 2) flush list for dirty page management. Dirtying:
[Diagram: INSERT INTO foo VALUES(bar) turns a clean page into a dirty page with LSN = 42, which joins the flush list tail after the dirty pages with LSN = 25 and LSN = 32]
InnoDB buffer pool
• 2) flush list for dirty page management. Flushing:
[Diagram: the flush list head holds dirty pages with LSN = 5, LSN = 7, and LSN = 12. After "Flush up to LSN 10", the pages with LSN 5 and 7 are clean, and the flush list head is the dirty page with LSN = 12]
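The dirtying and flushing slides can be sketched as a queue ordered by LSN. This is a simplified Python model that follows the head/tail labels used on the slides; `FlushList` and its method names are hypothetical, and re-dirtying of an already dirty page is not modeled.

```python
from collections import deque
from typing import Deque, List, Tuple


class FlushList:
    """Toy flush list: dirty pages kept in ascending LSN order."""

    def __init__(self) -> None:
        self._pages: Deque[Tuple[str, int]] = deque()  # (page name, LSN)

    def mark_dirty(self, page: str, lsn: int) -> None:
        # A newly dirtied page carries the newest LSN, so it joins the tail.
        if self._pages and lsn < self._pages[-1][1]:
            raise ValueError("LSNs must not decrease")
        self._pages.append((page, lsn))

    def flush_up_to(self, target_lsn: int) -> List[str]:
        # Flush from the head while the oldest dirty page is at or below the target.
        flushed = []
        while self._pages and self._pages[0][1] <= target_lsn:
            flushed.append(self._pages.popleft()[0])
        return flushed

    def lsns(self) -> List[int]:
        return [lsn for _, lsn in self._pages]


flush_list = FlushList()
for name, lsn in [("p5", 5), ("p7", 7), ("p12", 12)]:
    flush_list.mark_dirty(name, lsn)
print(flush_list.flush_up_to(10), flush_list.lsns())
```

With the values from the slide, flushing up to LSN 10 cleans the pages with LSN 5 and 7 and leaves the page with LSN 12 at the head.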
InnoDB buffer pool
• 3) LRU list for deciding which pages to evict
• Preventing eviction for recently-used pages
(making them young):
[Diagram: LRU list of dirty and clean pages before and after a page access; the accessed page is made young]
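A rough Python sketch of making a page young on access. It deliberately uses a plain LRU order and ignores any midpoint or old/young sublist logic; `LRUList` and its methods are illustrative names only.

```python
from collections import OrderedDict
from typing import List


class LRUList:
    """Toy LRU list: first entry is the next eviction candidate."""

    def __init__(self) -> None:
        self._pages: "OrderedDict[str, bool]" = OrderedDict()  # page -> dirty flag

    def add(self, page: str, dirty: bool = False) -> None:
        self._pages[page] = dirty

    def access(self, page: str) -> None:
        # Making the page young: move it away from the eviction end.
        self._pages.move_to_end(page)

    def eviction_order(self) -> List[str]:
        return list(self._pages)


lru = LRUList()
for name in ["a", "b", "c"]:
    lru.add(name)
lru.access("a")
print(lru.eviction_order())
```

After the access, page `a` is the last candidate for eviction instead of the first.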
InnoDB buffer pool
• 4) free list for having free space in the buffer pool
to read currently non-present pages. Reading:
[Diagram: the free list holds five free pages; a page read takes one of them, leaving four free pages and one clean page]
InnoDB buffer pool
• 3/4) Evicting/flushing pages from the LRU list and
putting them on the free list:
[Diagram: before, the LRU list holds five pages (dirty, clean, dirty, clean, clean) and the free list holds four free pages; after, the LRU list holds four pages (dirty, dirty, clean, clean) and the free list holds five free pages]
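The movement of pages from the LRU list to the free list can be sketched as follows. This is an illustrative Python model: `lru_batch_flush`, its parameters, and the `write_page` callback are assumptions for the example, and the real code has many more conditions (page latches, I/O completion, doublewrite).

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def lru_batch_flush(
    lru: Deque[Tuple[str, bool]],      # (page, dirty), left end = oldest
    free_list: List[str],
    batch_size: int,
    write_page: Callable[[str], None],
) -> int:
    """Move up to batch_size of the oldest LRU pages to the free list."""
    freed = 0
    while lru and freed < batch_size:
        page, dirty = lru.popleft()
        if dirty:
            write_page(page)           # dirty pages must be written out first
        free_list.append(page)         # clean pages can be evicted directly
        freed += 1
    return freed


lru_pages = deque([("p1", True), ("p2", False), ("p3", True)])
free_pages: List[str] = []
written: List[str] = []
lru_batch_flush(lru_pages, free_pages, batch_size=2, write_page=written.append)
print(free_pages, written)
```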
The doublewrite buffer
[Diagram:
Step 1: Add the data page to the doublewrite buffer in memory
Step 2: Flush the doublewrite buffer in memory to the doublewrite buffer on disk
Step 3: Write the data page to the data file]
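A small Python sketch of the three doublewrite steps. The class `DoublewriteBuffer`, its capacity parameter, and the in-memory stand-ins for the on-disk areas are all assumptions made for illustration; syncing and crash recovery are left out.

```python
from typing import Dict, List, Tuple


class DoublewriteBuffer:
    """Toy doublewrite buffer: copies go to a safe area before the data file."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_memory: List[Tuple[int, str]] = []  # batch being collected
        self.on_disk: List[Tuple[int, str]] = []    # stands in for the on-disk area
        self.data_file: Dict[int, str] = {}         # stands in for the data file

    def add(self, page_id: int, contents: str) -> None:
        # Step 1: add the page to the in-memory doublewrite buffer.
        self.in_memory.append((page_id, contents))
        if len(self.in_memory) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # Step 2: write the whole batch to the doublewrite area first
        # (a real implementation would sync it to disk here).
        self.on_disk = list(self.in_memory)
        # Step 3: only then write each page to its final location.
        for page_id, contents in self.in_memory:
            self.data_file[page_id] = contents
        self.in_memory.clear()


dblwr = DoublewriteBuffer(capacity=2)
dblwr.add(1, "a")
dblwr.add(2, "b")
print(dblwr.on_disk, dblwr.data_file)
```

The ordering is the point of the design: a page torn during step 3 can be restored from the copy written in step 2.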
Buffer pool concurrency
[Diagram: flush list, LRU list, free list, page hash, and misc. data, protected by the buffer pool mutex, the flush list mutex, and the page hash latch]
Buffer pool instances
[Diagram: buffer pool instance 0 and buffer pool instance 1 each have their own flush list, LRU list, free list, page hash, and misc. data, with their own buffer pool mutex, flush list mutex, and page hash latch]
Buffer pool instances
• Problem: some instances are cold and some are
hot
• “First the accesses to the buffer pools is in no way
evenly spread out.”
• http://bit.ly/bpsplit
• Six-year-old quote, still just as relevant today
Concurrency in XtraDB
[Diagram: each structure has its own lock: flush list (flush list mutex), page hash (page hash latch), LRU list (LRU list mutex), free list (free list mutex), misc. data (misc mutex / atomics)]
Patch contributed to MySQL, and merged in 8.0.0
http://bugs.mysql.com/bug.php?id=75534
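The idea of one lock per list can be sketched in Python as below. This only illustrates the locking layout; `SplitMutexBufferPool` and its methods are hypothetical names and do not mirror the XtraDB source.

```python
import threading
from collections import deque
from typing import Optional


class SplitMutexBufferPool:
    """Toy buffer pool where each list is guarded by its own mutex."""

    def __init__(self, n_free: int) -> None:
        self.free_list = deque(range(n_free))
        self.free_list_mutex = threading.Lock()
        self.lru_list: deque = deque()
        self.lru_list_mutex = threading.Lock()
        self.flush_list: deque = deque()
        self.flush_list_mutex = threading.Lock()

    def take_free_page(self) -> Optional[int]:
        # Only the free list mutex is held, so LRU and flush list work can proceed.
        with self.free_list_mutex:
            return self.free_list.popleft() if self.free_list else None

    def add_to_lru(self, page: int) -> None:
        with self.lru_list_mutex:
            self.lru_list.append(page)

    def add_to_flush_list(self, page: int, lsn: int) -> None:
        with self.flush_list_mutex:
            self.flush_list.append((page, lsn))


pool = SplitMutexBufferPool(n_free=2)
with pool.lru_list_mutex:            # another thread could be scanning the LRU list
    page = pool.take_free_page()     # taking a free page does not wait for it
pool.add_to_lru(page)
print(page, list(pool.lru_list))
```

With a single buffer pool mutex, the `take_free_page()` call inside the `with` block would have had to wait.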
Concurrency solutions are compatible
[Diagram: buffer pool instance 0 and buffer pool instance 1 each have their own flush list (flush list mutex), page hash (page hash latch), LRU list (LRU list mutex), free list (free list mutex), and misc. data (misc mutex / atomics)]
Buffer pool mutexes are so 5.5
[Chart annotations: improvement by the buffer pool mutex split; improvement by adaptive flushing]
5.6+ changed things
• In 5.5 and earlier: reduce mutex contention by X%,
observe TPS increase by ~X%
• Changing flushing heuristics is driven by
performance stability, not necessarily by peak
performance
• Pre-release Percona Server 5.6: reduce mutex
contention by X%, observe TPS increase by ~0%
• What happened? InnoDB cleaner thread happened
Buffer pool / flushing concurrency in 5.5
[Timeline diagram. Columns: Time, Master thread, Query thread 1, Query thread 2. Activities shown: flush list flush (3 times), make page young (4 times), LRU list flush (2 times)]
Buffer pool / flushing concurrency in 5.6+
[Timeline diagram. Columns: Time, Cleaner thread, Query thread 1, Query thread 2. Activities shown: flush list flush (3 times), make page young (4 times), LRU list flush (3 times)]
Buffer pool / flushing
concurrency in 5.6+
• In 5.6+, code-level changes to reduce locking
granularity are still important, but
• Increasing thread specialization means that…
• …flushing - including LRU - heuristics are very
important now
MySQL 5.7 multi-threaded flushing
[Timeline diagram, 0 s to 1 s: a coordinator thread, worker thread #0, and worker thread #1 process LRU instances #0 to #2 and flush list instances #0 to #2]
MySQL 5.7.11 OLTP_RW
PFS data is incomplete
MySQL 5.7.11 OLTP_RW
660 pthread_cond_wait, enter (ib0mutex.h:850), buf_dblwr_write_single_page (ib0mutex.h:850),
buf_flush_write_block_low (buf0flu.cc:1096), buf_flush_page (buf0flu.cc:1096),
buf_flush_single_page_from_LRU (buf0flu.cc:2217), buf_LRU_get_free_block (buf0lru.cc:1401), ...
631 pthread_cond_wait, buf_dblwr_write_single_page (buf0dblwr.cc:1213),
buf_flush_write_block_low (buf0flu.cc:1096), buf_flush_page (buf0flu.cc:1096),
buf_flush_single_page_from_LRU (buf0flu.cc:2217), buf_LRU_get_free_block (buf0lru.cc:1401), ...
337 pthread_cond_wait, PolicyMutex<TTASEventMutex<GenericPolicy> (ut0mutex.ic:89),
get_next_redo_rseg (trx0trx.cc:1185), trx_assign_rseg_low (trx0trx.cc:1278),
trx_set_rw_mode (trx0trx.cc:1278), lock_table (lock0lock.cc:4076), ...
631 pthread_cond_wait,buf_dblwr_write_single_page
Single-page flushing
[Flowchart: Query thread needs a free page → Is a free page available? → Yes: take a free page from the free list → Query thread has a free page. No: single-page flush and single-page doublewrite]
XtraDB
innodb_empty_free_list_algorithm=backoff
[Flowchart: Query thread needs a free page → Is a free page available? → Yes: take a free page from the free list → Query thread has a free page. No: wait. The diagram also retains the "Single-page flush" and "Single-page doublewrite" boxes]
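A hedged Python sketch of the "wait" branch: instead of flushing a page itself, the query thread sleeps and re-checks the free list. The function name, the doubling sleep schedule, and all parameter defaults are assumptions for illustration; the slides only state that the thread waits.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


def get_free_page_backoff(
    free_list: Deque[str],
    max_attempts: int = 10,
    initial_sleep: float = 0.001,
    max_sleep: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Wait for a free page with increasing sleeps; never flush from the query thread."""
    delay = initial_sleep
    for _ in range(max_attempts):
        if free_list:
            return free_list.popleft()
        sleep(delay)                      # give the flusher threads time to free pages
        delay = min(delay * 2, max_sleep)
    return None                           # caller decides what to do after giving up


free_pages: Deque[str] = deque(["page-0"])
print(get_free_page_backoff(free_pages))
```

The `sleep` parameter exists only so the example can be exercised without real delays.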
MySQL 5.7 multi-threaded flushing
[Timeline diagram, 0 s to 1 s: a coordinator thread, worker thread #0, and worker thread #1 process LRU instances #0 to #2 and flush list instances #0 to #2. Annotations: "free pages" (twice) and "Single page flushes!"]
Percona Server 5.7 multi-threaded flushing
[Timeline diagram, 0 s to 1 s: LRU flusher #0 and LRU flusher #1 repeatedly flush LRU instance #0 and LRU instance #1, each producing free pages; separately, the coordinator and worker #0 flush flush list instance #0 and flush list instance #1]
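The layout in this diagram, one dedicated LRU flusher per buffer pool instance, might be sketched like this in Python. It runs a single pass per flusher; `Instance`, `lru_flusher`, `run_lru_flushers`, and the `target_free` parameter are hypothetical names, and a real flusher would loop and pace itself.

```python
import threading
from collections import deque
from typing import Iterable, List


class Instance:
    """Toy buffer pool instance with its own LRU list and free list."""

    def __init__(self, pages: Iterable[int]) -> None:
        self.lru = deque(pages)
        self.free: deque = deque()
        self.mutex = threading.Lock()


def lru_flusher(instance: Instance, target_free: int) -> None:
    # One pass of a dedicated flusher: refill this instance's free list only.
    with instance.mutex:
        while instance.lru and len(instance.free) < target_free:
            instance.free.append(instance.lru.popleft())


def run_lru_flushers(instances: List[Instance], target_free: int) -> None:
    # One thread per instance, so a slow instance does not delay the others.
    threads = [
        threading.Thread(target=lru_flusher, args=(inst, target_free))
        for inst in instances
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


pool_instances = [Instance(range(5)), Instance(range(2))]
run_lru_flushers(pool_instances, target_free=3)
print([len(inst.free) for inst in pool_instances])
```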
Percona Server 5.7.10-3
OLTP_RW
Percona Server 5.7.10-3
OLTP_RW
2678 nanosleep (libpthread.so.0), …, buf_LRU_get_free_block (buf0lru.cc:1435), ...
867 pthread_cond_wait, ..., log_write_up_to (log0log.cc:1293), ...
396 pthread_cond_wait, …, mtr_t::s_lock (sync0rw.ic:433), btr_cur_search_to_nth_level (btr0cur.cc:1022), ...
337 libaio::?? (libaio.so.1), LinuxAIOHandler::collect (os0file.cc:2325), ...
240 poll (libc.so.6), ..., Protocol_classic::read_packet (protocol_classic.cc:810), ...
2678 nanosleep, …, buf_LRU_get_free_block
Percona Server 5.7.10-3
OLTP_RW flushers only
Legacy doublewrite buffer:
adding pages
Percona Server 5.7.10-3
OLTP_RW flushers only
139 libaio::??(libaio.so.1),LinuxAIOHandler::collect (os0file.cc:2448),
LinuxAIOHandler::poll(os0file.cc:2594),...
56 pthread_cond_wait,…,buf_dblwr_add_to_batch
(buf0dblwr.cc:1111),…,buf_flush_LRU_list_batch
(buf0flu.cc:1555), ...,buf_lru_manager(buf0flu.cc:2334),...
25 pthread_cond_wait,…,os_event_wait_low
(os0event.cc:534),buf_flush_page_cleaner_worker(buf0flu.cc:3482),...
21 pthread_cond_wait, …, PolicyMutex<TTASEventMutex<GenericPolicy>
(ut0mutex.ic:89),buf_page_io_complete (buf0buf.cc:5966),
fil_aio_wait(fil0fil.cc:5754),io_handler_thread(srv0start.cc:330),...
8 pthread_cond_timedwait,…,buf_flush_page_cleaner_coordinator
(buf0flu.cc:2726),...
56 pthread_cond_wait, …, buf_dblwr_add_to_batch
Legacy doublewrite buffer:
flushing buffer
Parallel doublewrite buffer:
adding pages
Parallel doublewrite buffer:
flushing buffers
Percona Server 5.7.11-4
OLTP_RW flushers only
Percona Server 5.7.11-4
OLTP_RW flushers only
112 libaio::??(libaio.so.1),LinuxAIOHandler::collect
(os0file.cc:2455),...,io_handler_thread(srv0start.cc:330),...
54 pthread_cond_wait,…,buf_dblwr_flush_buffered_writes
(buf0dblwr.cc:1287),…,buf_flush_LRU_list
(buf0flu.cc:2341),buf_lru_manager(buf0flu.cc:2341),...
35 pthread_cond_wait, …, PolicyMutex<TTASEventMutex<GenericPolicy>
(ut0mutex.ic:89), buf_page_io_complete(buf0buf.cc:5986), …,
io_handler_thread(srv0start.cc:330),...
27 pthread_cond_wait,...,buf_flush_page_cleaner_worker(buf0flu.cc:3489),...
10 pthread_cond_wait,…,enter(ib0mutex.h:845),
buf_LRU_block_free_non_file_page(ib0mutex.h:845),
buf_LRU_block_free_hashed_page(buf0lru.cc:2567),
…,buf_page_io_complete(buf0buf.cc:6070), …,io_handler_thread
(srv0start.cc:330),...
Percona Server 5.7
OLTP_RW
Percona Server 5.7
OLTP_RW
Summary: 5.7 story
• I/O-bound workloads: high demand for free pages,
provided by LRU batch flushing or single-page flushing
• Single-page flushes are bad, with and without doublewrite
• Removed them
• Made batch LRU flusher truly parallel
• Doublewrite buffer negates parallel flushing gains
• Made it parallel too
