SlideShare a Scribd company logo
1/31/2017 1
MySQL Buffer Management
Mijin Ahn
meeeeejin@gmail.com
Contents
1/31/2017 2
• Overview
• Buffer Pool
• Buffer Read
• LRU Replacement
• Flusher
• Doublewrite Buffer
• Synchronization
OVERVIEW
1/31/2017 3
Overview
1/31/2017 4
• Buffer Pool
– Considering memory
hierarchy
– Caching frequently
accessed data into
DRAM like a cache
memory in CPU
– Exploit locality
Overview
• InnoDB Architecture
1/31/2017 5
Handler API
Transaction (trx)
Logging &
Crash
Recovery
(log & recv)
Mini
Transaction
(mtr)
Lock
(lck)
Cursor (cur)
Row (row)
B-tree (btr)
Page (page)
Buffer Manager (buf)
Free space / File Management (fsp / fil)
IO
Overview
• Buffer Manager
– Buffer Pool (buf0buf.cc) : buffer pool manager
– Buffer Read (buf0read.cc) : read buffer
– LRU (buf0lru.cc) : buffer replacement
1/31/2017 6
Overview
• Buffer Manager
– Flusher (buf0flu.cc) : dirty page writer & background flusher
– Doublewrite (buf0dblwr.cc) : doublewrite buffer
1/31/2017 7
BUFFER POOL
1/31/2017 8
Lists of Buffer Blocks
• Free list
– Contains free page frames
• LRU list
– Contains all the blocks holding a file page
• Flush list
– Contains the blocks holding file pages that have been modified in the
memory but not written to disk yet
1/31/2017 9
Database
Buffer
TailHead D D D
Main LRU list
Free list
D
Flush listD D
Buffer Pool Mutex
• The buffer buf_pool contains a single mutex
– buf_pool->mutex: protects all the control data structures of the
buf_pool
• The buf_pool->mutex is a hot-spot in main memory
– Causing a lot of memory bus traffic on multiprocessor systems when
processors alternatively access the mutex
– A solution to reduce mutex contention
• To create a separate lock for the page hash table
1/31/2017 10
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 11
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 12
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 13
Buffer Pool Struct
• include/buf0buf.h: buf_pool_t
1/31/2017 14
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init()
1/31/2017 15
...
Buffer pool init per instance
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 16
Create mutex for buffer pool
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 17
Initialize buffer chunk
Buffer Pool Init
• buf/buf0buf.cc: buf_pool_init_instance()
1/31/2017 18
Create page hash table
Buffer Chunk
• Total buffer pool size
= x * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size
1/31/2017 19
Buffer Chunk Struct
• include/buf0buf.ic: buf_chunk_t
1/31/2017 20
[0]
[…]
[N]
[0] [1] [3][2]
[…] […] [N][…]
blocks
frames
Buffer chunk mem
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 21
Allocate chunk mem
(blocks + frames)
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 22
Allocate control blocks
Allocate frames
(Page size is aligned)
• buf/buf0buf.cc: buf_chunk_init()
Buffer Pool Init
1/31/2017 23
Add all blocks to free list
Initialize control block
Buffer Control Block (BCB)
• The control block contains
– Read-write lock
– Buffer fix count
• Which is incremented when a thread wants a file page to be fixed in a buffer
frame
• The buffer fix operation does not lock the contents of the frame
– Page frame
– File pages
• Put to a hash table according to the file address of the page
1/31/2017 24
Buffer Control Block (BCB)
1/31/2017 25
Page
Frame ptr
Mutex
RW lock
Lock hash value
…
buf_block_t
Table space ID
Page offset
Buffer fix count
IO fix
State
Hash
List (LRU, free, flush)
…
buf_page_t
• include/buf0buf.h: buf_block_t
BCB Struct
1/31/2017 26
• include/buf0buf.h: buf_block_t
BCB Struct
1/31/2017 27
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 28
Page identification
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 29
...
...
• include/buf0buf.h: buf_page_t
BCB Struct
1/31/2017 30
• buf/buf0buf.cc: buf_block_init()
BCB Init
1/31/2017 31
Set data frame
... Create block mutex & rw lock
BUFFER READ
PART 1 : READ A PAGE
1/31/2017 32
MySQL Buffer Manager: Read
Database
on Flash SSD
Database
Buffer
3. Read a page
Tail
1. Search free list
Head D D D
Main LRU List
Free list
Dirty Page Set
D D
Scan LRU List from tail
Double Write Buffer
2. Flush Dirty Pages
D
Buffer Read
• Read a page (buf0rea.cc)
– Find a certain page in the buffer pool using hash table
– If it is not in the buffer pool, then read a block from the storage
• Allocate a free block for read (include buffer block)
• Two cases
– Buffer pool has free blocks
– Buffer pool doesn’t have a free block
• Read a page
1/31/2017 34
Buffer Read
1/31/2017 35
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 36
Get the buffer pool ptr using space & offset
** 2 important things **
1) ID of a page is (space, offset) of the page
2) Buffer pool – page mapping is mapped
Buffer Read
• include/buf0buf.ic: buf_pool_get()
1/31/2017 37
Make a fold number
Buffer Read
• Why fold?
– They want to put pages together in the same buffer pool if
it is the same extents for read ahead
1/31/2017 38
Page 0
Page 1
…
Page 63
Page 64
Page 65
…
Page 127
Fold 0 Fold 1
…
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 39
Get page hash lock before
searching in the hash table
Set shared lock on hash table
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 40
...
...
Find a page in
the hash table
Page doesn’t exist in buffer pool
Read the page from the storage
retry
success
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 41
Fail
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 42
If it failed to read target page,
go to the first part of the loop
Buffer Read
• buf/buf0rea.cc: buf_read_page()
1/31/2017 43
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 44
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 45
Allocate buffer block for read
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 46
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 47
Get free block: see this later
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 48
Initialize buffer page for current read
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 49
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 50
Buffer Read
• buf/buf0buf.cc: buf_page_init()
1/31/2017 51
Insert a page into hash table
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 52
Add current block to LRU list
: see this later
Set io fix to BUF_IO_READ
Buffer Read
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 53
Increase pending read count;
How many buffer read were
requested but not finished
Buffer Read
• We allocate a free buffer and control block
• And the block was inserted into hash table and LRU list of
corresponding buffer pool
• Now, we need to read the real content of the target page from
the storage
1/31/2017 54
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 55
Read a page from storage
: see this later
Buffer Read
• buf/buf0rea.cc: buf_read_page_low()
1/31/2017 56
Complete the read request
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 57
Get io type (In this case, BUF_IO_READ)
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 58
Get io type (In this case, BUF_IO_READ)
...
Page corruption check based
on checksum in the page
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 59
Set io fix to BUF_IO_NONE
Buffer Read
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 60
Decrease pending read count
BUFFER READ
PART 2 : AFTER GOT BLOCK
1/31/2017 61
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 62
...
Set access time
Buffer Read
• buf/buf0buf.cc: buf_page_get_gen()
1/31/2017 63
Do read ahead process
(default=false)
LRU REPLACEMENT
PART 1 : ADD BLOCK
1/31/2017 64
LRU Add Block
• buf/buf0buf.cc: buf_page_init_for_read()
1/31/2017 65
Add current block to LRU list
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block()
1/31/2017 66
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block_low()
1/31/2017 67
If list is too small, then put
current block to first of the list
LRU Add Block
• buf/buf0lru.cc: buf_LRU_add_block_low()
1/31/2017 68
Else, insert current block to
after LRU_old pointer
LRU REPLACEMENT
PART 2 : GET FREE BLOCK
1/31/2017 69
LRU Get Free Block
• This function is called from a user thread when it needs a
clean block to read in a page
– Note that we only ever get a block from the free list
– Even when we flush a page or find a page in LRU scan we put it to
free list to be used
1/31/2017 70
LRU Get Free Block
• iteration 0:
– get a block from free list, success: done
– if there is an LRU flush batch in progress:
• wait for batch to end: retry free list
– if buf_pool->try_LRU_scan is set
• scan LRU up to srv_LRU_scan_depth to find a clean block
• the above will put the block on free list
• success: retry the free list
– flush one dirty page from tail of LRU to disk (= single page flush)
• the above will put the block on free list
• success: retry the free list
1/31/2017 71
LRU Get Free Block
• iteration 1:
– same as iteration 0 except:
• scan whole LRU list
• iteration > 1:
– same as iteration 1 but sleep 100ms
1/31/2017 72
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 73
Get buffer pool mutex
Get a free block from the free list
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 74
Getting a free block succeeded
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 75
If already background flushed
started, wait for it to end
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 76
Find a victim page to replace
and make a free block
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_scan_and_free_block()
1/31/2017 77
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_scan_and_free_block()
1/31/2017 78
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_from_common_LRU_list()
1/31/2017 79
Try to free it
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_page()
1/31/2017 80
Get hash lock & block mutex
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_free_page()
1/31/2017 81
If current page is dirty and not
flushed to disk yet, exit
• buf/buf0lru.cc: buf_LRU_free_page()
– After func_exit, we are on clean case!
LRU Get Free Block
1/31/2017 82
...
...
...
• buf/buf0lru.cc: buf_LRU_block_remove_hashed()
LRU Get Free Block
1/31/2017 83
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 84
...
If we have free block(s),
go to loop
LRU Get Free Block
• buf/buf0lru.cc: buf_LRU_get_free_block()
1/31/2017 85
If we failed to make a free block, do a single page flush
FLUSH A PAGE
PART 1 : SINGLE PAGE FLUSH
1/31/2017 86
Flush a Page
• Flushing a page by
– A background flusher  batch flush (LRU & flush list)
– A single page flush  in LRU_get_free_block()
• Background flusher
– Regularly check system status (per 1000ms)
– Flush all buffer pool instances in a batch manner
1/31/2017 87
Single Page Flush
1/31/2017 88
buf_LRU_get_
free_block()
buf_flush_single
_page_from_LRU()
buf_flush_
ready_for_
flush() buf_flush
_page()
buf_flush_write
_block_low()
buf_dblwr_write
_single_page()
fil_io()
fil_flush()
buf_page_io_
complete()
• buf/buf0flu.cc: buf_flush_single_page_from_LRU()
Single Page Flush
1/31/2017 89
Full scan
• buf/buf0flu.cc: buf_flush_single_page_from_LRU()
Single Page Flush
1/31/2017 90
Check whether we can flush
current block and ready for flush
Try to flush it;
write to disk
...
• buf/buf0flu.cc: buf_flush_ready_for_flush()
Single Page Flush
1/31/2017 91
If the page is already flushed
or doing IO, return false
...
• buf/buf0flu.cc: buf_flush_page()
– Writes a flushable page from the buffer pool to a file
Single Page Flush
1/31/2017 92
Get lock
...
• buf/buf0flu.cc: buf_flush_page()
Single Page Flush
1/31/2017 93
Set fix and flush type
...
...
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 94
Flush log (transaction log – WAL)
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 95
Doublewrite off case
Write the page to dwb, then write to datafile;
See this in dwb part
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 96
Sync buffered write to disk;
call fsync by fil_flush()
FLUSH A PAGE
PART 2 : BATCH FLUSH
1/31/2017 97
Batch Flush
• Background flusher (= page cleaner thread)
– Independent thread for flushing a dirty pages from buffer pools to
storage
– Regularly (per 1000ms) do flush from LRU tail or
– Do flush by dirty page percent (configurable)
• Thread definition
– buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
1/31/2017 98
...
• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
Background Flusher
1/31/2017 99
Run until shutdown
• buf/buf0flu.cc: DECLARE_THREAD(buf_flush_page_cleaner_thread)
Background Flusher
1/31/2017 100
Nothing has
been changed
Something has
been changed!
• buf/buf0flu.cc: buf_flush_LRU_tail()
• Clears up tail of the LRU lists:
– Put replaceable pages at the tail of LRU to the free list
– Flush dirty pages at the tail of LRU to the disk
• srv_LRU_scan_depth
– Scan each buffer pool at this amount
– Configuable: innodb_LRU_scan_depth
LRU List Batch Flush
1/31/2017 101
LRU List Batch Flush
1/31/2017 102
buf_flush_
LRU_tail()
buf_flush
_LRU()
buf_flush
_batch()
buf_do_LRU
_batch()
buf_flush
_LRU_list
_batch()
buf_LRU_free
_page()
fil_io()
fil_flush()
buf_page_io_
complete()
buf_flush_p
age_and_try
_neighbors()
buf_flush
_page()
buf_flush_write
_block_low()
buf_dblwr
_add_to_
batch()
buf_dblwr_
flush_
buffered_
writes()
• buf/buf0flu.cc: buf_flush_LRU_tail()
LRU List Batch Flush
1/31/2017 103
Per buffer pool instance
• buf/buf0flu.cc: buf_flush_LRU_tail()
LRU List Batch Flush
1/31/2017 104
Chunk size = 100
• buf/buf0flu.cc: buf_flush_LRU()
LRU List Batch Flush
1/31/2017 105
...
Batch LRU flush
• buf/buf0flu.cc: buf_flush_batch()
LRU List Batch Flush
1/31/2017 106
Do LRU batch
• buf/buf0flu.cc: buf_do_LRU_batch()
LRU List Batch Flush
1/31/2017 107
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 108
Get the last page
from LRU
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 109
• buf/buf0flu.cc: buf_flush_ready_for_replace()
LRU List Batch Flush
1/31/2017 110
Check whether current
page is clean page or not
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 111
It there is any replaceable page,
free the page
• buf/buf0flu.cc: buf_flush_LRU_list_batch()
LRU List Batch Flush
1/31/2017 112
Else, try to flush neighbor pages
• buf/buf0flu.cc: buf_flush_page_and_try_neighbors()
LRU List Batch Flush
1/31/2017 113
Flush page, but no sync
• buf/buf0flu.cc: buf_flush_write_block_low()
LRU List Batch Flush
1/31/2017 114
Doublewrite off case
Add the page to the dwb buffer;
See this in dwb part
LRU List Batch Flush
• Now, victim pages are gathered for replacement
• We need to flush them to disk
• We can do this by calling buf_flush_common()
1/31/2017 115
• buf/buf0flu.cc: buf_flush_LRU()
LRU List Batch Flush
1/31/2017 116
...
• buf/buf0flu.cc: buf_flush_common()
LRU List Batch Flush
1/31/2017 117
• flush all pages we gathered so far
• write the pages to dwb area first
• then issue it to datafile
; See this later
DOUBLEWRITE BUFFER
PART 1 : ARCHITECTURE
1/31/2017 118
Double Write Buffer
• To avoid torn page (partial page) written problem
• Write dirty pages to special storage area in system tablespace
priori to write database file
1/31/2017 119
Database
on Flash SSD
Database
Buffer
TailHead D D D
Main LRU List
Free list
Dirty Page Set
D D
Scan LRU List from tail
D
Double Write Buffer
ibdata1-space0
datafiles
DWB Architecture
1/31/2017 120
• include/buf0dblwr.cc: buf_dblwr_t
DWB Struct
1/31/2017 121
• include/buf0dblwr.cc: buf_dblwr_t
DWB Struct
1/31/2017 122
DOUBLEWRITE BUFFER
PART 2 : SINGLE PAGE FLUSH
1/31/2017 123
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 124
...
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 125
# of slots for single page flush
= 2 * DOUBLEWRITE_BLOCK_SIZE – BATCH_SIZE
= 128 – 120
= 8
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 126
If all slots are reserved,
wait until current
dblwr done
Find a free slot
• buf/buf0dblwr.cc: buf_dblwr_write_single_page()
Single Page Flush
1/31/2017 127
Write to datafile
(synchronous)
Sync system tablespace
for dwb area in disk
Write block to
dwb area in disk
(synchronous)
...
• buf/buf0dblwr.cc: buf_dblwr_write_block_to_datafile()
Single Page Flush
1/31/2017 128
Issue write operation
to datafile
In single page flush,
sync = true
• buf/buf0flu.cc: buf_flush_write_block_low()
Single Page Flush
1/31/2017 129
Sync buffered write to disk;
call fsync by fil_flush()
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 130
Get io type (In this case, BUF_IO_WRITE)
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 131
Set io fix to BUF_IO_NONE
Single Page Flush
• buf/buf0buf.cc: buf_page_io_complete()
1/31/2017 132
Single Page Flush
• buf/buf0flu.cc: buf_flush_write_complete()
1/31/2017 133
Remove the block
from the flush list
…
Single Page Flush
• buf/buf0dblwr.cc: buf_dblwr_update()
1/31/2017 134
Free the dwb slot
of the target page
Single Page Flush
• Single page flush is performed in the context of the query
thread itself
• Single page flush mode iterates over the LRU list of a buffer
pool instance, while holding the buffer pool mutex
• It might have trouble in getting a free doublewrite buffer slot
(total 8 slots)
• In result, it makes the overall performance worse
1/31/2017 135
DOUBLEWRITE BUFFER
PART 3 : BATCH FLUSH
1/31/2017 136
• buf/buf0flu.cc: buf_flush_write_block_low()
Batch Flush
1/31/2017 137
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 138
If another batch is already
running, wait until done
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 139
If all slots for batch flush
in dwb buffer is reserved,
flush dwb buffer
• buf/buf0dblwr.cc: buf_dblwr_add_to_batch()
Batch Flush
1/31/2017 140
After flushing, copy current
block to buf_dblwr->first_free
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 141
Doublewrite off case
• buf/buf0dblwr.cc: buf_dblwr_sync_datafiles()
Batch Flush
1/31/2017 142
 fil_flush()
 os_file_flush()
 os_file_fsync()
 fsync()
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 143
Change batch running status
Exit mutex
Nobody won’t be
here except me!
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 144
Issue write op for block1
(synchronous)
If current write uses only block1,
then flush
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 145
Issue write op for block2
(synchronous)
...
• buf/buf0dblwr.cc: buf_dblwr_flush_buffered_writes()
Batch Flush
1/31/2017 146
Flush (fsync) system table space
Write all blocks to datafile
(asynchronous)
• After submitting aio requests,
– fil_aio_wait()
– buf_page_io_complete()
– buf_flush_write_complete()
– buf_dblwr_update()
Batch Flush
1/31/2017 147
• buf/buf0dblwr.cc: buf_dblwr_update()
Batch Flush
1/31/2017 148
Flush datafile
Reset dwb
SYNCHRONIZATION
1/31/2017 149
InnoDB Synchronization
• InnoDB implements its own mutexes & RW-locks for buffer
management
• Latch in InnoDB
– A lightweight structure used by InnoDB to implement a lock
– Typically held for a brief time (milliseconds or microseconds)
– A general term that includes both mutexes (for exclusive access) and
rw-locks (for shared access)
1/31/2017 150
Mutex in InnoDB
• The low-level object to represent and enforce exclusive-
access locks to internal in-memory data structures
• Once the lock is acquired, any other process, thread, and so
on is prevented from acquiring the same lock
1/31/2017 151
Mutex in InnoDB
1/31/2017 152
• Example code in InnoDB
– buf/buf0buf.cc: buf_wait_for_read()
Get current IO fix of the block
Mutex in InnoDB
1/31/2017 153
• Example code in InnoDB
– buf/buf0flu.cc: buf_flush_batch()
Flush the pages in
the buffer pool
RW-lock in InnoDB
• The low-level object to represent and enforce shared-access
locks to internal in-memory data structures
• RW-lock includes three types of locks
– S-locks (shared locks)
– X-locks (exclusive locks)
– SX-locks (shared-exclusive locks)
1/31/2017 154
S SX X
S Compatible Compatible Conflict
SX Compatible Conflict Conflict
X Conflict Conflict Conflict
RW-lock in InnoDB
• S-lock (Shared-lock)
– provides read access to a common resource
• X-lock (eXclusive-lock)
– provides write access to a common resource
– while not permitting inconsistent reads by other threads
• SX-lock (Shared-eXclusive lock)
– provides write access to a common resource
– while permitting inconsistent reads by other threads
– introduced in MySQL 5.7 to optimize concurrency and improve
scalability for read-write workloads.
1/31/2017 155
RW-lock in InnoDB
1/31/2017 156
• Example code in InnoDB (S-lock)
– buf/buf0buf.cc: buf_page_get_gen()
…
Search hash table
RW-lock in InnoDB
1/31/2017 157
• Example code in InnoDB (X-lock)
– buf/buf0buf.cc: buf_page_init_for_read()
…
Insert a page into
the hash table

More Related Content

What's hot

Dd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublinDd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublin
Ståle Deraas
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLMorgan Tocker
 
binary log と 2PC と Group Commit
binary log と 2PC と Group Commitbinary log と 2PC と Group Commit
binary log と 2PC と Group Commit
Takanori Sejima
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
Alexey Bashtanov
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
MIJIN AN
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XID
PGConf APAC
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
PGConf APAC
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
Innodb에서의 Purge 메커니즘 deep internal (by  이근오)Innodb에서의 Purge 메커니즘 deep internal (by  이근오)
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
I Goo Lee.
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
MariaDB plc
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxData
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
Marco Tusa
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
René Cannaò
 
さいきんの InnoDB Adaptive Flushing (仮)
さいきんの InnoDB Adaptive Flushing (仮)さいきんの InnoDB Adaptive Flushing (仮)
さいきんの InnoDB Adaptive Flushing (仮)
Takanori Sejima
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
Adrian Huang
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
ShapeBlue
 

What's hot (20)

Dd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublinDd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublin
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
binary log と 2PC と Group Commit
binary log と 2PC と Group Commitbinary log と 2PC と Group Commit
binary log と 2PC と Group Commit
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XID
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
 
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
Innodb에서의 Purge 메커니즘 deep internal (by  이근오)Innodb에서의 Purge 메커니즘 deep internal (by  이근오)
Innodb에서의 Purge 메커니즘 deep internal (by 이근오)
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
 
さいきんの InnoDB Adaptive Flushing (仮)
さいきんの InnoDB Adaptive Flushing (仮)さいきんの InnoDB Adaptive Flushing (仮)
さいきんの InnoDB Adaptive Flushing (仮)
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 

Similar to MySQL Buffer Management

Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 
Os Linux Documentation
Os Linux DocumentationOs Linux Documentation
Os Linux Documentation
Federal Urdu University
 
Percona Server 5.7: Key Performance Algorithms
Percona Server 5.7: Key Performance AlgorithmsPercona Server 5.7: Key Performance Algorithms
Percona Server 5.7: Key Performance Algorithms
Laurynas Biveinis
 
cache memory
 cache memory cache memory
Play with FILE Structure - Yet Another Binary Exploit Technique
Play with FILE Structure - Yet Another Binary Exploit TechniquePlay with FILE Structure - Yet Another Binary Exploit Technique
Play with FILE Structure - Yet Another Binary Exploit Technique
Angel Boy
 
Managing Memory & Locks - Series 1 Memory Management
Managing  Memory & Locks - Series 1 Memory ManagementManaging  Memory & Locks - Series 1 Memory Management
Managing Memory & Locks - Series 1 Memory Management
DAGEOP LTD
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...
NoorMustafaSoomro
 
AOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocksAOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocksZubair Nabi
 
Incremental backups
Incremental backupsIncremental backups
Incremental backups
Vlad Lesin
 
chapter5-file system implementation.ppt
chapter5-file system implementation.pptchapter5-file system implementation.ppt
chapter5-file system implementation.ppt
BUSHRASHAIKH804312
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
Alluxio, Inc.
 

Similar to MySQL Buffer Management (14)

Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 
Os Linux Documentation
Os Linux DocumentationOs Linux Documentation
Os Linux Documentation
 
Percona Server 5.7: Key Performance Algorithms
Percona Server 5.7: Key Performance AlgorithmsPercona Server 5.7: Key Performance Algorithms
Percona Server 5.7: Key Performance Algorithms
 
cache memory
 cache memory cache memory
cache memory
 
Updates
UpdatesUpdates
Updates
 
Updates
UpdatesUpdates
Updates
 
Play with FILE Structure - Yet Another Binary Exploit Technique
Play with FILE Structure - Yet Another Binary Exploit TechniquePlay with FILE Structure - Yet Another Binary Exploit Technique
Play with FILE Structure - Yet Another Binary Exploit Technique
 
Managing Memory & Locks - Series 1 Memory Management
Managing  Memory & Locks - Series 1 Memory ManagementManaging  Memory & Locks - Series 1 Memory Management
Managing Memory & Locks - Series 1 Memory Management
 
kerch04.ppt
kerch04.pptkerch04.ppt
kerch04.ppt
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...
 
AOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocksAOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocks
 
Incremental backups
Incremental backupsIncremental backups
Incremental backups
 
chapter5-file system implementation.ppt
chapter5-file system implementation.pptchapter5-file system implementation.ppt
chapter5-file system implementation.ppt
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 

Recently uploaded

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Yara Milbes
 

Recently uploaded (20)

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 

MySQL Buffer Management