Incremental backups

Incremental backups available with Xtrabackup and Percona Server
Vladislav Lesin
George Lorch
16 April 2015

What is an incremental backup?
• Full backup
– Save a consistent state of all the data, as of some point in time, on every backup
• Incremental backup
– Save a consistent state of the data once (the base)
– Get the delta between two consistent states
– Get the new state by applying the delta to the previously saved state
(This talk considers only InnoDB.)
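
The base-plus-delta idea can be sketched in a few lines. This is only a toy model, assuming a backup is a dict of page id to page contents; the names take_delta and apply_delta are made up for the illustration and are not part of any tool.

```python
# Hypothetical model: a consistent state is a dict of page_id -> page contents.
def take_delta(base, current):
    """Return the pages that are new or differ from the base state."""
    return {pid: data for pid, data in current.items() if base.get(pid) != data}


def apply_delta(base, delta):
    """Return the new consistent state: the base overlaid with the delta."""
    new_state = dict(base)
    new_state.update(delta)
    return new_state


base = {0: "a", 1: "b", 2: "c"}
current = {0: "a", 1: "B", 2: "c", 3: "d"}

delta = take_delta(base, current)      # only pages 1 and 3 differ
restored = apply_delta(base, delta)    # equals the current state again
```

The delta holds two pages instead of four, which is the space saving; computing it is the overhead discussed next.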

Pros and Cons
• Pros
– The delta can take less space than the entire
data set
• Cons
– Overhead of forming the delta

Ways of getting the delta
• Full scan
• Use InnoDB redo logs
• Log changed page IDs
• Any new ideas, questions...

Full scan
• Each page contains the LSN of its last update
(LSN, Log Sequence Number: the number of bytes written to the redo log
before a certain log record)
• Read the database page by page and copy the pages
newer than the specified LSN (the base LSN)
• Copy the redo log starting from the last checkpoint start LSN
(a checkpoint is the process of synchronizing the redo log
with the InnoDB pages)
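
A minimal sketch of the full scan, assuming pages are plain in-memory objects rather than blocks read from a tablespace file. The Page class and full_scan_delta are hypothetical names; the LSN values are the ones used in the diagrams of this talk.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_no: int
    lsn: int       # LSN of the last update, stored on the page itself
    data: bytes


def full_scan_delta(pages, base_lsn):
    """Read every page and keep the ones modified after base_lsn."""
    pages_read = 0
    delta = []
    for page in pages:
        pages_read += 1            # every page costs a read, changed or not
        if page.lsn > base_lsn:
            delta.append(page)
    return delta, pages_read


lsns = [950, 960, 960, 1002, 1003, 940, 1710]
table = [Page(i, lsn, b"") for i, lsn in enumerate(lsns)]
delta, pages_read = full_scan_delta(table, base_lsn=1000)
```

Three pages end up in the delta, but all seven are read: that is the full scan overhead.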

Full scan
[Figure: every table page (LSN = 950, 960, 960, 1002, 1003, 940, 1710) is read and tested against "LSN > 1000?"; the pages that pass are written to the delta. The redo log is copied starting from the last checkpoint start LSN and contains a "change page N" record for page N.]

Apply delta for full scan
[Figure: the pages from the delta are written over the table pages. The redo log is then replayed from the last checkpoint start LSN: the "change page N" record is applied to page N (LSN = 1003) because the record LSN > 1003, and page N gets a new LSN.]
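
The two restore steps can be sketched as follows. This is a simplification under stated assumptions: pages are (lsn, data) tuples, and a redo record simply replaces the page contents, whereas a real redo record describes a physical modification of the page. apply_incremental is a hypothetical name.

```python
def apply_incremental(base_pages, delta_pages, redo_records):
    """base_pages, delta_pages: dict page_no -> (lsn, data).
    redo_records: list of (lsn, page_no, new_data)."""
    pages = dict(base_pages)
    pages.update(delta_pages)                  # step 1: overwrite the changed pages
    for rec_lsn, page_no, new_data in sorted(redo_records, key=lambda r: r[0]):
        page_lsn, _ = pages.get(page_no, (0, None))
        if rec_lsn > page_lsn:                 # step 2: apply only newer records
            pages[page_no] = (rec_lsn, new_data)
    return pages


base = {3: (990, "old"), 4: (940, "x")}
delta = {3: (1003, "v1")}
redo = [(1001, 3, "stale"), (1005, 3, "v2")]

restored = apply_incremental(base, delta, redo)
```

The record with LSN 1001 is skipped because the page copied from the delta already carries LSN 1003; the record with LSN 1005 is applied and becomes the page's new LSN.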

Full scan pros and cons
• Pros
– Store only changed pages
– No server code changes are required
• Cons
– Full scan overhead
Can we avoid the full scan?

• Full scan
• any new ideas...

Redo log as delta
• Operations that change pages are logged in the redo log
• The redo log is used for recovery after unexpected server
termination: if changed pages were not flushed before the
termination, the log records are applied to those pages
• In other words, the redo log describes the data changes since
some point
• Why not store the redo logs somewhere and apply them
to some base to get a new base?

Redo log structure
• Several files of the same size (the size
and the number of files are configurable)
• Circular buffer
• Different flushing policies (see
innodb_flush_log_at_trx_commit)
• Checkpoints
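
Why a circular log cannot serve as a delta on its own can be seen from simple arithmetic. The sketch below assumes an idealized layout in which the LSN is a plain byte offset and the files have no headers; locate is a hypothetical helper, not an InnoDB function.

```python
def locate(lsn, file_size, n_files):
    """Map an LSN to (file index, offset) in a circular group of log files."""
    position = lsn % (file_size * n_files)   # wrap around the whole group
    return position // file_size, position % file_size


# Two files of 100 bytes each: the group wraps every 200 bytes.
first = locate(50, file_size=100, n_files=2)
second = locate(150, file_size=100, n_files=2)
wrapped = locate(250, file_size=100, n_files=2)
```

LSN 250 lands in the same slot as LSN 50, so the older record is overwritten unless it was saved somewhere first.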

Redo log archiving
[Figure: the in-memory global redo log buffer is flushed to the redo log according to the flush options. The redo log is read into the in-memory archived log buffer, which is written asynchronously to the redo log archive.]

Redo log archiving
• Log records are buffered; the buffer is flushed according to the
flushing options
• Redo log records are archived when:
– the log buffer is full
– a checkpoint occurs
– the difference between the redo log LSN and the archived log
LSN is too high (synchronous write, i.e. all writes to the log are blocked
until it is archived)
– other edge cases occur (server shutdown etc.)
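
The triggers listed above can be read as a small decision function. This is only an illustration: archive_action, its parameters and the max_lag threshold are hypothetical and do not correspond to actual server variables, and the edge cases are left out.

```python
def archive_action(buffer_full, at_checkpoint, redo_lsn, archived_lsn, max_lag):
    """Decide how to archive, given the current state of the log."""
    lag = redo_lsn - archived_lsn
    if lag > max_lag:
        return "sync"     # block writers until the archive catches up
    if buffer_full or at_checkpoint:
        return "async"    # normal case: write the archive in the background
    return "none"


idle = archive_action(False, False, redo_lsn=1000, archived_lsn=990, max_lag=500)
normal = archive_action(True, False, redo_lsn=1000, archived_lsn=990, max_lag=500)
behind = archive_action(False, False, redo_lsn=2000, archived_lsn=990, max_lag=500)
```

The lag check comes first because it is the only case that must stall the server.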

Redo log archiving overhead
• Double write of redo log records
• Reading of redo log records
• Synchronous writes if the lag between the redo log and
the archived log is too big
• Archived log files can take a lot of disk space

Archived log files
• Stored in a designated directory
• Have the same size as the redo log files
• Contain the start LSN in the file name
• Are not removed automatically
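
Because each archived file carries its start LSN in its name, choosing the files needed for a replay is a lookup in a sorted list. The sketch below assumes the start LSNs have already been parsed out of the file names (the exact name pattern is not given here); files_to_apply is a hypothetical helper.

```python
import bisect


def files_to_apply(start_lsns, from_lsn):
    """start_lsns: sorted start LSNs of the archived files.
    Return the start LSNs of the files needed to replay from from_lsn onward."""
    # The file holding from_lsn is the last one that starts at or before it.
    idx = bisect.bisect_right(start_lsns, from_lsn) - 1
    if idx < 0:
        raise ValueError("the archive does not reach back to the requested LSN")
    return start_lsns[idx:]


needed = files_to_apply([1000, 2000, 3000, 4000], from_lsn=2500)
```

Since the files are not removed automatically, anything before the returned list is a candidate for manual cleanup once the base backup covers it.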

Archived logs applying
[Figure: archived logs are read into a log blocks buffer and parsed into a hash table with f(space id, table id) as the key and a list of log records as the value. For each entry, "Is the page buffered?": if yes, the records are applied to the page in the buffer pool where the log record LSN > page LSN; if no, an async IO request is pushed to the IO slots, an IO thread reads the page, and the records are applied under the same condition.]
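
The parse-then-apply flow can be sketched as below. Assumptions: the hash key is taken to be (space id, page number), since records are applied page by page; a record's payload is just appended to a list standing in for the page contents; and the buffer pool and async IO are left out. group_records and apply_to_page are hypothetical names.

```python
from collections import defaultdict


def group_records(records):
    """records: iterable of (lsn, space_id, page_no, payload).
    Returns a hash table: (space_id, page_no) -> list of (lsn, payload)."""
    table = defaultdict(list)
    for lsn, space_id, page_no, payload in records:
        table[(space_id, page_no)].append((lsn, payload))
    return table


def apply_to_page(page_lsn, page_data, page_records):
    """Apply, in LSN order, only the records newer than the page."""
    for lsn, payload in sorted(page_records, key=lambda r: r[0]):
        if lsn > page_lsn:
            page_data = page_data + [payload]
            page_lsn = lsn
    return page_lsn, page_data


records = [(1005, 5, 3, "b"), (990, 5, 3, "a"), (1200, 5, 4, "c")]
table = group_records(records)
new_lsn, new_data = apply_to_page(1003, [], table[(5, 3)])
```

Grouping first means each page is loaded once and receives all of its records together, whether it was already buffered or had to be read by an IO thread.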

Replication vs log archiving
• Applying redo logs and applying binary logs are
similar processes, but redo log records are at the physical
level while binlog events are at the logical level
• Applying redo logs is faster because there is no
overhead of converting logical operations to physical ones
• Binlog archiving could be more universal, as binlog
records describe changes at the logical level and can be
applied to a different storage engine (MyISAM, for example)

Log archiving pros and cons
• Pros
– No full scan
– Point-in-time backups
• Cons
– Large archived logs
– Double log write overhead
(not so big, as the writes are sequential)
Can we avoid both the full scan and redo log archiving?

• Full scan
• any new ideas...

Combine two approaches
• Two ways to get the modification LSN of a
page:
– It is written on the page, or
– We can figure it out from the redo log
• The log is circular, so the server must
save this information before it is overwritten

Server side implementation
• A separate thread in the server waits for the end of a checkpoint
• It reads the redo log from the last tracked position
• It parses the redo log records and marks pages as modified in a special data structure
– RB-tree
– The key is a (space_id, page_id) pair
– The nodes are bitmap blocks of fixed size
– A bitmap block consists of overhead information and the bitmap itself
– If a bit is set, the corresponding page was modified
– Each block contains the start and end LSNs between which
the changes to its pages took place
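
A rough sketch of the tracking structure, under several assumptions: a plain dict stands in for the RB-tree, block start pages are aligned to multiples of the block size, and the per-block start and end LSNs are omitted. ChangedPageTracker is a hypothetical class, not the server's code; the block size of 32480 pages is the figure given on the bitmap file format slide.

```python
PAGES_PER_BLOCK = 32480  # pages covered by one bitmap block


class ChangedPageTracker:
    def __init__(self):
        # A plain dict stands in for the RB-tree;
        # key = (space_id, first page of the block), value = bitmap.
        self.blocks = {}

    @staticmethod
    def _locate(page_id):
        start = page_id - page_id % PAGES_PER_BLOCK
        bit = page_id - start
        return start, bit // 8, 1 << (bit % 8)

    def mark(self, space_id, page_id):
        """Set the bit for a page seen in a parsed redo log record."""
        start, byte, mask = self._locate(page_id)
        bitmap = self.blocks.setdefault(
            (space_id, start), bytearray(PAGES_PER_BLOCK // 8))
        bitmap[byte] |= mask

    def is_modified(self, space_id, page_id):
        start, byte, mask = self._locate(page_id)
        bitmap = self.blocks.get((space_id, start))
        return bitmap is not None and bool(bitmap[byte] & mask)


tracker = ChangedPageTracker()
tracker.mark(5, 7)
tracker.mark(5, 40000)
```

Marking a page is idempotent, which is why the structure records that a page changed but not how many times.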

The bitmap example
[Figure: an RB-tree of bitmap blocks. The node has the header (space:5, page:N*8); its left child has the header (space:3, page:N*8) and its right child has the header (space:5, page:N*9). Each block consists of the header, a crc, padding and a bitmap of N pages.]

Bitmap file format
• A sequence of data pages, with a varying number of pages per checkpoint
(the figure shows the data for the checkpoint at LSN 9000, followed by LSN 10000 and LSN 10500)
• For each checkpoint: pages of 4KB, each identified by (space, start page)
• Each page contains a bitmap for the next 32480 pages in the space, starting
from the start page
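
The two numbers on this slide fit together as simple arithmetic. The sketch assumes one bit per tracked page and, for the last line only, the default 16KB InnoDB page size; neither is stated on the slide.

```python
PAGE_SIZE = 4096          # bytes per bitmap page (4KB)
PAGES_TRACKED = 32480     # data pages covered by one bitmap page

bitmap_bytes = PAGES_TRACKED // 8            # one bit per tracked page
overhead_bytes = PAGE_SIZE - bitmap_bytes    # left for header, checksum, padding

# Assumption: 16KB InnoDB data pages (the default page size).
INNODB_PAGE_SIZE = 16 * 1024
covered_mib = PAGES_TRACKED * INNODB_PAGE_SIZE / (1024 * 1024)
```

Under these assumptions the bitmap takes 4060 bytes, leaving 36 bytes of the 4KB page for overhead, and one bitmap page describes about 507.5 MiB of table data.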

Bitmap File Naming & Sizing
• ib_modified_log_<seq>_<LSN>.xdb
– <seq>: 1, 2, 3, ...
– <LSN>: the server LSN at file creation time
• Rotated on
– Server start
– Reaching innodb_max_bitmap_file_size
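
A small sketch of how a tool might parse these names and order the files. The regular expression follows the pattern on this slide; parse_bitmap_name and sort_bitmap_files are hypothetical helpers.

```python
import re

BITMAP_NAME = re.compile(r"^ib_modified_log_(\d+)_(\d+)\.xdb$")


def parse_bitmap_name(name):
    """Return (seq, start_lsn) for a bitmap file name, or None if it does not match."""
    m = BITMAP_NAME.match(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def sort_bitmap_files(names):
    """Keep only bitmap files and order them by sequence number, then LSN."""
    parsed = [(parse_bitmap_name(n), n) for n in names]
    return [n for key, n in sorted(p for p in parsed if p[0] is not None)]


ordered = sort_bitmap_files(
    ["ib_modified_log_10_500.xdb", "ib_modified_log_2_100.xdb", "notes.txt"])
```

Sorting on the parsed integers matters: a plain string sort would put sequence 10 before sequence 2.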

How is the delta formed?
[Figure: the base backup has LSN = 1000. Percona Server reports the pages changed between LSNs 980 and 1020: those with LSN = 1002, 1003 and 1010. Of the pages in table.ibd (LSN = 950, 960, 960, 1002, 1003, 940, 1010), only these three are read, tested against "LSN > 1000?" and written to table.ibd.delta.]
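
A sketch of delta forming driven by the changed-page list, assuming the list has already been obtained from the server and that pages are read through a callback. bitmap_delta and read_page are hypothetical names; the LSN values are the ones from the diagram.

```python
def bitmap_delta(read_page, changed_pages, base_lsn):
    """read_page: callable page_no -> (lsn, data).
    changed_pages: page numbers the server reports as modified."""
    delta = {}
    reads = 0
    for page_no in sorted(set(changed_pages)):
        lsn, data = read_page(page_no)
        reads += 1
        # The tracked LSN range may start before the base LSN,
        # so the page LSN is still checked.
        if lsn > base_lsn:
            delta[page_no] = (lsn, data)
    return delta, reads


lsns = [950, 960, 960, 1002, 1003, 940, 1010]


def read_page(page_no):
    return lsns[page_no], b""


delta, reads = bitmap_delta(read_page, changed_pages=[3, 4, 6], base_lsn=1000)
```

Only three of the seven pages are read, compared with all seven in the full scan.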

The general advantage
Only modified pages are read while
the delta is formed

Backup performance
[Chart: backup time (0% to 100%) against delta size (0.00%, 0.01%, 1.00%, 100.00%) for two series, Full Scan and Bitmap.]

Size overhead
[Chart: "Log and bitmap file size comparison", log bytes per bitmap byte (axis 0 to 800) for bitmap files #1 to #8.]
• A good case: > 100 log bytes per 1 bitmap byte

Size overhead
• A bad case: 3-15 log bytes per 1 bitmap byte
• https://bugs.launchpad.net/bugs/1269547
– We are considering fix options

Tracking: server overhead
• Impact on TPS and response time:
– We couldn't find any
– If you ever do find it, report it to us and
try --innodb_log_checksum_algorithm=crc32
(http://bit.ly/pslogcrc32)

Bitmap files management
• PURGE CHANGED_PAGE_BITMAPS BEFORE <lsn>
– ib_1_8192.xdb
– ib_2_10000.xdb
– ib_3_20000.xdb
– Full backup taken, LSN = 22000
– PURGE CHANGED_PAGE_BITMAPS BEFORE 22000;
– ib_4_30000.xdb
– Incremental backup taken, LSN = 33000
– PURGE CHANGED_PAGE_BITMAPS BEFORE 33000;

INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• Percona Server can read the bitmaps too
SHOW CREATE TABLE INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
CREATE TABLE `INNODB_CHANGED_PAGES` (
`space_id` int(11) unsigned NOT NULL DEFAULT '0',
`page_id` int(11) unsigned NOT NULL DEFAULT '0',
`start_lsn` bigint(21) unsigned NOT NULL DEFAULT '0',
`end_lsn` bigint(21) unsigned NOT NULL DEFAULT '0'
)
• start_lsn and end_lsn are always at checkpoint boundaries
• Does not show the exact LSN of a change
• Does not show the number of changes to a page
• Does show the number of flushes of a page over the workload

• Full scan
• Use storage redo logs
• Any new ideas, questions...
(Thanks to Laurynas Biveinis for the bitmap part)
