Incremental backups
available with Xtrabackup and
Percona Server
Vladislav Lesin
George Lorch
16 April 2015
What is an incremental backup?
• Full backup
– Save a consistent state of all the data at some point in time, on every backup
• Incremental backup
– Save consistent state of data at some point once (base)
– Get delta between two consistent states
– Get new state by applying delta to the previously saved state
(considering only InnoDB)
Pros and Cons
• Pros
– The delta can take less space than the entire data set
• Cons
– Overhead of forming the delta
The ways of getting delta
• Full scan
• Use innodb redo logs
• Log changed page ids
• any new ideas, questions...
Full scan
• Each page contains the LSN of its last update (Log Sequence Number: the number of bytes written to the redo log before a given log record)
• Read the database page by page and copy the pages newer than the specified LSN (the base LSN)
• Copy the redo log starting from the last checkpoint's start LSN (a checkpoint is the process of synchronizing the redo log with the InnoDB pages)
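The full scan step can be sketched in a few lines. This is only an illustration, not Xtrabackup's code: the Page class and the full_scan_delta function are hypothetical names, and the LSN values are the ones shown in the "Full scan" diagram with a base LSN of 1000.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_no: int
    lsn: int  # LSN of the last update, as stored on the page


def full_scan_delta(pages, base_lsn):
    """Visit every page and keep only those changed after base_lsn."""
    return [page for page in pages if page.lsn > base_lsn]


# LSN values from the diagram; every page is read, few are copied.
pages = [Page(n, lsn) for n, lsn in enumerate([950, 960, 1002, 1003, 940, 1710])]
delta = full_scan_delta(pages, base_lsn=1000)
print([page.lsn for page in delta])
```

All six pages are read even though only three end up in the delta, which is the overhead the following slides try to avoid.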
Full scan
[Diagram: all six table pages (LSN = 950, 960, 1002, 1003, 940, 1710) are read and checked against "LSN > 1000?"; the three pages that pass (1002, 1003, 1710) are written to the delta. Alongside, the redo log is copied from the last checkpoint start LSN; it contains a "change page N" record for page N.]
Apply delta for full scan
[Diagram: the pages stored in the delta are written over the corresponding table pages. The redo log is then applied from the last checkpoint start LSN: a "change page N" record is applied to page N (LSN = 1003) because the record's LSN is greater than 1003, and the page gets a new LSN.]
Full scan pros and cons
• Pros
– Store only changed pages
– No server code changes are required
• Cons
– Full scan overhead
Can we avoid the full scan?
The ways of getting delta
• Full scan
• Use innodb redo logs
• Log changed page ids
• any new ideas...
Redo log as delta
• Operations that change pages are logged in the redo log
• The redo log is used for recovery after an unexpected server termination: if changed pages were not flushed before the termination, the log records are applied to those pages
• In other words, the redo log describes the data changes since some point
• Why not store the redo logs somewhere and apply them to a base to get a new base?
Redo log structure
• Several files of the same size (the size and the number of files are configurable)
• Circular buffer
• Different flushing policies (see innodb_flush_log_at_trx_commit)
• Checkpoints
Redo log archiving
[Diagram: the in-memory global redo log buffer is flushed to the redo log files according to the flush options. The redo log is read into an in-memory archived log buffer, which is written asynchronously to the redo log archive.]
Redo log archiving
• Log records are buffered; the buffer is flushed according to the flushing options
• Redo log records are archived when:
– the log buffer is full
– a checkpoint occurs
– the difference between the redo log LSN and the archived log LSN is too high (synchronous write, i.e. all writes to the log are blocked until it is archived)
– other edge cases occur (server shutdown, etc.)
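The archiving triggers listed above can be read as a small decision function. The sketch below is an assumption-laden illustration, not the server's logic: archive_action, its parameters, and the max_lag threshold are hypothetical names introduced here.

```python
def archive_action(log_lsn, archived_lsn, buffer_full, at_checkpoint, max_lag):
    """Decide how to archive, following the triggers on the slide.

    max_lag is a hypothetical threshold for the allowed distance
    between the redo log LSN and the archived log LSN.
    """
    if log_lsn - archived_lsn > max_lag:
        # The archive has fallen too far behind: block log writers
        # and archive synchronously.
        return "sync"
    if buffer_full or at_checkpoint:
        # Normal case: archive in the background.
        return "async"
    return "none"


print(archive_action(10_000, 1_000, False, False, max_lag=5_000))
print(archive_action(2_000, 1_000, True, False, max_lag=5_000))
```

The synchronous branch is checked first because it is the only one that stalls the workload.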
Redo log archiving overhead
• Double write of redo log records
• Reading of redo log records
• Synchronous write if the lag between the redo log and the archived log is too big
• Archived log files can take a lot of disk space
Archived log files
• Stored in a specified directory
• Have the same size as the redo log files
• Contain the start LSN in the file name
• Are not removed automatically
Archived logs applying
[Diagram: archived logs are read into a log blocks buffer and parsed into a hash table with f(space id, table id) as the key and a list of log records as the value. For each page: if the page is already in the buffer pool, a record is applied when log record LSN > page LSN; if it is not, an async IO request is pushed into the IO slots, an IO thread reads the page, and the same "apply if log record LSN > page LSN" check is made.]
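The apply step in the diagram can be sketched as follows. This is a simplified illustration, not InnoDB's recovery code: the function names are hypothetical, the hash key is assumed to be a (space id, page number) pair, pages are plain dictionaries, and the buffer pool and async IO parts are left out.

```python
from collections import defaultdict


def group_records(records):
    """Build the hash table: page key -> list of (lsn, change) records."""
    by_page = defaultdict(list)
    for space_id, page_no, lsn, change in records:
        by_page[(space_id, page_no)].append((lsn, change))
    return by_page


def apply_records(pages, by_page):
    """Apply each record only if it is newer than the page it targets."""
    for key, page_records in by_page.items():
        page = pages[key]
        for lsn, change in sorted(page_records, key=lambda rec: rec[0]):
            if lsn > page["lsn"]:
                page["data"].append(change)
                page["lsn"] = lsn


pages = {(5, 0): {"lsn": 1003, "data": []}, (5, 1): {"lsn": 940, "data": []}}
records = [(5, 0, 1001, "a"), (5, 0, 1005, "b"), (5, 1, 950, "c")]
apply_records(pages, group_records(records))
print(pages)
```

The record with LSN 1001 is skipped because the page already reflects changes up to LSN 1003, which is what makes re-applying a log over a newer page safe.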
Replication vs logs archiving
• Applying redo logs and applying binary logs are similar processes, but redo log records are at the physical level while binlog events are at the logical level
• Applying redo logs is faster because there is no overhead for converting logical operations to physical ones
• Binlog archiving could be more universal, as binlog records describe changes at the logical level and can be applied to a different storage engine (MyISAM, for example)
Logs archiving pros and cons
• Pros
– No full scan
– Point-in-time backups
• Cons
– Large archived log size
– Double log write overhead (not so big, as the writes are sequential)
Can we avoid both the full scan and redo log archiving?
The ways of getting delta
• Full scan
• Use innodb redo logs
• Log changed page ids
• any new ideas...
Combine two approaches
• Two ways to get the modification LSN of a page:
– It is written on the page, - or -
– We can figure it out from the redo log
• The log is circular, so the server must save this information before it is overwritten
Server side implementation
• A separate server thread waits for a checkpoint to end
• It reads the redo log from the last tracked position
• It parses the redo log records and marks pages as modified in a special data structure
– RB-tree
– The key is a (space_id, page_id) pair
– The nodes are bitmap blocks of fixed size
– A bitmap block consists of overhead information and the bitmap itself
– If a bit is set, the corresponding page was modified
– Each block contains the start and end LSNs between which the page changes took place
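The tracking structure described above can be sketched as follows. This is a minimal illustration and not Percona Server's implementation: the class names are hypothetical, a dictionary iterated in sorted key order stands in for the RB-tree, the start/end LSN bookkeeping is omitted, and the block size of 32480 pages is the figure quoted on the bitmap file format slide.

```python
PAGES_PER_BLOCK = 32480  # pages covered by one bitmap block


class BitmapBlock:
    def __init__(self, space_id, start_page):
        self.space_id = space_id
        self.start_page = start_page
        self.bits = bytearray(PAGES_PER_BLOCK // 8)

    def mark(self, page_no):
        offset = page_no - self.start_page
        self.bits[offset // 8] |= 1 << (offset % 8)

    def is_set(self, page_no):
        offset = page_no - self.start_page
        return bool(self.bits[offset // 8] & (1 << (offset % 8)))


class ChangedPageTracker:
    def __init__(self):
        self.blocks = {}  # (space_id, start_page) -> BitmapBlock

    def track(self, space_id, page_no):
        """Mark one page as modified, creating its block on first use."""
        start = page_no - page_no % PAGES_PER_BLOCK
        key = (space_id, start)
        if key not in self.blocks:
            self.blocks[key] = BitmapBlock(space_id, start)
        self.blocks[key].mark(page_no)

    def changed_pages(self):
        """Yield (space_id, page_no) in key order, as a tree walk would."""
        for space_id, start in sorted(self.blocks):
            block = self.blocks[(space_id, start)]
            for page_no in range(start, start + PAGES_PER_BLOCK):
                if block.is_set(page_no):
                    yield space_id, page_no


tracker = ChangedPageTracker()
for space_id, page_no in [(5, 7), (5, 32480), (3, 1), (5, 7)]:
    tracker.track(space_id, page_no)
print(list(tracker.changed_pages()))
```

Marking the same page twice sets the same bit, so the structure records that a page changed but not how many times.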
The bitmap example
[Diagram: an RB-tree of three bitmap blocks. The node is keyed (space:5, page:N*8), its left child (space:3, page:N*8), and its right child (space:5, page:N*9). Each block consists of a header with the key, a CRC, padding, and the bitmap of N pages.]
Bitmap file format
A sequence of data pages, with a varying number of pages per checkpoint:
[Diagram: data for the checkpoint at LSN 9000, then LSN 10000, then LSN 10500. For each checkpoint: a run of 4KB pages, each labeled with its (space, start page).]
Each page contains a bitmap for the next 32480 pages in the space, starting from the start page
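A little arithmetic shows what these figures imply. The sketch assumes that "4KB" means 4096 bytes and, for the sizing example, a 16KB InnoDB data page; bitmap_pages_needed is a hypothetical helper, and the result is a worst case in which every block of the tablespace was touched within one checkpoint interval.

```python
BITMAP_PAGE_SIZE = 4096       # assumed meaning of "4KB"
PAGES_PER_BITMAP = 32480      # from the slide

BITMAP_BYTES = PAGES_PER_BITMAP // 8             # one bit per tracked page
OVERHEAD_BYTES = BITMAP_PAGE_SIZE - BITMAP_BYTES  # left for header, CRC, padding


def bitmap_pages_needed(tablespace_pages):
    """Bitmap pages needed if every block of the tablespace was touched."""
    return -(-tablespace_pages // PAGES_PER_BITMAP)  # ceiling division


# A 1 TiB tablespace with 16 KiB data pages (assumed page size).
data_pages = (1 << 40) // (16 * 1024)
worst_case = bitmap_pages_needed(data_pages)
print(BITMAP_BYTES, OVERHEAD_BYTES, worst_case, worst_case * BITMAP_PAGE_SIZE)
```

Under these assumptions the bitmap itself fills 4060 of the 4096 bytes, and even the worst case for a 1 TiB tablespace stays in the range of a few megabytes per checkpoint.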
Bitmap File Naming & Sizing
• ib_modified_log_<seq>_<LSN>.xdb
– <seq>: 1, 2, 3, ...
– <LSN>: the server LSN at file creation time
• Rotated on
– Server start
– Reaching innodb_max_bitmap_file_size
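Given the naming pattern above, the sequence number and start LSN can be recovered from a file name with a regular expression. parse_bitmap_name is a hypothetical helper written for this illustration; only the ib_modified_log_<seq>_<LSN>.xdb pattern comes from the slide.

```python
import re

BITMAP_NAME = re.compile(r"^ib_modified_log_(\d+)_(\d+)\.xdb$")


def parse_bitmap_name(name):
    """Return (seq, start_lsn) for a bitmap file name, or None if it differs."""
    match = BITMAP_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_bitmap_name("ib_modified_log_3_20000.xdb"))
print(parse_bitmap_name("ibdata1"))
```

Sorting the parsed tuples orders the files by sequence number, which is the order in which they were written.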
How is the delta formed?
[Diagram: the base backup was taken at LSN = 1000. Percona Server reports the changed pages between LSNs 980 and 1020: 1002, 1003, 1010. Only those three pages of table.ibd are read and checked against "LSN > 1000?"; the pages that pass are written to table.ibd.delta. The pages with LSN = 950, 960 and 940 are never read.]
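The bitmap-driven variant of delta forming can be sketched like this. It is an illustration under assumptions, not Xtrabackup's code: bitmap_delta and read_page are hypothetical names, pages are reduced to their LSN, and the page positions in the changed list are made up to match the LSN values in the diagram.

```python
def bitmap_delta(read_page, changed_page_ids, base_lsn):
    """Read only the pages the server reported as changed.

    The LSN check is kept because the tracked interval starts at a
    checkpoint boundary (980 in the diagram) that can precede the base
    LSN (1000), so a reported page may still be older than the base.
    """
    delta = []
    for page_id in changed_page_ids:
        page_lsn = read_page(page_id)
        if page_lsn > base_lsn:
            delta.append((page_id, page_lsn))
    return delta


table_lsns = [950, 960, 1002, 1003, 940, 1010]  # LSNs from the diagram
reads = []


def read_page(page_id):
    reads.append(page_id)  # record which pages were actually read
    return table_lsns[page_id]


changed = [2, 3, 5]  # hypothetical positions of the changed pages
delta = bitmap_delta(read_page, changed, base_lsn=1000)
print(delta, reads)
```

Three reads instead of six is the whole gain here; on a large table with few changes the ratio is far more favorable.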
The general advantage
Only modified pages are read while the delta is formed
Backup performance
[Chart: backup time (0% to 100%) against delta size (0.00%, 0.01%, 1.00%, 100.00%) for two series, Full Scan and Bitmap.]
Size overhead
[Chart: "Log and bitmap file size comparison"; x-axis: bitmap file # (1 to 8); y-axis: log bytes per bitmap byte (0 to 800).]
• A good case: > 100 log bytes per 1 bitmap byte
Size overhead
• A bad case: 3-15 log bytes per 1 bitmap byte
• https://bugs.launchpad.net/bugs/1269547
– We are considering fix options
Tracking: server overhead
• Impact on TPS and response time:
– Couldn't find it
– If you ever do find it, report it to us and try --innodb_log_checksum_algorithm=crc32
• http://bit.ly/pslogcrc32
Bitmap files management
• PURGE CHANGED_PAGE_BITMAPS BEFORE <lsn>
– ib_1_8192.xdb
– ib_2_10000.xdb
– ib_3_20000.xdb
– Full backup taken, LSN = 22000
– PURGE CHANGED_PAGE_BITMAPS BEFORE 22000;
– ib_4_30000.xdb
– Incremental backup taken, LSN = 33000
– PURGE CHANGED_PAGE_BITMAPS BEFORE 33000;
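One way to reason about which files such a purge can drop is sketched below. This is an assumption, not documented server behavior: it supposes that a file covers changes from its own start LSN up to the start LSN of the next file, and that a file is removable only if that whole range lies before the purge LSN. purgeable is a hypothetical helper, and the file names are the shortened ones from the slide.

```python
def purgeable(files, purge_lsn):
    """Return names of files whose whole range ends at or before purge_lsn.

    files is a list of (name, start_lsn) sorted by start_lsn. The newest
    file has no successor, so this sketch never selects it.
    """
    removable = []
    for (name, _start), (_next_name, next_start) in zip(files, files[1:]):
        if next_start <= purge_lsn:
            removable.append(name)
    return removable


files = [("ib_1_8192.xdb", 8192), ("ib_2_10000.xdb", 10000), ("ib_3_20000.xdb", 20000)]
print(purgeable(files, 22000))

# After the first purge, ib_4 appears and the incremental backup is taken.
files = [("ib_3_20000.xdb", 20000), ("ib_4_30000.xdb", 30000)]
print(purgeable(files, 33000))
```

Under this assumption the file that contains the backup LSN always survives, so the next incremental backup still finds the changes made since the previous one.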
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• Percona Server can read the bitmaps too
SHOW CREATE TABLE INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
CREATE TABLE `INNODB_CHANGED_PAGES` (
`space_id` int(11) unsigned NOT NULL DEFAULT '0',
`page_id` int(11) unsigned NOT NULL DEFAULT '0',
`start_lsn` bigint(21) unsigned NOT NULL DEFAULT '0',
`end_lsn` bigint(21) unsigned NOT NULL DEFAULT '0'
)
• start_lsn and end_lsn are always at the checkpoint boundary
• Does not show the exact LSN of a change
• Does not show the number of changes for one page
• Does show the number of flushes for a page over the workload
The ways of getting delta
• Full scan
• Use storage redo logs
• Log changed page ids
• any new ideas, questions...
(Thanks to Laurynas Biveinis for bitmap part)
