This document discusses tracking changed pages in a database to enable faster and more frequent incremental backups. It presents performance results showing that incremental backups using changed page tracking are faster than full scans. It describes how changed page tracking works in Percona Server and Percona XtraBackup, including the new INFORMATION_SCHEMA.INNODB_CHANGED_PAGES table. Server overhead is minimal. The implementation writes bitmaps of changed pages after each log checkpoint, which lets backups skip reading unchanged pages.
Agenda
• Incremental XtraBackup: performance
• Incremental XtraBackup with bitmaps: performance
• Server overhead
• INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• Implementation
– Bitmap file format
– New server thread
Incremental XtraBackup: Performance
• Does time to backup depend on the % of changed data?
[Chart: backup time (0% to 100%) vs. delta size (0.10% to 100.00%, log scale)]
Incremental XtraBackup: How Data Page Copying Works
• Base backup LSN = 1000: every page of table.ibd is read, and pages with LSN > 1000 are written to table.ibd.delta
page_lsn read written_to_delta
950 read no
960 read no
1002 read write
1003 read write
940 read no
1010 read write
• Can we avoid reading the old pages?
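The page-copying logic in the diagram can be sketched in Python. This is a simplified model, not XtraBackup's code: each page is a hypothetical (page_no, page_lsn) pair instead of raw bytes, and the function name is illustrative. The point is that every page is read, even though only pages newer than the base backup LSN end up in the delta.

```python
from typing import Iterable, List, Tuple

Page = Tuple[int, int]  # (page_no, page_lsn): hypothetical stand-in for an on-disk page


def full_scan_delta(pages: Iterable[Page], base_lsn: int) -> Tuple[int, List[Page]]:
    """Read every page; copy only the pages modified after the base backup."""
    pages_read = 0
    delta: List[Page] = []
    for page_no, page_lsn in pages:
        pages_read += 1            # the full scan pays for this read on every page
        if page_lsn > base_lsn:    # modified since the base backup: goes to .delta
            delta.append((page_no, page_lsn))
    return pages_read, delta


# Page LSNs from the diagram, base backup LSN = 1000
table_ibd = list(enumerate([950, 960, 1002, 1003, 940, 1010]))
reads, delta = full_scan_delta(table_ibd, base_lsn=1000)
print(reads, delta)  # 6 [(2, 1002), (3, 1003), (5, 1010)]
```

Half of the reads in this example are wasted, which is the cost that changed page tracking aims to remove.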
Incremental XtraBackup: Can We Avoid Reading the Old Pages?
• http://bit.ly/FBIncBackup
Incremental XtraBackup: Can We Avoid Reading the Old Pages?
• How do we know which pages to read then?
• Two ways to get the modification LSN of a page:
– It is written on the page itself, or
– We can figure it out from the redo log
• The redo log is circular, so the server must save this information before it is overwritten
• --innodb-track-changed-pages=TRUE
– Percona Server 5.1.67-14.4, 5.5.29-30.0, 5.6.
– Percona XtraBackup 2.1 – zero configuration!
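The redo-log approach can be sketched as below, under simplifying assumptions: redo records are hypothetical (lsn, space_id, page_id) tuples and the circular log is a bounded deque. It illustrates why the server has to harvest changed-page information while the records are still in the log.

```python
from collections import deque
from typing import Deque, Set, Tuple

Record = Tuple[int, int, int]  # (lsn, space_id, page_id): hypothetical redo record


class CircularRedoLog:
    """Toy circular log: appending to a full log overwrites the oldest record."""

    def __init__(self, capacity: int) -> None:
        self.records: Deque[Record] = deque(maxlen=capacity)

    def append(self, rec: Record) -> None:
        self.records.append(rec)  # deque(maxlen=...) silently drops the oldest entry


def harvest_changed_pages(log: CircularRedoLog, after_lsn: int) -> Set[Tuple[int, int]]:
    """Collect (space_id, page_id) for every record newer than after_lsn still in the log."""
    return {(space, page) for lsn, space, page in log.records if lsn > after_lsn}


log = CircularRedoLog(capacity=3)
for rec in [(1001, 5, 3), (1002, 5, 4), (1003, 0, 7)]:
    log.append(rec)
saved = harvest_changed_pages(log, after_lsn=1000)  # the server saves this now...
log.append((1004, 0, 8))                             # ...before (1001, 5, 3) is overwritten
print(sorted(saved))  # [(0, 7), (5, 3), (5, 4)]
```

Harvesting after the last append would miss page (5, 3) entirely, which is why tracking has to run inside the server rather than at backup time.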
Percona Server with Changed Page Tracking: Server Overhead
• Nothing is ever free!
– But the price may well be acceptable
• Potential overhead #1: extra disk space requirements
• Potential overhead #2: extra code running in the server
Percona Server with Changed Page Tracking: Server Overhead
• Impact on TPS and response time:
– It's a wash (no significant difference)
– With --innodb_flush_method=fdatasync
Bitmap File Naming & Sizing
• ib_modified_log_<seq>_<LSN>.xdb
– <seq>: 1, 2, 3, ...
– <LSN>: the server LSN at the time the file was created
• Rotated on
– Server start
– Reaching innodb_max_bitmap_file_size
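Anything that reads the bitmaps has to find these files and put them in order. A minimal Python sketch, assuming only the naming scheme above; the regex, `BitmapFile`, and `list_bitmap_files` are illustrative names, not Percona code.

```python
import re
from typing import List, NamedTuple

# File name pattern from the slide: ib_modified_log_<seq>_<LSN>.xdb
BITMAP_NAME = re.compile(r"^ib_modified_log_(\d+)_(\d+)\.xdb$")


class BitmapFile(NamedTuple):
    seq: int        # 1, 2, 3, ...
    start_lsn: int  # server LSN when the file was created
    name: str


def list_bitmap_files(names: List[str]) -> List[BitmapFile]:
    """Keep only bitmap files from a directory listing, ordered by sequence number."""
    files = []
    for name in names:
        m = BITMAP_NAME.match(name)
        if m:
            files.append(BitmapFile(int(m.group(1)), int(m.group(2)), name))
    return sorted(files)  # tuples compare field by field, so seq decides the order


# Hypothetical directory listing
listing = ["ib_modified_log_2_38471.xdb", "ibdata1", "ib_modified_log_1_8204.xdb"]
for f in list_bitmap_files(listing):
    print(f.seq, f.start_lsn, f.name)
```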
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• Percona Server can read the bitmaps too
SHOW CREATE TABLE INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
CREATE TABLE `INNODB_CHANGED_PAGES` (
`space_id` int(11) unsigned NOT NULL DEFAULT '0',
`page_id` int(11) unsigned NOT NULL DEFAULT '0',
`start_lsn` bigint(21) unsigned NOT NULL DEFAULT '0',
`end_lsn` bigint(21) unsigned NOT NULL DEFAULT '0'
)
• start_lsn and end_lsn always fall on checkpoint boundaries
• Does not show the exact LSN of a change
• Does not show the number of changes for one page
• Does show the number of flushes for a page over the workload
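To make the checkpoint granularity concrete, this sketch turns a hypothetical stream of (lsn, space_id, page_id) changes into rows shaped like INNODB_CHANGED_PAGES, given the checkpoint LSNs. It is an illustrative model, not the server's code: several changes to one page between two checkpoints collapse into a single row, so exact LSNs and change counts are lost, while the number of rows per page (what the slide calls its number of flushes) survives.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List, Set, Tuple

Change = Tuple[int, int, int]    # (lsn, space_id, page_id): hypothetical change stream
Row = Tuple[int, int, int, int]  # (space_id, page_id, start_lsn, end_lsn)


def to_changed_page_rows(changes: Iterable[Change], checkpoints: List[int]) -> List[Row]:
    """Bucket changes into checkpoint intervals (checkpoints[i-1], checkpoints[i]]."""
    seen: Set[Row] = set()
    for lsn, space, page in changes:
        i = bisect_left(checkpoints, lsn)  # index of the first checkpoint >= lsn
        if i == 0 or i == len(checkpoints):
            continue  # outside the tracked checkpoint range
        seen.add((space, page, checkpoints[i - 1], checkpoints[i]))
    return sorted(seen)


rows = to_changed_page_rows(
    [(9000, 5, 3), (9100, 5, 3), (20000, 0, 1), (40000, 5, 3)],
    checkpoints=[8204, 38470, 50000],
)
flushes = Counter((space, page) for space, page, _, _ in rows)
print(len(rows), flushes[(5, 3)])  # 3 2
```

Page (5, 3) changed three times but appears in only two rows, one per checkpoint interval.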
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
SELECT * FROM INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
space_id page_id start_lsn end_lsn
0 0 8204 38470
0 1 8204 38470
5 0 8204 38470
5 3 8204 38470
0 1 38471 50000
5 3 38471 50000
5 3 50001 60000
• Don't query like that in production!
– It will read all the bitmaps you have: gigabytes, terabytes, ...
– Add WHERE start_lsn > X AND end_lsn < Y (index condition pushdown is implemented for this case)
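A sketch of why the range condition matters: if each bitmap file is assumed to cover LSNs from its creation LSN up to the creation LSN of the next file, a WHERE start_lsn > X AND end_lsn < Y condition lets the reader skip whole files without opening them. That coverage rule and the `files_to_read` function are assumptions for illustration, not Percona Server's pushdown code.

```python
from typing import List, Optional


def files_to_read(file_start_lsns: List[int], low: int, high: Optional[int]) -> List[int]:
    """Indexes of bitmap files that may hold rows with start_lsn > low and end_lsn < high.

    file_start_lsns: creation LSN of each file, sorted by sequence number.
    Assumption: file i covers [file_start_lsns[i], file_start_lsns[i + 1]),
    and the newest file is open-ended. high=None means no upper bound.
    """
    selected = []
    for i, first in enumerate(file_start_lsns):
        last = file_start_lsns[i + 1] if i + 1 < len(file_start_lsns) else None
        if last is not None and last <= low:
            continue  # every row here has start_lsn < low: skip the file
        if high is not None and first >= high:
            continue  # every row here has end_lsn >= high: skip the file
        selected.append(i)
    return selected


starts = [8204, 38471, 60001]  # hypothetical creation LSNs of three files
print(files_to_read(starts, low=40000, high=55000))  # [1]
print(files_to_read(starts, low=0, high=None))       # [0, 1, 2]
```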
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• Which tables are written to?
SELECT DISTINCT space_id FROM
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES WHERE ...;
space_id
0
10
SELECT DISTINCT t1.space_id AS space_id, t2.schema AS db,
t2.name AS tname
FROM INFORMATION_SCHEMA.INNODB_CHANGED_PAGES AS t1,
INFORMATION_SCHEMA.INNODB_SYS_TABLES AS t2
WHERE t1.space_id = t2.space AND t1.start_lsn >...
space_id db tname
0 SYS_FOREIGN
0 SYS_FOREIGN_COLS
10 test foo
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• What are the hottest tables?
SELECT space_id,
COUNT(space_id) AS number_of_flushes
FROM INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
GROUP BY space_id
ORDER BY number_of_flushes DESC;
space_id number_of_flushes
0 65
10 5
11 4
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• What are the hottest pages?
SELECT space_id, page_id,
COUNT(page_id) AS number_of_flushes
FROM INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
GROUP BY space_id, page_id
HAVING number_of_flushes > 1
ORDER BY number_of_flushes DESC
LIMIT 8;
space_id page_id number_of_flushes
0 5 3
0 7 3
0 0 2
0 11 2
10 3 2
0 1 2
0 12 2
0 2 2
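Both "hottest" queries are plain group-and-count operations. As a rough Python equivalent, fed with the sample rows from the earlier `SELECT *` slide, this sketch mirrors their GROUP BY / HAVING / ORDER BY / LIMIT semantics; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[int, int, int, int]  # (space_id, page_id, start_lsn, end_lsn)


def hottest_tables(rows: List[Row]) -> List[Tuple[int, int]]:
    """Like GROUP BY space_id ORDER BY number_of_flushes DESC."""
    return Counter(space for space, _, _, _ in rows).most_common()


def hottest_pages(rows: List[Row], min_flushes: int,
                  limit: int) -> List[Tuple[Tuple[int, int], int]]:
    """Like GROUP BY space_id, page_id HAVING number_of_flushes > min_flushes
    ORDER BY number_of_flushes DESC LIMIT limit."""
    counts = Counter((space, page) for space, page, _, _ in rows)
    return [(key, n) for key, n in counts.most_common() if n > min_flushes][:limit]


# Sample INNODB_CHANGED_PAGES output from the SELECT * slide
rows = [(0, 0, 8204, 38470), (0, 1, 8204, 38470), (5, 0, 8204, 38470),
        (5, 3, 8204, 38470), (0, 1, 38471, 50000), (5, 3, 38471, 50000),
        (5, 3, 50001, 60000)]
print(hottest_tables(rows))                         # [(5, 4), (0, 3)]
print(hottest_pages(rows, min_flushes=1, limit=8))  # [((5, 3), 3), ((0, 1), 2)]
```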
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• For complex queries, copy data first
CREATE TEMPORARY TABLE icp (
space_id INT(11) NOT NULL,
page_id INT(11) NOT NULL,
start_lsn BIGINT(21) NOT NULL,
end_lsn BIGINT(21) NOT NULL,
INDEX page_id(space_id, page_id),
INDEX start_lsn(start_lsn),
INDEX end_lsn(end_lsn)) ENGINE=InnoDB;
INSERT INTO icp SELECT * FROM
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES WHERE
start_lsn > 8000;
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
• For complex queries, copy data first
EXPLAIN SELECT DISTINCT space_id FROM
INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE INNODB_CHANGED_PAGES ALL NULL NULL NULL NULL NULL Using temporary
EXPLAIN SELECT DISTINCT space_id FROM icp;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE icp index NULL page_id 8 NULL 74 Using index
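The effect of copying into an indexed table can be reproduced outside MySQL. This sketch uses Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN). It loads the sample rows into an unindexed table and into an indexed copy shaped like icp, then compares the plans for the same DISTINCT query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unindexed source, standing in for INFORMATION_SCHEMA.INNODB_CHANGED_PAGES
conn.execute("CREATE TABLE changed_pages (space_id INTEGER, page_id INTEGER, "
             "start_lsn INTEGER, end_lsn INTEGER)")
rows = [(0, 0, 8204, 38470), (0, 1, 8204, 38470), (5, 0, 8204, 38470),
        (5, 3, 8204, 38470), (0, 1, 38471, 50000), (5, 3, 38471, 50000),
        (5, 3, 50001, 60000)]
conn.executemany("INSERT INTO changed_pages VALUES (?, ?, ?, ?)", rows)

# Indexed copy with the same indexes as icp (a regular table here, TEMPORARY in MySQL)
conn.executescript("""
    CREATE TABLE icp (space_id INTEGER NOT NULL, page_id INTEGER NOT NULL,
                      start_lsn INTEGER NOT NULL, end_lsn INTEGER NOT NULL);
    CREATE INDEX page_id ON icp (space_id, page_id);
    CREATE INDEX start_lsn ON icp (start_lsn);
    CREATE INDEX end_lsn ON icp (end_lsn);
    INSERT INTO icp SELECT * FROM changed_pages WHERE start_lsn > 8000;
""")


def plan(sql: str) -> list:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


raw_plan = plan("SELECT DISTINCT space_id FROM changed_pages")
icp_plan = plan("SELECT DISTINCT space_id FROM icp")
print(raw_plan)
print(icp_plan)
```

On the unindexed table SQLite reports a full scan plus a temporary B-tree for DISTINCT, the counterpart of "Using temporary". On the indexed copy it scans the covering (space_id, page_id) index instead. The exact plan wording varies between SQLite versions.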
Implementation: File Format
• A sequence of per-checkpoint data, each made of a varying number of 4KB pages
– e.g. data for the checkpoints at LSN 9000, LSN 10000 and LSN 10500
• Each page is tagged with its space and start page
• Each page contains a bitmap for the next 32480 pages in the space, starting from the start page
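The addressing arithmetic implied by this format can be sketched as below. Only the 4KB page size and the 32480-pages-per-bitmap figure come from the slide. The in-memory layout, the bit order inside each byte, and the function names are assumptions, and the per-page header is omitted. Note that 32480 bits is exactly 4060 bytes, which leaves 36 bytes of each 4KB page for other fields.

```python
from typing import Dict, Iterable, Tuple

PAGE_SIZE = 4096                          # each bitmap page is 4KB (from the slide)
PAGES_PER_BITMAP = 32480                  # pages covered by one bitmap page (from the slide)
BITMAP_BYTES = PAGES_PER_BITMAP // 8      # 4060 bytes of bitmap per page

Bitmaps = Dict[Tuple[int, int], bytearray]  # (space_id, start_page) -> bitmap bytes


def build_checkpoint_bitmaps(changed: Iterable[Tuple[int, int]]) -> Bitmaps:
    """Build the bitmap pages for one checkpoint interval from (space_id, page_id) pairs."""
    bitmaps: Bitmaps = {}
    for space, page in changed:
        start_page = page - page % PAGES_PER_BITMAP   # first page this bitmap covers
        bits = bitmaps.setdefault((space, start_page), bytearray(BITMAP_BYTES))
        offset = page - start_page
        bits[offset // 8] |= 1 << (offset % 8)        # LSB-first bit order is an assumption
    return bitmaps


def is_changed(bitmaps: Bitmaps, space: int, page: int) -> bool:
    """Look up one page's bit in the bitmaps of a checkpoint interval."""
    start_page = page - page % PAGES_PER_BITMAP
    bits = bitmaps.get((space, start_page))
    offset = page - start_page
    return bits is not None and bool(bits[offset // 8] & (1 << (offset % 8)))


changed = [(0, 0), (0, 1), (5, 0), (5, 3), (5, 40000)]  # hypothetical (space_id, page_id)
bitmaps = build_checkpoint_bitmaps(changed)
print(sorted(bitmaps))                                                 # [(0, 0), (5, 0), (5, 32480)]
print(len(bitmaps) * PAGE_SIZE)                                        # 12288
print(is_changed(bitmaps, 5, 40000), is_changed(bitmaps, 5, 40001))    # True False
```

The same arithmetic gives a rough disk-space estimate: each checkpoint costs 4KB per distinct (space, 32480-page range) it touched, so writes scattered over many tablespaces cost more bitmap space than the same number of writes concentrated in one range.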
Implementation: Server Side
• A new XtraDB thread
– 1. Wait for log checkpoint completed event
– 2. Read the log up to the checkpoint, write the bitmap
– 3. goto 1
• Little data sharing with the rest of XtraDB
– log_sys->mutex for:
● setting and getting LSNs;
● calculating log read offset from LSN.
• Little extra code for the query threads
– Unread log overwrite check
– Firing of the log checkpoint completed event
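The thread's three-step loop, and the small hook it needs on the checkpointing side, can be modelled in Python. This is a toy under stated assumptions, not XtraDB code: the redo log is a list of hypothetical (lsn, space_id, page_id) records, a `threading.Lock` plays the role of log_sys->mutex, and a `Condition` stands in for the log checkpoint completed event.

```python
import threading
from typing import List, Set, Tuple

Record = Tuple[int, int, int]  # (lsn, space_id, page_id): hypothetical redo record


class ChangedPageTracker:
    """Toy model of the tracking thread's loop; not XtraDB code."""

    def __init__(self, redo_log: List[Record]) -> None:
        self.redo_log = redo_log
        self.mutex = threading.Lock()                    # plays the role of log_sys->mutex
        self.checkpoint_done = threading.Condition(self.mutex)
        self.checkpoint_lsn = 0                          # published by the checkpointing side
        self.tracked_lsn = 0                             # how far the bitmaps reach
        self.bitmaps: List[Tuple[int, int, Set[Tuple[int, int]]]] = []
        self.shutting_down = False

    def fire_checkpoint_completed(self, lsn: int) -> None:
        """Hook on the checkpointing side: publish the LSN and wake the tracker."""
        with self.checkpoint_done:
            self.checkpoint_lsn = lsn
            self.checkpoint_done.notify()

    def shutdown(self) -> None:
        with self.checkpoint_done:
            self.shutting_down = True
            self.checkpoint_done.notify()

    def run(self) -> None:
        while True:
            # 1. Wait for the log checkpoint completed event (LSNs read under the mutex).
            with self.checkpoint_done:
                while self.checkpoint_lsn <= self.tracked_lsn and not self.shutting_down:
                    self.checkpoint_done.wait()
                if self.checkpoint_lsn <= self.tracked_lsn:
                    return  # shutting down with nothing left to track
                start, end = self.tracked_lsn, self.checkpoint_lsn
            # 2. Read the log up to the checkpoint and write the bitmap, outside the mutex.
            changed = {(s, p) for lsn, s, p in self.redo_log if start < lsn <= end}
            self.bitmaps.append((start, end, changed))
            with self.mutex:
                self.tracked_lsn = end
            # 3. goto 1


# Usage: two checkpoints complete, then the server shuts down
log = [(100, 0, 1), (150, 5, 3), (220, 0, 1), (260, 5, 7)]
tracker = ChangedPageTracker(log)
worker = threading.Thread(target=tracker.run)
worker.start()
tracker.fire_checkpoint_completed(200)
tracker.fire_checkpoint_completed(300)
tracker.shutdown()
worker.join()
print(tracker.tracked_lsn)  # 300
```

If two checkpoints complete before the thread wakes up, this model tracks both intervals as one: the bitmap is coarser but still complete.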
Implementation: Things We Had to Account For
• Maximum checkpoint age violation
– Destroys untracked log data
– We make an effort to avoid it, but in the end we allow the log to be overwritten
– A responsive server matters more than fast backups
• Crash recovery
– Re-read the log if it is still available
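Both situations can leave holes in the tracked LSN range. The sketch below shows, under assumptions, how they could be handled: after a crash, tracking resumes from the last tracked LSN if the log still contains it, and a consumer checks that the tracked intervals cover the range it needs, falling back to a full scan otherwise. Intervals are taken as half-open [start, end) so contiguous runs share a boundary LSN; both function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # [start_lsn, end_lsn) covered by bitmap data


def tracking_restart_lsn(last_tracked_lsn: int, oldest_lsn_in_log: int) -> Tuple[int, bool]:
    """After a crash, return (LSN to resume reading from, whether a gap was accepted)."""
    if oldest_lsn_in_log <= last_tracked_lsn:
        return last_tracked_lsn, False  # the log still has it: re-read from there
    return oldest_lsn_in_log, True      # that log data is gone: tracking has a hole


def covers(intervals: List[Interval], from_lsn: int, to_lsn: int) -> bool:
    """True if the union of intervals covers [from_lsn, to_lsn) without a gap."""
    reached = from_lsn
    for start, end in sorted(intervals):
        if start > reached:
            break  # gap: log data in [reached, start) was never tracked
        reached = max(reached, end)
        if reached >= to_lsn:
            return True
    return reached >= to_lsn


tracked = [(1000, 2000), (2000, 3000), (3500, 4000)]  # [3000, 3500) was lost
print(covers(tracked, 1000, 3000))  # True: the bitmaps are usable
print(covers(tracked, 1000, 4000))  # False: fall back to a full scan
```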
Conclusions
• Percona Server together with Percona XtraBackup:
– Enable faster incremental backups
– Enable more frequent incremental backups
– Do not hurt server operation, but the bitmaps now have to be managed
• New INFORMATION_SCHEMA table for gaining insight into data change patterns
• Thank you! Questions?