Introduction to VACUUM, Freezing, XID wraparound
- 1. Copyright © 2016 NTT DATA Corporation
03/17/2016
NTT DATA Corporation
Masahiko Sawada
Introduction to VACUUM, FREEZING, XID wraparound
- 2.
A little about me
• Masahiko Sawada
• twitter : @sawada_masahiko
• NTT DATA Corporation
• Database engineer
• PostgreSQL Hacker
• Core feature
• pg_bigm (Multi-byte full text search module for PostgreSQL)
- 3.
Contents
• VACUUM
• Visibility Map
• Freezing Tuple
• XID wraparound
• New VACUUM features for 9.6
- 5.
VACUUM
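[Figure: a table with rows (1, AAA), (2, BBB), (3, CCC), (4, DDD) while INSERT/DELETE/UPDATE run concurrently. UPDATE BBB->bbb leaves the old version (2, BBB) behind as garbage; between "VACUUM Starts" and "VACUUM Done", VACUUM removes it and records the freed space in the FSM so it can be reused.]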
• PostgreSQL's garbage collection feature
• Acquires a ShareUpdateExclusiveLock, so concurrent INSERT/UPDATE/DELETE can continue
- 6.
Why do we need to VACUUM?
• Recover or reuse the disk space occupied by updated or deleted rows
• Update data statistics
• Update the visibility map to speed up index-only scans
• Protect against loss of very old data due to XID wraparound
- 7.
Evolution history of VACUUM
• v8.1 (2005): autovacuum
• v8.4 (2009): Visibility Map, Free Space Map
• v9.5 (2016): vacuumdb parallel option
• v9.6: !?
- 8.
VACUUM Syntax
-- VACUUM whole database
=# VACUUM;
-- Multiple options; ANALYZE only column col1
=# VACUUM FREEZE VERBOSE ANALYZE hoge (col1);
-- Multiple options in the parenthesized syntax
=# VACUUM (FULL, ANALYZE, VERBOSE) hoge;
- 10.
Visibility Map
• Introduced in 8.4
• A bitmap for each table (1 bit per heap page)
• Each table relation can have its own visibility map.
• Keeps track of which pages are all-visible.
• Pages whose bit is not set may contain garbage.
• For a 500GB table, the Visibility Map is less than 10MB.
[Figure: the table file (base/XXX/1234) with Blocks 0 to 4, and its visibility map file (base/XXX/1234_vm) holding the bits 11001…, one bit per block]
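A back-of-the-envelope check of the 500GB figure. This is a minimal sketch assuming the default 8kB block size and ignoring the small page headers inside the visibility map file; `vm_size_bytes` is a hypothetical helper, not a PostgreSQL function.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes (assumed)


def vm_size_bytes(table_bytes: int, bits_per_page: int = 1) -> int:
    """Approximate visibility map size in bytes for a table of table_bytes."""
    heap_pages = -(-table_bytes // BLOCK_SIZE)     # ceiling division
    return -(-(heap_pages * bits_per_page) // 8)   # bits -> bytes, rounded up


size = vm_size_bytes(500 * 1024**3)
print(size, "bytes =", round(size / 1024**2, 1), "MiB")  # 8192000 bytes = 7.8 MiB
```

A 500GB table has 65,536,000 pages of 8kB; at one bit each that is about 7.8 MiB, comfortably under 10MB.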
- 11.
State transition of Visibility Map bit
• 0 (NOT all-visible) -> 1 (all-visible): VACUUM
• 1 (all-visible) -> 0 (NOT all-visible): INSERT, UPDATE, DELETE
- 12.
How does VACUUM actually work?
• VACUUM works in two phases:
1. Scan the table to collect the TIDs of garbage tuples
2. Reclaim the garbage (table, indexes)
[Figure: in the 1st phase VACUUM scans the table and collects garbage TIDs in maintenance_work_mem; in the 2nd phase it reclaims the garbage from the indexes and the table]
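The two phases can be modelled with a bounded list of dead TIDs. This is a simplified sketch, not PostgreSQL's implementation: it assumes each stored TID costs 6 bytes (block number plus line pointer offset) and runs the 2nd phase early whenever the memory budget fills up, which is why a small maintenance_work_mem can force several passes over the indexes. `lazy_vacuum` and its arguments are hypothetical.

```python
TID_BYTES = 6  # assumed size of one stored tuple ID


def lazy_vacuum(pages, budget_bytes):
    """pages: list of pages, each a list of booleans (True = dead tuple).
    Returns (index passes, reclaimed tuples)."""
    max_tids = max(1, budget_bytes // TID_BYTES)
    dead_tids = []       # 1st-phase buffer, bounded like maintenance_work_mem
    index_passes = 0
    reclaimed = 0

    def reclaim():
        # 2nd phase: delete index entries pointing at these TIDs, then
        # free the tuples in the table, and empty the buffer.
        nonlocal index_passes, reclaimed
        index_passes += 1
        reclaimed += len(dead_tids)
        dead_tids.clear()

    for blkno, tuples in enumerate(pages):
        for offset, is_dead in enumerate(tuples, start=1):
            if is_dead:
                dead_tids.append((blkno, offset))
                if len(dead_tids) >= max_tids:
                    reclaim()          # memory is full: run the 2nd phase now
    if dead_tids:
        reclaim()                      # final 2nd phase for leftover TIDs
    return index_passes, reclaimed


pages = [[True, False, True], [False, True], [True, True]]
print(lazy_vacuum(pages, 12))    # (3, 5)
print(lazy_vacuum(pages, 1000))  # (1, 5)
```

With a 12-byte budget (room for only 2 TIDs) the five dead tuples need three index passes; with a generous budget one pass is enough.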
- 13.
Performance improvement points of VACUUM
• Scans the table page by page.
• VACUUM can skip all-visible pages, but only when there are at least 32 consecutive all-visible pages.
• Stores the garbage tuple IDs (TIDs) in maintenance_work_mem.
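[Figure: when all-visible blocks come in long runs, VACUUM can skip them efficiently (FAST!); when all-visible and not all-visible blocks are interleaved, VACUUM needs to scan every page (SLOW!!). Legend: all-visible block, not all-visible block]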
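A simplified sketch of the skipping rule, assuming the threshold of 32 consecutive all-visible pages described above (PostgreSQL names this constant SKIP_PAGES_THRESHOLD). Short runs are still read because skipping a page here and there would defeat the operating system's sequential read-ahead. `pages_to_scan` is hypothetical and ignores other details of the real scan loop.

```python
SKIP_PAGES_THRESHOLD = 32  # minimum run of all-visible pages worth skipping


def pages_to_scan(all_visible):
    """all_visible: VM bits (True = all-visible), one per heap page.
    Returns the block numbers lazy VACUUM would read."""
    scan = []
    blkno, n = 0, len(all_visible)
    while blkno < n:
        if not all_visible[blkno]:
            scan.append(blkno)        # page may hold garbage: must read it
            blkno += 1
            continue
        # Measure the run of consecutive all-visible pages starting here.
        end = blkno
        while end < n and all_visible[end]:
            end += 1
        if end - blkno < SKIP_PAGES_THRESHOLD:
            scan.extend(range(blkno, end))  # run too short: read it anyway
        blkno = end                          # long runs are skipped entirely
    return scan


print(pages_to_scan([True, False] * 4))              # [0, 1, 2, 3, 4, 5, 6, 7]
print(pages_to_scan([False] + [True] * 40 + [False]))  # [0, 41]
```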
- 15.
What is the transaction ID (XID)?
• Every tuple has two transaction IDs.
• xmin: the XID that inserted the tuple
• xmax: the XID that deleted or updated it (0 if none)
xmin | xmax | col
-------+------+------
1810 | 1820 | AAA
1812 | 0 | BBB
1814 | 1830 | CCC
1820 | 0 | XXX
Under the REPEATABLE READ isolation level, assuming every transaction with a lower XID has committed:
• Transaction 1815 can see ‘AAA’, ‘BBB’ and ‘CCC’.
• Transaction 1821 can see ‘BBB’, ‘CCC’ and ‘XXX’.
• Transaction 1831 can see ‘BBB’ and ‘XXX’.
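The three results above can be reproduced with a deliberately simplified visibility rule. It assumes a transaction's snapshot treats every lower XID as committed and every XID at or above its own as not yet committed; real snapshots also track in-progress and aborted transactions, and real XID comparisons are circular. `HeapTuple` and `is_visible` are hypothetical names.

```python
from typing import NamedTuple


class HeapTuple(NamedTuple):
    xmin: int   # XID that inserted the tuple
    xmax: int   # XID that deleted/updated it, 0 if none
    col: str


TABLE = [HeapTuple(1810, 1820, "AAA"), HeapTuple(1812, 0, "BBB"),
         HeapTuple(1814, 1830, "CCC"), HeapTuple(1820, 0, "XXX")]


def is_visible(t: HeapTuple, my_xid: int) -> bool:
    inserted_before = t.xmin < my_xid                  # insert already committed
    deleted_before = t.xmax != 0 and t.xmax < my_xid   # delete already committed
    return inserted_before and not deleted_before


for xid in (1815, 1821, 1831):
    print(xid, [t.col for t in TABLE if is_visible(t, xid)])
# 1815 ['AAA', 'BBB', 'CCC']
# 1821 ['BBB', 'CCC', 'XXX']
# 1831 ['BBB', 'XXX']
```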
- 16.
What is the transaction ID (XID)?
• XIDs are 32-bit unsigned integers (uint32), so they can represent about 4 billion transactions.
• The XID space is circular, with no endpoint.
• Relative to any given XID, about 2 billion XIDs are “older” and about 2 billion XIDs are “newer”.
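[Figure: the XID space drawn as a circle from 0 to 2^32-1; relative to the current XID, the older half is in the past (visible) and the newer half is in the future (not visible)]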
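The circular comparison can be sketched as modulo-2^32 arithmetic: subtract the two XIDs, reinterpret the difference as a signed 32-bit integer, and call the first XID older if the result is negative. This mirrors how PostgreSQL's TransactionIdPrecedes compares normal XIDs, but the sketch ignores the few special XIDs below 3 that are compared differently; the Python function names are hypothetical.

```python
def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def xid_precedes(xid1: int, xid2: int) -> bool:
    """True if xid1 is 'older' than xid2 on the circular XID space."""
    return to_int32(xid1 - xid2) < 0


print(xid_precedes(100, 200))               # True: 100 is in the past
print(xid_precedes(100, 100 + 2**31 - 1))   # True: still (just) in the past
print(xid_precedes(100, 100 + 2**31 + 1))   # False: 100 now looks like the future
print(xid_precedes(4_294_967_290, 10))      # True: works across the 2^32 wrap
```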
- 17.
What is the XID wraparound?
[Figure: three snapshots of the XID circle. At first XID 100 is in the older half and is visible; as the current XID advances it is still visible; once the current XID is more than 2 billion ahead, XID 100 falls into the newer half and becomes not visible.]
• PostgreSQL could lose very old data due to XID wraparound.
• This can happen once a tuple is more than 2 billion transactions old.
• On a system running 200 TPS, that happens roughly every 120 days.
• Note that it can happen even on an INSERT-only table.
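The 120-day estimate is simple arithmetic: a tuple is at risk once about 2^31 XIDs have been consumed after it was written. The sketch assumes a steady rate in which every transaction consumes an XID; `days_until_wraparound` is a hypothetical helper.

```python
XID_HALF_SPACE = 2**31  # XIDs that can be "older" than the current XID


def days_until_wraparound(tps: float) -> float:
    """Days until a never-frozen tuple is 2^31 transactions old."""
    return XID_HALF_SPACE / tps / 86_400


print(f"{days_until_wraparound(200):.0f} days")  # 124 days
```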
- 18.
Freezing tuples
• Marks a tuple as “Frozen”.
• A “frozen” tuple appears to be “in the past” to every transaction, no matter how far the XID counter has advanced.
• Old tuples must be frozen *before* the XID counter advances 2 billion past them.
[Figure: the same three XID circles, but the tuple inserted by XID 100 is marked as ‘FREEZE’ before the counter advances 2 billion; it stays visible even after the current XID is more than 2 billion ahead.]
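A minimal sketch of why freezing helps: a frozen tuple bypasses the circular XID comparison and is always treated as in the past. In PostgreSQL 9.4 and later this is recorded with infomask hint bits while the original xmin is kept (earlier releases overwrote xmin with the special FrozenTransactionId); the sketch just models it as a boolean flag, and `xmin_in_past` is a hypothetical name.

```python
def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def xmin_in_past(xmin: int, current_xid: int, frozen: bool) -> bool:
    """Is the inserting transaction 'in the past' for current_xid?"""
    if frozen:
        return True                              # frozen: in the past for everyone
    return to_int32(xmin - current_xid) < 0      # ordinary circular comparison


far_future = 100 + 2**31 + 1   # current XID more than 2 billion after XID 100
print(xmin_in_past(100, far_future, frozen=False))  # False: the tuple vanishes
print(xmin_in_past(100, far_future, frozen=True))   # True: still visible
```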
- 19.
To prevent old data loss due to XID wraparound
• Emits WARNING messages once 10 million transactions remain before wraparound.
• Refuses to assign new XIDs once 1 million transactions remain.
• Runs anti-wraparound VACUUM automatically.
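The two protective limits can be written as a check on how many XIDs remain. This is a rough sketch using the 10 million and 1 million figures from the slide and approximating the remaining count as 2^31 minus the age of the oldest unfrozen XID; the server's exact limit computation differs in detail, and `wraparound_status` is hypothetical.

```python
WRAP_DISTANCE = 2**31          # approximate XIDs usable before wraparound
WARN_REMAINING = 10_000_000    # start emitting WARNINGs
STOP_REMAINING = 1_000_000     # stop assigning new XIDs


def wraparound_status(oldest_xid_age: int) -> str:
    """Classify the cluster by the age of its oldest unfrozen XID."""
    remaining = WRAP_DISTANCE - oldest_xid_age
    if remaining <= STOP_REMAINING:
        return "stop: refuse to assign new XIDs"
    if remaining <= WARN_REMAINING:
        return f"warning: {remaining} transactions left"
    return "ok"


print(wraparound_status(200_000_000))
print(wraparound_status(2**31 - 5_000_000))
print(wraparound_status(2**31 - 500_000))
```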
- 20.
Anti-wraparound VACUUM
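• Every table has a pg_class.relfrozenxid value.
• All tuples inserted by XIDs older than relfrozenxid are guaranteed to have been marked as “Frozen”.
• Works like a forcibly executed VACUUM *FREEZE*: it scans the whole table, though it only freezes tuples older than vacuum_freeze_min_age.
[Figure: XID timeline starting at pg_class.relfrozenxid, with the current XID somewhere along it:
relfrozenxid + vacuum_freeze_table_age (default 150 million): from here VACUUM could do a whole table scan
relfrozenxid + autovacuum_freeze_max_age (default 200 million): anti-wraparound VACUUM is launched forcibly
relfrozenxid + 2 billion: XID wraparound]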
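The timeline amounts to a decision on age(relfrozenxid), the number of XIDs consumed since relfrozenxid. A simplified sketch with the default settings from the figure; `vacuum_kind` is hypothetical, boundary handling is not modelled precisely, and autovacuum's other triggers (such as dead-tuple counts) are ignored.

```python
VACUUM_FREEZE_TABLE_AGE = 150_000_000     # default vacuum_freeze_table_age
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000   # default autovacuum_freeze_max_age


def vacuum_kind(relfrozenxid_age: int) -> str:
    """Classify a table by age(relfrozenxid) = XIDs consumed since relfrozenxid."""
    if relfrozenxid_age > AUTOVACUUM_FREEZE_MAX_AGE:
        return "autovacuum forcibly launches anti-wraparound VACUUM"
    if relfrozenxid_age >= VACUUM_FREEZE_TABLE_AGE:
        return "VACUUM scans the whole table"
    return "lazy VACUUM, which may skip all-visible pages"


for age in (100_000_000, 170_000_000, 250_000_000):
    print(f"{age:>11,}: {vacuum_kind(age)}")
```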
- 21.
Anti-wraparound VACUUM
At this XID, a normal lazy VACUUM is executed.
[Figure: the same XID timeline (relfrozenxid; + vacuum_freeze_table_age, default 150 million, where VACUUM could do a whole table scan; + autovacuum_freeze_max_age, default 200 million, where anti-wraparound VACUUM is launched forcibly; + 2 billion, XID wraparound), with the current XID before relfrozenxid + vacuum_freeze_table_age]
- 22.
Anti-wraparound VACUUM
If you run VACUUM at this XID, an anti-wraparound VACUUM (which could do a whole table scan) is executed.
[Figure: the same XID timeline, with the current XID between relfrozenxid + vacuum_freeze_table_age (default 150 million) and relfrozenxid + autovacuum_freeze_max_age (default 200 million); XID wraparound is at relfrozenxid + 2 billion]
- 23.
Anti-wraparound VACUUM
Once the current XID exceeds relfrozenxid + autovacuum_freeze_max_age, autovacuum forcibly launches an anti-wraparound VACUUM, which could do a whole table scan.
[Figure: the same XID timeline, with the current XID past relfrozenxid + autovacuum_freeze_max_age (default 200 million) but before XID wraparound at relfrozenxid + 2 billion]
- 24.
Anti-wraparound VACUUM
After an anti-wraparound VACUUM, the relfrozenxid value is advanced.
[Figure: the new pg_class.relfrozenxid sits vacuum_freeze_min_age (default 50 million) behind the current XID]
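A worked example of the update, following the figure's model in which the new relfrozenxid lands vacuum_freeze_min_age behind the current XID. With the defaults, the table's age drops to 50 million, so on a system that keeps consuming XIDs the next forced anti-wraparound VACUUM of that table is about 150 million transactions away. `after_anti_wraparound` is a hypothetical helper and uses plain integers rather than circular XID arithmetic.

```python
from typing import Tuple

VACUUM_FREEZE_MIN_AGE = 50_000_000        # default vacuum_freeze_min_age
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000   # default autovacuum_freeze_max_age


def after_anti_wraparound(current_xid: int) -> Tuple[int, int]:
    """Return (new relfrozenxid, XIDs until the next forced anti-wraparound VACUUM)."""
    new_relfrozenxid = current_xid - VACUUM_FREEZE_MIN_AGE
    new_age = current_xid - new_relfrozenxid          # equals vacuum_freeze_min_age
    return new_relfrozenxid, AUTOVACUUM_FREEZE_MAX_AGE - new_age


print(after_anti_wraparound(1_000_000_000))  # (950000000, 150000000)
```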
- 25.
Anti-wraparound VACUUM is too slow
• A whole-table scan is always required to advance relfrozenxid.
• This is because lazy VACUUM can skip all-visible pages that still contain unfrozen tuples.
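[Figure: heap blocks with their visibility map bits and the xmin of each tuple. Blocks 0 and 1 hold only FREEZE tuples, block 2 is all-visible but holds unfrozen tuples (xmin 101, 102, 103), and block 3 is not all-visible (garbage and xmin 104). Normal VACUUM reads only the not all-visible blocks; anti-wraparound VACUUM must read every block.]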
- 26.
How can we improve anti-wraparound VACUUM?
- 27.
Approaches
• Freeze Map: track which pages need to be frozen.
• 64-bit XID: change the size of XID from 32 bits to 64 bits.
• LSN to XID map: map XIDs to LSNs.
- 28.
Freeze Map
• New feature for 9.6.
• Improves the performance of VACUUM FREEZE and anti-wraparound VACUUM.
• Brings PostgreSQL closer to handling VLDBs (very large databases).
- 29.
Idea - Add an additional bit
• Does not add a new map.
• Adds an additional bit to the Visibility Map.
• The additional bit tracks which pages are all-frozen.
• An all-frozen page must also be all-visible.
[Figure: visibility map bits 10110010, read as an (all-visible, all-frozen) pair for each page]
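A sketch of how two bits per heap page can be packed into a byte array, four pages per byte. It assumes the 9.6 layout in which each heap block's all-visible flag is the lower of its two bits and all-frozen the upper; the `VisibilityMap` class and its methods are hypothetical, not PostgreSQL's API.

```python
ALL_VISIBLE = 0x01
ALL_FROZEN = 0x02
BITS_PER_HEAPBLOCK = 2
HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK  # 4 heap pages per map byte


class VisibilityMap:
    """Toy two-bit-per-page visibility map."""

    def __init__(self, nblocks: int):
        self.bits = bytearray(-(-nblocks // HEAPBLOCKS_PER_BYTE))

    @staticmethod
    def _locate(blkno: int):
        # Byte holding this page's bits, and the bit offset inside that byte.
        return (blkno // HEAPBLOCKS_PER_BYTE,
                (blkno % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

    def get(self, blkno: int) -> int:
        byte, shift = self._locate(blkno)
        return (self.bits[byte] >> shift) & (ALL_VISIBLE | ALL_FROZEN)

    def set(self, blkno: int, flags: int) -> None:
        if (flags & ALL_FROZEN) and not (flags & ALL_VISIBLE):
            raise ValueError("an all-frozen page must also be all-visible")
        byte, shift = self._locate(blkno)
        self.bits[byte] |= flags << shift

    def clear(self, blkno: int) -> None:
        # INSERT/UPDATE/DELETE on the page clears both bits.
        byte, shift = self._locate(blkno)
        self.bits[byte] &= ~((ALL_VISIBLE | ALL_FROZEN) << shift) & 0xFF
```

For example, `vm = VisibilityMap(10)` allocates 3 bytes; `vm.set(5, ALL_VISIBLE | ALL_FROZEN)` marks page 5 all-frozen without touching page 4, which shares the same byte. The price is the one listed under Cons: twice as many bits per page.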
- 30.
State transition of two bits
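(Each state is written as <all-visible bit><all-frozen bit>.)
• 00 -> 10 (all-visible): VACUUM
• 00 -> 11 (all-frozen): VACUUM FREEZE
• 10 -> 11: VACUUM FREEZE
• 10 -> 00: UPDATE/DELETE/INSERT
• 11 -> 00: UPDATE/DELETE/INSERT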
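The transitions can be captured in a small state table. This sketch only models the transitions drawn on the slide; "MODIFY" is an illustrative label standing for UPDATE/DELETE/INSERT on the page.

```python
# States are the two visibility map bits written as "<all-visible><all-frozen>".
TRANSITIONS = {
    ("00", "VACUUM"): "10",
    ("00", "VACUUM FREEZE"): "11",
    ("10", "VACUUM FREEZE"): "11",
    ("10", "MODIFY"): "00",   # UPDATE / DELETE / INSERT on the page
    ("11", "MODIFY"): "00",
}


def next_state(state: str, event: str) -> str:
    """Follow a transition; events not drawn in the figure leave the bits unchanged."""
    return TRANSITIONS.get((state, event), state)


state = "00"
for event in ("VACUUM", "VACUUM FREEZE", "MODIFY"):
    state = next_state(state, event)
    print(event, "->", state)
# VACUUM -> 10
# VACUUM FREEZE -> 11
# MODIFY -> 00
```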
- 31.
Idea - Improve anti-wraparound performance
• VACUUM can skip all-frozen pages even when an anti-wraparound VACUUM is required.
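[Figure: heap blocks with both visibility map bits (visible, frozen) and the xmin of each tuple. Block 1 is all-visible and all-frozen (only FREEZE tuples), block 2 is all-visible but not all-frozen (xmin 101, 102, 103), and block 3 is neither (garbage and xmin 104). Normal VACUUM skips all-visible blocks; anti-wraparound VACUUM now skips only all-frozen blocks.]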
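A sketch of which blocks each kind of VACUUM has to read, ignoring the 32-page skipping threshold. The four-block layout and the mode names are hypothetical; the "9.5" mode stands for anti-wraparound VACUUM without the Freeze Map.

```python
def blocks_to_scan(vm, mode):
    """vm: list of (all_visible, all_frozen) pairs, one per heap block."""
    if mode == "normal":
        return [b for b, (visible, _) in enumerate(vm) if not visible]
    if mode == "anti-wraparound (9.5)":
        return list(range(len(vm)))                 # whole-table scan
    if mode == "anti-wraparound (9.6)":
        return [b for b, (_, frozen) in enumerate(vm) if not frozen]
    raise ValueError(f"unknown mode: {mode}")


vm = [(True, True), (True, True), (True, False), (False, False)]
for mode in ("normal", "anti-wraparound (9.5)", "anti-wraparound (9.6)"):
    print(mode, blocks_to_scan(vm, mode))
# normal [3]
# anti-wraparound (9.5) [0, 1, 2, 3]
# anti-wraparound (9.6) [2, 3]
```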
- 32.
Pros/Cons
• Pros
• Dramatic performance improvement for VACUUM FREEZE.
• Enables read-only tables. (future)
• Cons
• Doubles the size of the Visibility Map.
- 33.
No More Full-Table Vacuums
http://rhaas.blogspot.jp/2016/03/no-more-full-table-vacuums.html#comment-form
- 35.
Vacuum Progress Checker
• New feature for 9.6 (under review).
• Reports VACUUM progress information via a system view.
- 36.
Idea
• Adds a new system view.
• Reports detailed progress information for each process running VACUUM.
postgres(1)=# SELECT * FROM pg_stat_vacuum_progress ;
-[ RECORD 1 ]-------+--------------
pid | 55513
relid | 16384
phase | Scanning Heap
total_heap_blks | 451372
current_heap_blkno | 77729
total_index_pages | 559364
scanned_index_pages | 559364
index_scan_count | 1
percent_complete | 17
- 37.
Future works
• Read Only Table
• Report progress information for other maintenance commands.
- 38.
PostgreSQL git repository
git://git.postgresql.org/git/postgresql.git
- 39.
VERBOSE option
=# VACUUM VERBOSE hoge;
INFO: vacuuming "public.hoge"
INFO: scanned index "hoge_idx1" to remove 1000 row versions
DETAIL: CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: "hoge": removed 1000 row versions in 443 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "hoge_idx1" now contains 100000 row versions in 276 pages
DETAIL: 1000 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "hoge": found 1000 removable, 100000 nonremovable row versions in 447 out of 447 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.05u sec elapsed 0.05 sec.
VACUUM
- 40.
FREEZE option
• Aggressive freezing of tuples
• Same as running normal VACUUM with vacuum_freeze_min_age = 0 and
vacuum_freeze_table_age = 0
• Always scans the whole table
- 41.
ANALYZE option
• Runs ANALYZE after VACUUM
• Updates the data statistics used by the planner
-- VACUUM and analyze with VERBOSE option
=# VACUUM ANALYZE VERBOSE hoge;
INFO: vacuuming "public.hoge"
:
INFO: analyzing "public.hoge"
INFO: "hoge": scanned 452 of 452 pages, containing 100000 live rows and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
VACUUM
- 42.
FULL option
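• Completely different from lazy VACUUM
• Similar to CLUSTER
• Acquires an AccessExclusiveLock, which blocks all reads and writes on the table
• Takes much longer than lazy VACUUM
• Needs up to twice the table's size in disk space, since the old and new copies coexist until it finishes
• Rebuilds the table and its indexes
• Freezes tuples during VACUUM FULL (9.3 and later)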