8. MVCC, why?
● Support for many concurrent users
● Atomicity and Isolation (ACID)
● Performance
● Fewer locks
9. MVCC, how?
● Postgres stores multiple versions of the same row in the table
● INSERT is just plain insert
● DELETE is marking row as deleted
● UPDATE is DELETE old row and INSERT new row
● Postgres shows different versions of a row to different transactions
● After a while deleted rows are not visible to any running transactions
● They are called dead rows
● Postgres needs to clean up dead rows from time to time
● It is like Garbage Collection in dynamic languages
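The bullets above can be sketched as a toy in-memory model. This is only an illustration: the names RowVersion and ToyTable are hypothetical, and real Postgres stores versions in heap pages rather than a Python list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    data: dict
    xmin: int                   # TXID that created this version
    xmax: Optional[int] = None  # TXID that deleted it; None while live


class ToyTable:
    """Hypothetical model of a table that keeps every row version."""

    def __init__(self):
        self.versions = []

    def insert(self, txid, data):
        # INSERT is just a plain insert of a new version.
        version = RowVersion(data=data, xmin=txid)
        self.versions.append(version)
        return version

    def delete(self, txid, version):
        # DELETE only marks the version; it stays in the table.
        version.xmax = txid

    def update(self, txid, version, new_data):
        # UPDATE is DELETE of the old version plus INSERT of a new one.
        self.delete(txid, version)
        return self.insert(txid, {**version.data, **new_data})


table = ToyTable()
v1 = table.insert(100, {"id": 1, "name": "alice"})
v2 = table.update(101, v1, {"name": "bob"})
```

After the update the table holds two versions of the same logical row: the old one marked with xmax = 101 and the new one with xmin = 101.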
12. MVCC, details
● Postgres stores two additional columns for each row
● ID of transaction that created a row
● ID of transaction that deleted a row
● Each transaction gets its own ID (TXID) at the start of its first modifying statement
● TXIDs are 32-bit integers, assigned incrementally
● Lower TXIDs mean earlier transactions
13. MVCC, details
● Two additional columns: xmin and xmax
● xmin is transaction ID that created the row
● xmax is transaction ID that deleted the row
● Those are hidden columns available in all tables
● You can see them by using explicit select statements
● You will get an error if you add columns with such names
15. MVCC, inspecting
● You can look into physical table files and find deleted rows
● Or you can use pageinspect extension
● It can fetch raw page data, page headers, page rows, etc
18. Transaction Snapshots
● Frozen view of the status of current transactions
● Snapshot has format xmin:xmax:xip, for example 12:16:12,14
● xmin = 12, this means that the earliest running transaction ID is 12
● All earlier transactions (less than 12) are either committed and visible, or
aborted and dead
● xmax = 16, the first as-yet unassigned transaction ID
● All transactions equal to or greater than 16 have not yet started and are thus invisible
● xip = [12, 14], the transactions still active between xmin and xmax
● Transactions 13 and 15 are either committed and visible, or aborted and dead
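To make the snapshot format concrete, here is a minimal sketch that parses the xmin:xmax:xip text and classifies transaction IDs the way the bullets describe. The names Snapshot, parse_snapshot and classify are hypothetical helpers, and TXID wraparound is ignored.

```python
from typing import FrozenSet, NamedTuple


class Snapshot(NamedTuple):
    xmin: int             # earliest transaction still running
    xmax: int             # first as-yet unassigned TXID
    xip: FrozenSet[int]   # transactions in progress at snapshot time


def parse_snapshot(text):
    xmin, xmax, xip = text.split(":")
    # An empty xip part (as in "101:101:") means nothing is in progress.
    active = frozenset(int(x) for x in xip.split(",") if x)
    return Snapshot(int(xmin), int(xmax), active)


def classify(txid, snap):
    if txid >= snap.xmax:
        return "not started"
    if txid in snap.xip:
        return "in progress"
    return "finished"   # committed and visible, or aborted and dead


snap = parse_snapshot("12:16:12,14")
statuses = {txid: classify(txid, snap) for txid in range(10, 18)}
```

For the example snapshot, 12 and 14 come out as in progress, 13 and 15 as finished, and 16 onwards as not started.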
21. MVCC, visibility checks
Current snapshot 101:101:, all transactions were committed
xmin   xmax   visible?
25     -      YES
25     50     NO
50     110    YES
110    -      NO
110    120    NO
22. MVCC, visibility checks
Current snapshot 25:101:25,50,75, all finished transactions were committed
xmin   xmax   visible?
30     -      YES
50     -      NO
110    -      NO
30     80     NO
30     75     YES
30     110    YES
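The two tables can be reproduced with a small visibility function. This is a simplified sketch, not the actual Postgres check: it assumes every finished transaction committed (as the slides state), ignores wraparound, and ignores the case of a transaction looking at its own changes. The function names are hypothetical.

```python
def sees(txid, snap_xmax, xip):
    """True if the effects of txid are visible to the snapshot."""
    return txid < snap_xmax and txid not in xip


def row_visible(row_xmin, row_xmax, snap_xmax, xip=()):
    if not sees(row_xmin, snap_xmax, xip):
        return False   # the creating transaction is not visible yet
    if row_xmax is None:
        return True    # never deleted
    # The row stays visible as long as the deleting transaction is not.
    return not sees(row_xmax, snap_xmax, xip)


# Snapshot 101:101: (nothing in progress)
table_1 = [row_visible(xmin, xmax, 101) for xmin, xmax in
           [(25, None), (25, 50), (50, 110), (110, None), (110, 120)]]

# Snapshot 25:101:25,50,75
table_2 = [row_visible(xmin, xmax, 101, xip={25, 50, 75}) for xmin, xmax in
           [(30, None), (50, None), (110, None), (30, 80), (30, 75), (30, 110)]]
```

The snapshot's own xmin is not needed here because any TXID below it is, by definition, neither in xip nor at or above the snapshot's xmax.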
24. Snapshots and Isolation Levels
● Postgres supports 3 isolation levels (READ COMMITTED, REPEATABLE READ
and SERIALIZABLE)
● In READ COMMITTED a new snapshot is taken at the start of each SQL statement
● In the higher isolation levels one snapshot is taken at the first statement of the transaction and reused until it ends
30. Commit Log
● 2 bits per transaction (in progress, committed, aborted, ...)
● Committing or aborting a transaction is just flipping a bit in Commit Log
● All transactions (committed and aborted) have side-effects
● Hint bits in table rows, optimization to avoid Commit Log lookups
● An innocent-looking table scan can update a lot of hint bits and cause heavy
table writes
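To illustrate how compact a 2-bits-per-transaction log is, here is a toy version that packs four transaction statuses into each byte. ToyCommitLog and the numeric status codes are assumptions made for this sketch, not the on-disk format.

```python
IN_PROGRESS, COMMITTED, ABORTED = 0, 1, 2   # 2-bit status codes (assumed values)


class ToyCommitLog:
    """Hypothetical commit log: 2 bits per TXID, 4 TXIDs per byte."""

    def __init__(self, capacity):
        self.bits = bytearray((capacity + 3) // 4)

    def set_status(self, txid, status):
        byte, slot = divmod(txid, 4)
        shift = slot * 2
        cleared = self.bits[byte] & ~(0b11 << shift) & 0xFF
        self.bits[byte] = cleared | (status << shift)

    def get_status(self, txid):
        byte, slot = divmod(txid, 4)
        return (self.bits[byte] >> (slot * 2)) & 0b11


clog = ToyCommitLog(capacity=8)
clog.set_status(5, COMMITTED)   # committing only rewrites two bits
clog.set_status(6, ABORTED)     # so does aborting
```

Eight transactions fit in two bytes, and neither commit nor abort touches the rows the transaction wrote, which is why aborted transactions leave their row versions behind.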
35. MVCC, vacuuming
● Vacuum is like a Garbage Collector
● Looks for rows that are no longer visible to any running transactions and
removes them
● Avoid long-running transactions
● Makes room for new rows in existing pages
● Autovacuum can happen at any time
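A toy sketch of the cleanup rule shows why long-running transactions hurt. It assumes a row version is a tuple (xmin, xmax, data), that a version is dead once its deleting transaction has committed and is older than the oldest running transaction, and it leaves out rows created by aborted transactions. The function vacuum here is a hypothetical helper, not the real implementation.

```python
def vacuum(versions, committed, oldest_running_txid):
    """Return the row versions that must be kept.

    versions: list of (xmin, xmax, data); xmax is None for live rows.
    committed: set of TXIDs known to have committed.
    """
    kept = []
    for xmin, xmax, data in versions:
        dead = (
            xmax is not None
            and xmax in committed
            and xmax < oldest_running_txid
        )
        if not dead:
            kept.append((xmin, xmax, data))
    return kept


versions = [(10, 20, "v1"), (20, 40, "v2"), (40, None, "v3")]
committed = {10, 20, 40}

# A transaction that started at TXID 30 is still running.
held_back = vacuum(versions, committed, oldest_running_txid=30)

# No old transactions are running any more.
cleaned = vacuum(versions, committed, oldest_running_txid=50)
```

While transaction 30 is open, version v2 cannot be removed because that transaction may still need it; once it finishes, only the live version v3 remains.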
42. Transaction Wraparound, huh?
● Transaction IDs (TXIDs) are 32-bit integers
● That is ~ 4 billion transactions
● With enough traffic the counter can quickly wrap around
● Suddenly transactions that were in the past appear to be in the future
● And their output is invisible
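The effect can be demonstrated with circular comparison of 32-bit IDs, where "older" is decided by the sign of the 32-bit difference. This is a sketch under that assumption; the function precedes is a hypothetical helper and the reserved special transaction IDs are ignored.

```python
def precedes(a, b):
    """True if TXID a is logically older than b, modulo 2**32."""
    diff = (a - b) & 0xFFFFFFFF
    # A set high bit means the signed 32-bit difference is negative.
    return diff >= 0x80000000


old_row = 100

# About 2 billion transactions later, TXID 100 is still in the past.
still_past = precedes(old_row, old_row + 2**31 - 1)

# A couple of transactions further on, it flips to the future.
now_future = not precedes(old_row, (old_row + 2**31 + 1) % 2**32)
```

Each TXID therefore has roughly 2 billion IDs behind it and 2 billion ahead of it, so a row that is never frozen eventually lands on the wrong side of the comparison.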
43. Transaction Wraparound, solutions?
● Vacuum freezes rows from old transactions that are far in the past
● Freezing sets a special flag on rows
● A set flag means that the row is visible to all transactions
● Can be done manually with VACUUM FREEZE
44. Main takeaways
● Postgres stores multiple versions of the same row in the table
● All transactions (committed or aborted) have side-effects
● All updates to the table create bloat
● Vacuum removes bloat and can happen at any time
● Avoid long-running transactions