Discusses the problems of concurrency and of lock-based mechanisms, the need for Multiversion Concurrency Control (MVCC), and PostgreSQL as an application of MVCC.
1. Multiversion Concurrency Control and Its Applications in Advanced Database Systems
Submitted by: Gautham SK
1st semester, M.Tech in Computer Science & Engineering
Date: 09-01-2020
2. CONTENTS
• Introduction
• Concurrency Control Mechanism
• Problems with Lock Based Mechanism
• Multiversion Concurrency Control
• MVCC based on timestamp
• MV2PL
• Application of MVCC (PostgreSQL)
3. INTRODUCTION
• Concurrent Execution: the operations of two or more transactions are “interleaved” instead of
being executed one transaction after another.
Benefits: -> Reduces waiting time
-> Improves throughput, response time & resource utilization.
4. Problems of Concurrency
• Lost update: It occurs when two concurrent transactions, T1 and T2, are updating the same
data element and one of the updates is lost (overwritten by the other transaction).
• Dirty Read: A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now
has values which have never formed part of the stable database.
• Unrepeatable Read: A transaction T1 reads a record and then does some other processing,
during which transaction T2 updates the record. When T1 reads the record again, it gets a
value different from the one it read before.
• Incorrect Summary Problem: When one transaction computes an aggregate summary function over
a set of records while other transactions are updating some of them, the aggregate may use
some values before they are updated and others after they are updated.
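The lost-update problem can be reproduced with a toy schedule. The following is a minimal Python sketch under simplifying assumptions: there is a single data item `X`, and a hypothetical `run_schedule` helper replays read/add/write steps in the given order. It contrasts a serial schedule with an interleaved one.

```python
# Minimal sketch (hypothetical helper and values) of the lost-update problem:
# two transactions read the same item, then each writes back its own result.

def run_schedule(schedule, initial):
    """Execute an interleaved schedule of (txn, op, arg) steps on one item X."""
    db = {"X": initial}
    local = {}                      # each transaction's private copy of X
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            db["X"] = local[txn]
    return db["X"]

# Serial execution: T1 adds 10, then T2 adds 20.
serial = [("T1", "read", None), ("T1", "add", 10), ("T1", "write", None),
          ("T2", "read", None), ("T2", "add", 20), ("T2", "write", None)]

# Interleaved execution: both read X before either writes it back.
interleaved = [("T1", "read", None), ("T2", "read", None),
               ("T1", "add", 10), ("T2", "add", 20),
               ("T1", "write", None), ("T2", "write", None)]

print(run_schedule(serial, 100))       # 130
print(run_schedule(interleaved, 100))  # 120: T1's update is lost
```

In the interleaved run, T2 overwrites X with a value computed from the stale read, so the increment made by T1 disappears.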
5. Concurrency Control Mechanism
• Use of Locks: A lock is a variable associated with a data item that describes the status of the
item with respect to possible operations that can be applied to it.
E.g. Two-Phase Locking.
The basic idea is to lock the data object that needs to be updated (using a binary lock or
shared/exclusive, i.e. read/write, locks) so that no other transaction can access it in a
conflicting mode, which prevents data inconsistency.
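As an illustration of shared/exclusive locking, here is a minimal Python sketch of a lock table. The `LockTable` class and its methods are hypothetical: requests are answered immediately instead of being queued, binary locks are not modelled, and the two-phase rule is only approximated by releasing all of a transaction's locks at once.

```python
# Minimal sketch of a shared/exclusive (read/write) lock table.
# Requests are granted or refused immediately; a real lock manager
# would instead queue refused requests until the lock is released.

from collections import defaultdict

# COMPATIBLE[held][requested] -> can the request be granted alongside a held lock?
COMPATIBLE = {
    "S": {"S": True, "X": False},
    "X": {"S": False, "X": False},
}

class LockTable:
    def __init__(self):
        # item -> {transaction: mode}
        self.locks = defaultdict(dict)

    def request(self, txn, item, mode):
        """Return True and record the lock if it can be granted, else False."""
        holders = self.locks[item]
        for other, held in holders.items():
            if other != txn and not COMPATIBLE[held][mode]:
                return False          # conflicts with another transaction's lock
        # Grant; an existing S lock is upgraded to X, an X lock is never downgraded.
        if holders.get(txn) != "X":
            holders[txn] = mode
        return True

    def release_all(self, txn):
        """Release every lock held by txn (e.g. at commit under strict 2PL)."""
        for holders in self.locks.values():
            holders.pop(txn, None)

table = LockTable()
print(table.request("T1", "A", "S"))  # True: shared lock granted
print(table.request("T2", "A", "S"))  # True: shared locks are compatible
print(table.request("T2", "A", "X"))  # False: T1 still holds a shared lock
table.release_all("T1")
print(table.request("T2", "A", "X"))  # True: T2 upgrades to exclusive
```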
6. Problems with Lock Based Mechanism
• Deadlocks: Occurs when each transaction T in a set of two or more transactions is waiting for
some item that is locked by some other transaction T′ in the set. Hence, each transaction in
the set is in a waiting queue, waiting for one of the other transactions in the set to release the
lock on an item. But because the other transaction is also waiting, it will never release the
lock.
• Starvation: Occurs when a transaction cannot proceed for an indefinite period of time while
other transactions in the system continue normally. This mainly occurs when priority is given
to some transactions over others.
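Deadlocks of this kind are commonly detected with a wait-for graph, in which an edge from Ti to Tj means that Ti waits for a lock held by Tj; a deadlock exists exactly when the graph contains a cycle. The Python sketch below uses a hypothetical `find_deadlock` helper and shows detection only; choosing a victim to abort, and preventing starvation, are left out.

```python
# Minimal sketch: detect a deadlock by finding a cycle in a wait-for graph,
# where an edge Ti -> Tj means "Ti is waiting for a lock held by Tj".

def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {t: WHITE for t in wait_for}
    stack = []                        # transactions on the current DFS path

    def visit(t):
        color[t] = GREY
        stack.append(t)
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GREY:       # back edge: cycle found
                return stack[stack.index(u):]
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

print(find_deadlock({"T1": ["T2"], "T2": ["T1"]}))          # ['T1', 'T2']
print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []}))  # None
```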
7. Multiversion Concurrency Control (MVCC)
(Concurrency control with less reliance on locking)
• The basic idea of MVCC is that the DBMS maintains multiple physical
versions of each logical object in the database to allow operations on the
same object to proceed in parallel.
• Multi-versioning allows read-only transactions to access older versions of
tuples without preventing read-write transactions from simultaneously
generating newer versions.
• Two multiversion techniques are discussed here:
-> Multiversion Technique Based on Timestamp Ordering
-> Multiversion Two-Phase Locking Using Certify Locks
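The core idea of keeping old versions instead of overwriting them can be sketched in a few lines of Python. This is a toy under stated assumptions rather than any particular system's design: a hypothetical `VersionedStore` uses one logical clock as both commit timestamp and reader snapshot, commits each write immediately, and never garbage-collects old versions.

```python
# Minimal sketch of the core MVCC idea: a write appends a new version instead
# of overwriting, so a reader holding an older snapshot still sees old data.

class VersionedStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical timestamp, incremented on every commit

    def snapshot(self):
        """A reader's snapshot is just the current logical time."""
        return self.clock

    def write(self, key, value):
        """Commit a new version of key; older versions are kept."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = VersionedStore()
store.write("x", "v1")
reader_ts = store.snapshot()              # a long-running reader starts here
store.write("x", "v2")                    # a writer creates a newer version
print(store.read("x", reader_ts))         # v1: the reader is not disturbed
print(store.read("x", store.snapshot()))  # v2: a new reader sees the latest
```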
8. Multiversion Technique Based on Timestamp Ordering
• In this method, several versions X1, X2, … , Xk of each data item X are maintained.
• For each version, the value of version Xi and the following two timestamps associated with version
Xi are kept:
1. read_TS(Xi): The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.
2. write_TS(Xi): The write timestamp of Xi is the timestamp of the transaction
that wrote the value of version Xi.
• Whenever a transaction T is allowed to execute a write_item(X) operation, a new version
Xk+1 of item X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1) set to
TS(T). Correspondingly, when a transaction T is allowed to read the value of version
Xi, the value of read_TS(Xi) is set to the larger of the current read_TS(Xi) and TS(T).
9. • To ensure serializability, the following rules are used. Let Qi be the version of Q whose
write_TS(Qi) is the largest write timestamp that is less than or equal to TS(Ti).
i. If Ti issues a read(Q) request:
the read always succeeds; the system returns the value of Qi and sets
read_TS(Qi) to the larger of the current read_TS(Qi) and TS(Ti).
ii. If Ti issues a write(Q) request:
(a) if read_TS(Qi) > TS(Ti),
then Ti is rolled back, because a younger transaction has already read Qi.
(b) otherwise, if TS(Ti) > write_TS(Qi),
then a new version Qj is created with read_TS(Qj) = write_TS(Qj) = TS(Ti).
(c) otherwise, if TS(Ti) = write_TS(Qi),
then the contents of Qi are overwritten.
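The MVTO read and write rules can be sketched for a single data item. In this Python sketch a read picks the version with the largest write_TS not exceeding the reader's timestamp, and a write is rolled back if a younger transaction has already read that version. It assumes non-negative integer transaction timestamps and an initial version written at timestamp 0, and uses hypothetical names (`MVTOItem`, `Rollback`). It does not track commits, so it says nothing about cascading rollbacks.

```python
# Minimal sketch of multiversion timestamp ordering (MVTO) for one data item Q.
# Timestamps are plain non-negative integers.

class Version:
    def __init__(self, value, ts):
        self.value = value
        self.read_ts = ts    # largest TS of any transaction that read this version
        self.write_ts = ts   # TS of the transaction that wrote this version

class Rollback(Exception):
    """Raised when the requesting transaction must be rolled back."""

class MVTOItem:
    def __init__(self, initial_value):
        self.versions = [Version(initial_value, 0)]  # version written at TS 0

    def _latest_visible(self, ts):
        """Qi: the version with the largest write_TS <= ts."""
        candidates = [v for v in self.versions if v.write_ts <= ts]
        return max(candidates, key=lambda v: v.write_ts)

    def read(self, ts):
        qi = self._latest_visible(ts)
        qi.read_ts = max(qi.read_ts, ts)   # reads always succeed
        return qi.value

    def write(self, ts, value):
        qi = self._latest_visible(ts)
        if qi.read_ts > ts:                # a younger transaction already read Qi
            raise Rollback(f"T{ts} rolled back")
        if qi.write_ts == ts:              # same transaction: overwrite Qi
            qi.value = value
        else:                              # TS(Ti) > write_TS(Qi): new version Qj
            self.versions.append(Version(value, ts))

q = MVTOItem(100)
print(q.read(5))      # 100: T5 reads the initial version
q.write(7, 150)       # T7 creates a new version with write_TS = 7
print(q.read(6))      # 100: older T6 still sees the version written at TS 0
try:
    q.write(3, 90)    # T3 is older than T5/T6, which already read that version
except Rollback as e:
    print(e)          # T3 rolled back
```

Note that the old version is never blocked for readers: T6 arrives after T7's write and still reads successfully, which is the "reads are never blocked" property.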
10. Advantages
• Read requests are never blocked; a read always succeeds.
• Well suited to databases in which read requests far outnumber write requests.
Disadvantages
• Each read may require two disk accesses: one to read the data item and one to update
read_TS(Qi).
• Conflicts between transactions are resolved by rolling one of them back, and that rollback can
cascade to other transactions that have read its versions.
11. Multiversion Two-Phase Locking Using Certify Locks
• MV2PL supports three locking modes: read, write and certify locks.
• It keeps two versions of a data item X:
-> Committed version X: the version written by the last committed transaction.
-> Local version X′: a copy of the committed value, created when a transaction T acquires a write
lock on X.
• Other transactions can continue to read the committed version of X while T holds the write lock.
Transaction T can write the value of X′ as needed, without affecting the value of the committed
version X.
• When T is ready to commit, it must obtain a certify lock on every item on which it holds a write
lock. A certify lock is not compatible with read locks: T must wait until all transactions reading
those items have released their read locks, and while T holds a certify lock no new read lock can
be granted on the item.
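The three lock modes can be captured in a compatibility matrix. In this Python sketch (hypothetical `can_grant` helper), read is compatible with read and write, write is compatible only with read, and certify is compatible with nothing. This matches the behaviour described above: readers proceed alongside a writer, but certification must wait for the readers to finish.

```python
# Minimal sketch of the MV2PL lock-compatibility check with read, write and
# certify locks. COMPATIBLE[held][requested] says whether a lock requested by
# one transaction can be granted while another transaction holds `held`.

COMPATIBLE = {
    "read":    {"read": True,  "write": True,  "certify": False},
    "write":   {"read": True,  "write": False, "certify": False},
    "certify": {"read": False, "write": False, "certify": False},
}

def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every lock held by other transactions."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)

# Readers may proceed while T holds a write lock (they read the committed X).
print(can_grant("read", ["write"]))            # True
# T cannot certify (commit) while other transactions still hold read locks.
print(can_grant("certify", ["read", "read"]))  # False
print(can_grant("certify", []))                # True: all readers released
```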
12. • Once the certify locks are obtained, each local version X′ becomes the new committed
version X, the old committed version is removed, and the certify locks are released.
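The commit step can be sketched for a single item. The hypothetical `MV2PLItem` class below (Python) keeps the committed version and the writer's local version side by side. The certify lock is modelled implicitly as "no other transaction holds a read lock", and a refused commit simply returns `False` instead of waiting in a queue.

```python
# Minimal sketch of the MV2PL commit step for a single item: a writer keeps
# a private local version X' and promotes it once it can certify.

class MV2PLItem:
    def __init__(self, value):
        self.committed = value   # committed version X, visible to readers
        self.local = None        # local version X' of the current writer
        self.writer = None       # transaction holding the write lock
        self.readers = set()     # transactions holding read locks

    def read(self, txn):
        self.readers.add(txn)
        return self.committed    # readers never see the uncommitted X'

    def release_read(self, txn):
        self.readers.discard(txn)

    def write(self, txn, value):
        if self.writer not in (None, txn):
            raise RuntimeError("write lock held by another transaction")
        self.writer = txn
        self.local = value       # only X' changes; X is untouched

    def commit(self, txn):
        """Try to certify and commit; False means the commit must wait."""
        if self.writer != txn:
            raise RuntimeError("only the write-lock holder can commit")
        if self.readers - {txn}:
            return False         # certify lock would conflict with read locks
        # Promote X' to the new committed version and drop the write lock.
        self.committed, self.local, self.writer = self.local, None, None
        return True

x = MV2PLItem(10)
x.write("T1", 20)
print(x.read("T2"))       # 10: T2 reads the committed version
print(x.commit("T1"))     # False: T2 still holds a read lock
x.release_read("T2")
print(x.commit("T1"))     # True: X' becomes the new committed version
print(x.read("T3"))       # 20
```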
Advantages:
• It avoids cascading aborts, since transactions are only allowed to read the version X that was
written by a committed transaction.
Disadvantages:
• There may be a delay in committing a transaction until it obtains exclusive certify locks on all
the items it has written.
• Deadlocks can also occur.
13. Application of MVCC: PostgreSQL
• PostgreSQL, also known as Postgres, is a free and open-source relational database management
system (RDBMS) that emphasizes extensibility; it is billed as “the world’s most advanced open
source relational database”.
• It is designed to handle a range of workloads, from single machines to data
warehouses or Web services (cloud) with many concurrent users.
• It is the default database for macOS Server and is also available for Linux,
Windows and other platforms.
• It is written in C and manages concurrency through MVCC.
14. • Postgres stores all row versions in the table itself, and among its hidden system columns
every row version has:
-> xmin: the ID of the transaction that inserted the row version.
-> xmax: the ID of the transaction that deleted the row version.
• The transaction ID is a 32-bit integer. VACUUM is the process responsible for reclaiming old row
versions that are no longer in use and for preventing transaction ID wraparound, i.e. the 32-bit
counter overflowing and wrapping around.
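The 32-bit transaction ID is why VACUUM matters. Transaction IDs are compared circularly (modulo 2^32), so every ID has roughly two billion IDs "older" than it and two billion "newer". Old rows must therefore be frozen by VACUUM before the counter wraps far enough to make them look as if they belong to the future. The Python sketch below illustrates only that circular comparison; the function name `xid_precedes` is hypothetical, and PostgreSQL's reserved special IDs are ignored.

```python
# Sketch of comparing 32-bit transaction IDs circularly (modulo 2**32),
# the idea that lets the ID counter wrap around.

def xid_precedes(a: int, b: int) -> bool:
    """True if xid a is 'older' than xid b in modulo-2**32 arithmetic."""
    diff = (a - b) & 0xFFFFFFFF          # difference as an unsigned 32-bit value
    if diff >= 0x80000000:               # reinterpret it as a signed 32-bit value
        diff -= 0x100000000
    return diff < 0

print(xid_precedes(100, 200))            # True
print(xid_precedes(0xFFFFFFF0, 5))       # True: 5 is "newer" after wraparound
print(xid_precedes(5, 0xFFFFFFF0))       # False
```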
16. 1. Alice and Bob both start a new transaction, and we can see their transaction IDs by calling the
txid_current() PostgreSQL function.
2. When Alice inserts a new post row, its xmin column value is set to Alice’s transaction ID.
3. Under the default Read Committed isolation level, Bob cannot see Alice’s newly inserted record until
Alice commits her transaction.
4. After Alice has committed, Bob can see Alice’s newly inserted row.
5. In simplified terms, a transaction may read a row version if the transaction recorded in its xmin column
is the reader itself or had committed when the reader’s snapshot was taken (under Read Committed, a new
snapshot is taken for each statement).
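The visibility rule in the INSERT example can be sketched in Python. This is a simplified model, not PostgreSQL's implementation: the `committed` set stands in for the snapshot taken at the start of each statement under Read Committed, and the transaction IDs and post title are made up for illustration.

```python
# Simplified model of the INSERT example (not PostgreSQL's own code).
# A row version is visible if the transaction in its xmin column is the
# reader itself or had committed when the reader's statement started.

from dataclasses import dataclass
from typing import Set

@dataclass
class RowVersion:
    title: str
    xmin: int   # ID of the transaction that inserted this version

def visible(row: RowVersion, reader: int, committed: Set[int]) -> bool:
    return row.xmin == reader or row.xmin in committed

ALICE, BOB = 1000, 1001        # hypothetical txid_current() results
committed: Set[int] = set()    # transactions committed so far

row = RowVersion("hypothetical post", xmin=ALICE)   # Alice inserts a post
print(visible(row, ALICE, committed))  # True: Alice sees her own insert
print(visible(row, BOB, committed))    # False: Alice has not committed yet
committed.add(ALICE)                   # Alice commits
print(visible(row, BOB, committed))    # True: Bob's next statement sees it
```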
18. 1. Alice and Bob both start a new transaction.
2. When Bob deletes a post row, its xmax column value is set to Bob’s transaction ID.
3. Under the default Read Committed isolation level, Alice can still see the record deleted by Bob
until Bob commits his transaction.
4. After Bob has committed, Alice can no longer see the deleted row.
5. The DELETE operation does not physically remove a record; it only marks it as deleted, and the
VACUUM process reclaims it once the row version is no longer visible to any currently running
transaction.
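The DELETE example can be modelled the same way. As before, this Python sketch is a simplification rather than PostgreSQL's code: a hypothetical `committed` set plays the role of the statement snapshot, transaction 900 is a made-up earlier transaction that inserted the row, and VACUUM itself is not modelled.

```python
# Simplified model of the DELETE example (not PostgreSQL's own code).
# DELETE only stamps xmax; the version stays in the table until VACUUM.

from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class RowVersion:
    title: str
    xmin: int                   # inserting transaction
    xmax: Optional[int] = None  # deleting transaction, if any

def visible(row: RowVersion, reader: int, committed: Set[int]) -> bool:
    inserted = row.xmin == reader or row.xmin in committed
    deleted = row.xmax is not None and (row.xmax == reader or row.xmax in committed)
    return inserted and not deleted

ALICE, BOB = 1002, 1003          # hypothetical transaction IDs
committed = {900}                # 900: a hypothetical, already-committed inserter
table = [RowVersion("hypothetical post", xmin=900)]

table[0].xmax = BOB                          # Bob deletes the row
print(visible(table[0], BOB, committed))     # False: Bob sees his own delete
print(visible(table[0], ALICE, committed))   # True: Bob has not committed yet
committed.add(BOB)                           # Bob commits
print(visible(table[0], ALICE, committed))   # False: the row is now hidden
print(len(table))                            # 1: the version is still stored
```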
20. 1. When Bob updates a post record, two operations happen: a DELETE and an
INSERT.
2. The previous row version is marked as deleted by setting its xmax column value to Bob’s
transaction ID, and a new row version is created with its xmin column value set to Bob’s
transaction ID.
3. Under the default Read Committed isolation level, Alice can still see the previous record version
until Bob commits his transaction.
4. After Bob has committed, Alice can see the new row version written by Bob.
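The UPDATE example combines the two previous cases: the old version gets an xmax and a new version is appended with the updater's ID in xmin. This Python sketch is a simplified model with hypothetical helpers (`update`, `titles`) and made-up transaction IDs; the `committed` set again stands in for a Read Committed statement snapshot.

```python
# Simplified model of the UPDATE example (not PostgreSQL's own code):
# an UPDATE marks the visible version deleted and inserts a new version.

from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class RowVersion:
    title: str
    xmin: int                   # inserting transaction
    xmax: Optional[int] = None  # deleting transaction, if any

def visible(row: RowVersion, reader: int, committed: Set[int]) -> bool:
    inserted = row.xmin == reader or row.xmin in committed
    deleted = row.xmax is not None and (row.xmax == reader or row.xmax in committed)
    return inserted and not deleted

def update(table: List[RowVersion], txn: int, committed: Set[int], new_title: str):
    """UPDATE = mark the visible version deleted + insert a new version."""
    old = next(r for r in table if visible(r, txn, committed))
    old.xmax = txn                                   # the "DELETE" part
    table.append(RowVersion(new_title, xmin=txn))    # the "INSERT" part

def titles(table: List[RowVersion], reader: int, committed: Set[int]) -> List[str]:
    return [r.title for r in table if visible(r, reader, committed)]

ALICE, BOB = 1004, 1005                  # hypothetical transaction IDs
committed = {900}                        # 900 inserted the row and committed
table = [RowVersion("old title", xmin=900)]

update(table, BOB, committed, "new title")
print(len(table))                        # 2: both versions are stored
print(titles(table, ALICE, committed))   # ['old title']
print(titles(table, BOB, committed))     # ['new title']
committed.add(BOB)                       # Bob commits
print(titles(table, ALICE, committed))   # ['new title']
```

Keeping both versions in the table is what lets Alice keep reading the old title without waiting for Bob; the cost is the dead version that VACUUM must later reclaim.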
21. CONCLUSION
• By keeping multiple versions of the same record, MVCC reduces contention when reading and
writing records, since readers do not block writers and writers do not block readers.
• Although not as intuitive as 2PL (Two-Phase Locking), MVCC is not very
difficult to understand either. However, it is important to understand
how it works, especially since data anomalies are handled differently than
when locking is employed.
22. References
• Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems (Pearson, 2015),
Section 21.3.
• https://www.includehelp.com/dbms/concurrency-and-problem-due-to-concurrency.aspx
• https://www.youtube.com/watch?v=LfRPplRPChY&list=LLA8eVIv7M6axeRJJA2Vr-wg
• https://vladmihalcea.com/how-does-mvcc-multi-version-concurrency-control-work/
• Wu, Y., Arulraj, J., Lin, J., Xian, R., & Pavlo, A. (2017). An empirical
evaluation of multi-version concurrency control. Proceedings of the VLDB
Endowment, 10(7), 781–792. doi:10.14778/3067421.3067427
• https://en.wikipedia.org/wiki/PostgreSQL#Multiversion_concurrency_control_(MVCC)