Transaction Processing Systems
• Transaction Processing Systems are the
systems with large databases and hundreds of
concurrent users executing database
transactions.
• For example airline reservations, banking,
stock markets, etc.
Transaction
• A transaction is an executing programthat
formsalogical unitofdatabaseprocessing.
• A transaction includes one or more database
access operations- these can include insertion,
deletion, modification, orretrieval operations.
• Transaction is executed as a single unit. It is a
program unit whose execution may or may not
changethecontentsofadatabase.
Example
• A transfer of money from one bank account to
another requires two changes to the database
both must succeed or fail together.
– Subtracting the money from the savings account balance.
– Adding the money to the checking account balance.
5000
5000
ProcessesofTransaction
• Read Operation: To read a database object, it is
first brought into main memory from disk and
thenits value is copiedintoa program variable.
Read(2)
A
MainMemory
(RAM)
INPUT (1)
a
Program
Variable
5000 A
Disk
Physical
Block (X)
5000to7000
7000
ProcessesofTransaction
• Write Operation: To write a database object, an
in-memory copy of the object is first
modifiedandthenwrittenbacktodisk.
Write(1)
A
MainMemory
(RAM)
OUTPUT(2)
a
Program
Variable
5000to7000 A
Disk
Physical
Block (X)
DesirablePropertiesofTransactions
ACIDproperties:
• Atomicity: A transaction is an atomic unit of
processing; it is either performed in its entirety or not
performedatall.
• Consistency preservation: A correct execution of the
transaction must take the database from one
consistentstatetoanother.
• Isolation: A transaction should appear as though it is
being executed in isolation from other transactions,
even if many transactions are executing concurrently.
That is, the execution of a transaction should not be
interferedwithanyotherexecutingtransactions.
• Durability or permanency: Once a transaction changes
the database and the changes are committed, these
changesmustneverbelostbecauseofanyfailure.
TransactionProperties
• Let T1 be a transaction that transfers Rs 50 from
account A to account B. This transaction can be
definedas:
• SayvalueofApriortotransactionwas1000and that
ofBwas2000
• Atomicity: Suppose that during the execution of T1, a
power failure has occurred that prevented the T1 to
complete successfully. The point of failure may be after the
completion of Write(A) and beforeWrite(B). It meansthat the
changes in A are performed but not in B. Thus the values
of account A and B are Rs.950 and Rs.2000 respectively.
WehavelostRs.50asaresultofthisfailure.
• Now, ourdatabaseisininconsistentstate.
• Thereasonforthisinconsistentstateisthatourtransaction is
completedpartially.
• In order to maintain atomicity of transaction, the database
system keeps track of the old values of any write and if the
transaction does not complete its execution, the old values
are restored to make it appear as the transaction never
executed.
• Consistency: The consistency requirement here is
that the sum of A and B must be unchanged by the
executionofthetransaction. Itcanbeverified easily
that, if the database is consistent before an
execution of the transaction, the database
remains consistent after the execution of the
transaction.
• Ensuring consistency for an individual transaction is
the responsibility of the application
programmerwhocodesthetransaction.
• Isolation: If several transactions are executed
concurrently (or in parallel), then each
transaction must behave as if it was executed in
isolation. It means that concurrent
execution does not result an inconsistent
state.
• For example, consider another transaction T2,
which hastodisplay thesumof accountAand B.
Then, itsresultshouldbeRs.3000.
TransactionProperties
• Lets supposethat T1andT2performconcurrently,
theirscheduleisshownbelow:
T1 T2 Status
Read(A, a) ValueofAi.e. 1000iscopiedtolocal variablea
a: a-50 Local variablea=950
Write(A, a) Valueoflocal variablea950iscopiedtodatabaseitemA
Read(A, a) ValueofdatabaseitemA950iscopiedtoa
Read(B, b) ValueofdatabaseitemB2000iscopiedtob
Display(a+b) 2950isdisplayedassumofaccountsAandB
Read(B, b) ValueofdataitemB2000iscopiedtovocal variableb
b: =b+50 Local variableb=2050
Write(B, b) Valueoflocal variableb2050iscopiedtodatabaseitemB
• Durability: Once the execution of the
transaction completes successfully, and the
user who initiated the transaction has been
notified that the transfer of funds has taken
place, it must be the case that no system
failure will result in a loss of data
correspondingtothistransferoffunds.
Transaction States
• Active state– the initial state; the transaction stays in this state
while it is executing .
• Partially committed state– after the final statement has been
executed.
• Failed state -- after the discovery that normal execution can no
longer proceed.
• Aborted state– after the transaction has been rolled back and
the database restored to its state prior to the start of the
transaction. Two options after it has been aborted: – restart the
transaction can be done only if no internal logical
error
– kill the transaction
• Committed state – after successful completion
ConcurrentExecutions
• Multiple transactions are
concurrentlyinthesystem.
allowed to run
• Advantages are:
– increased processor and disk utilization, leading to better
transaction throughput
• E.g. one transaction can be using the CPU while another is reading from
or writing to the disk
– reduced average response time for transactions: short
transactions need not wait behind long ones.
• Concurrency control schemes – mechanisms to
achieve isolation that is, to control the interaction
among the concurrent transactions in order to
prevent them from destroying the consistency of
the database
Why Concurrency Control is needed? •
The Lost Update Problem
– This occurs when two transactions that access the same
database items have their operations interleaved in a way that
makes the value of some database item incorrect.
• The Temporary Update (or Dirty Read) Problem
– This occurs when one transaction updates a database item
and then the transaction fails for some reason
– The updated item is accessed by another transaction before it
is changed back to its original value.
• The Incorrect Summary Problem
– If one transaction is calculating an aggregate summary
function on a number of records while other transactions are
updating some of these records, the aggregate function may
calculate some values before they are updated and others after
they are updated.
Schedules and Conflicts
• In a system with a number of simultaneous
transactions, a schedule is the total order of
execution of operations.
• Given a schedule S comprising of n transactions, say
T1, T2, T3………..Tn; for any transaction Ti, the
operations in Ti must execute as laid down in the
schedule S.
Types of Schedules
• There are two types of schedules −
• Serial Schedules − In a serial schedule, at any point
of time, only one transaction is active, i.e. there is no
overlapping of transactions. This is depicted in the
following graph −
• Parallel Schedules − In parallel schedules, more
than one transactions are active simultaneously, i.e.
the transactions contain operations that overlap at
time. This is depicted in the following graph −
Conflicts in Schedules
• In a schedule comprising of multiple transactions,
a conflict occurs when two active transactions
perform non-compatible operations.
• Two operations are said to be in conflict, when all of
the following three conditions exists simultaneously −
• The two operations are parts of different
transactions.
• Both the operations access the same data item.
• At least one of the operations is a write_item()
operation, i.e. it tries to modify the data item.
Conflict Serializable Schedule
• A schedule is called conflict serializability if after
swapping of non-conflicting operations, it can
transform into a serial schedule.
• The schedule will be a conflict serializable if it is
conflict equivalent to a serial schedule.
Conflicting Operations
• The two operations become conflicting if all
conditions satisfy:
1. Both belong to separate transactions.
2. They have the same data item.
3. They contain at least one write operation.
Conflict Equivalent
• In the conflict equivalent, one can be transformed to
another by swapping non-conflicting operations. In
the given example, S2 is conflict equivalent to S1 (S1
can be converted to S2 by swapping non-conflicting
operations).
• Two schedules are said to be conflict equivalent if
and only if:
1. They contain the same set of the transaction.
2. If each pair of conflict operations are ordered in the
same way.
• Schedule S2 is a serial schedule because, in this, all
operations of T1 are performed before starting any
operation of T2. Schedule S1 can be transformed into
a serial schedule by swapping non-conflicting
operations of S1.
Serializability
• A serializable schedule of ‘n’ transactions is a
parallel schedule which is equivalent to a serial
schedule comprising of the same ‘n’ transactions.
• A serializable schedule contains the correctness of
serial schedule while ascertaining better CPU
utilization of parallel schedule.
Equivalence of Schedules
• Equivalence of two schedules can be of the following
types −
• Result equivalence − Two schedules producing
identical results are said to be result equivalent.
• View equivalence − Two schedules that perform
similar action in a similar manner are said to be view
equivalent.
• Conflict equivalence − Two schedules are said to
be conflict equivalent if both contain the same set of
transactions and has the same order of conflicting
pairs of operations.
Locking Based Concurrency Control Protocols
• Locking-based concurrency control protocols use the
concept of locking data items.
• A lock is a variable associated with a data item that
determines whether read/write operations can be
performed on that data item.
• Generally, a lock compatibility matrix is used which
states whether a data item can be locked by two
transactions at the same time.
May not achieve serializability
In the above example, firstly T1 will perform the operation and
release its exclusive lock on A, then shared lock will be given to T2
and it will perform operation and then again exclusive lock on T1. So,
it is making a loop, making it non-serializability.
May not free from irrecoverability
In the given example,
if T1 get failed in the
end then it will be
rolled back. But T2 has
read a value that is
written by T1 so it
becomes dirty read
that make the value
committed by T2 as
irrecoverable as it is
already committed.
May not free from deadlock
In the given example, T1 has already
acquired X(A) and after some time
T2 is also asking for the exclusive
lock but on B, so it will also be
granted. Transaction T2 is still going
on and after some time T1 has asked
for X(B) that will not be granted so it
will be in the waiting state. Then after
some time, T2 has asked for X(A)
and as it is still hold by the T1 so T2
will go to the waiting state. As both
the transactions are in the waiting
state, so it is known as a deadlock
state (waiting for infinite time)
May not free from starvation
In this example, 4 transactions
are there. T2 started at first and
S(A) has been granted to it. After
few minutes, T1 asked for X(A)
but it will not be granted to T1
and T1 goes to the waiting state.
Before T2 is going to release
S(A), T3 asked for S(A) and it will
be granted to it and similarly after
some time, S(A) will be granted
to T4. So, when all these 3
transactions will release S(A)
then only X(A) will be granted to
T1. This state is starvation
(waiting for finite time)
One-phase Locking Protocol
• In this method, each transaction locks an item before
use and releases the lock as soon as it has finished
using it. This locking method provides for maximum
concurrency but does not always enforce
serializability.
Two-phase Locking Protocol
• In this method, all locking operations precede the first
lock-release or unlock operation. The transaction
comprise of two phases.
• In the first phase, a transaction only acquires all the
locks it needs and do not release any lock. This is
called the expanding or the growing phase.
• In the second phase, the transaction releases the
locks and cannot request any new locks. This is
called the shrinking phase.
2PL
In T1 transaction, different operations are getting performed and T1 is asking for the
different types of locks. In 2PL when any transaction is asking for the locks, it is known
as growing phase. And during growing phase, locks are not released. When T1 started
releasing locks, it become shrinking phase and during shrinking phase no locks are
granted
Advantage of 2PL
As seen in the example, until
T1 is in growing phase, it is
not going to release any lock
if even that lock has been
requested by T2 transaction.
And T2 transaction will only
release all the locks when
that lock is not needed. This
makes the transaction T1
and T2 as serializable.
This is another
example of
serializability in 2PL.
As seen in this
example, T1 initiated
first and acquired
S(A). After some time,
T2 asked for S(A) and
it has been granted
with the same. Now
both are in the
growing phase. When
T1 has unlocked any
lock, that point is
known as lock point.
May not free from irrecoverability
If T1 gets failed in the end, it
will be rolled back. But as T2
is dependent on T1 because
it is reading a value written
by T1, it will not be rolled
back as T2 is committed.
So, it is irrecoverable
schedule
Not free from cascading rollback
As T2, T3 and T4 is
dependent on T1 so if
T1 will get failed, it will
be rolled back along
with all the other
transactions.
Strict 2PL
The example shown is
for cascading rollback.
When T1 will release
the exclusive lock on A,
it will be granted to T2
and T2 will read the
data that is written by
T1. If T1 will get failed, it
will be rolled back and
then the other
transactions will also be
rolled back.
But with strict 2PL, it says a
transaction will only release
Exclusive lock after the commit.
So, when T1 will be committed,
other transactions will read the
data from the database and
transaction will be cascadeless.
This will also be recoverable
schedule.
Advantages of Strict 2PL:
1. Cascadeless
2. Recoverable
Disadvantages:
1. Deadlock
2. Starvation
Timestamp Concurrency Control Algorithms
• Timestamp-based concurrency control algorithms use a
transaction’s timestamp to coordinate concurrent access to a
data item to ensure serializability. A timestamp is a unique
identifier given by DBMS to a transaction that represents the
transaction’s start time.
• These algorithms ensure that transactions commit in the order
dictated by their timestamps. An older transaction should
commit before a younger transaction, since the older transaction
enters the system before the younger one.
• Timestamp-based concurrency control techniques generate
serializable schedules such that the equivalent serial schedule
is arranged in order of the age of the participating transactions.
• Some of timestamp based concurrency control
algorithms are −
• Basic timestamp ordering algorithm.
• Conservative timestamp ordering algorithm.
• Multi-version algorithm based upon timestamp
ordering.
• Timestamp based ordering follow three rules to enforce
serializability −
• Access Rule − When two transactions try to access the same
data item simultaneously, for conflicting operations, priority is
given to the older transaction. This causes the younger
transaction to wait for the older transaction to commit first.
• Late Transaction Rule − If a younger transaction has written a
data item, then an older transaction is not allowed to read or
write that data item. This rule prevents the older transaction from
committing after the younger transaction has already committed.
• Younger Transaction Rule − A younger transaction can read
or write a data item that has already been written by an older
transaction.
Let suppose T1 entered the system at
10:00 and we need to assign some
timestamp to it such as 100 for the
given example. At 10:10 T2 entered the
system, now we must assign a new
timestamp to T2 that must be greater
than previous timestamp. So, we
assigned 200 timestamp to T2 that
shows T2 transaction is the newest
and T1 is the oldest. Similarly, 300
timestamp has been assigned to T3.
TS(Ti) is the timestamp of Ti
transaction. RTS is the read time of the
newest transaction. WTS is the write
time of the newest transaction. So,
RTS of A is 30 and WTS of A is 20.
In 1st case, T1 & T2 are two
transactions, but T2 performed
its operation at first. Check the
rule 2. According to this T1 is
asking for W(A), and if RTS(A)
> TS(Ti) then T2 will be rolled
back.
In the 2nd case, T2 initiated its
operation W(A) at first and
then T1 asks for R(A), check
rule 1. Using that T1 will be
rolled back.
Editor's Notes
Transaction meaning, atm example, then read, write and commit example with A and B
Atomicity is either all or none. Consistency is before and after the transaction, sum of money should be same (before transaction 5000, after transaction of transferring again sum must be 5000). Isolation is converting parallelization to serializability. Durability is changes made in the database are permanent (money is 3000 when last updation is done so it will remain permanent until the next transaction)
Schedule is the order of execution of transaction.
At first, T1 will complete then T2 and T3. Advantage is consistent and no transaction is interfering with other transaction. But disadvantage is waiting time. Eg: People standing in a queue in atm. Another disadvantage is throughput will be decreased. Throughput is no. of transactions per unit time
Multiple transactions will be started at a time. CPU will be shared among all. Eg: Online Banking. In parallel schedule, throughput is increased. So many online transactions at a same time.
In this case T1 failed in the last, so T1 will be rolled back that again will change the value of A written by T2 transaction. So it means the value of A is not recoverable in this case that make this Transaction as irrecoverable
T1, T2, T3 and T4 are concurrently executing. T1 has updated the value of A but this value has not been updated in the database yet. So T2, T3 and T4 will read the value updated by T1 i.e. 50.
Due to some reason T1 got failed so T1 will be rolled back to the initial point. As T2, T3 and T4 are using the value 50 that is no longer existing now so we will forcefully abort T2, T3 and T4. The disadvantage of the cascading schedule is degrading the performance of CPU
We will not allow any other transaction to perform Read operation related to A until T1 commit the ongoing transaction on A. T2 and T3 can perform any other operation instead of operation on A before commit from T1 on A. Cascadeless schedule says there must not be the Read Operation by the other transaction after Write operation in 1st transaction until that transaction is committed.
This is the disadvantage of the cascadeless as it does not say anything about write operation after write operation in both the transactions. T1 has written something and after that T2 has written something on the value A. As same value 100 has been read by both the transactions, if T1 has been failed before commit then we need to rollback the T1 transaction that will again change the value of A to 100. So, change written by T2 will also get lost.
T1 has read out A. But in the meantime, T2 has also initiated transaction by reading, writing and committing the A. So, a new value of A has been read by T1 after some time because of R-W conflict
Both these schedules are serial schedules as firstly T1 will get complete and then T2. And for the 2nd one, T2 will get over and then T1
This is an example of parallel schedule. Now to check the serializability in the schedule, we must find a schedule in which for the same transactions we need to find a serial schedule. So, for the given example, instead of loop in the graph we have to create a graph that will either show t1 -> t2 or t2-> t1. There are 2 methods using which we can find whether creating serializable schedule is possible for the given parallel schedule or not. These are: Conflict serializability and view serializability.
In the given example, R(B) and W(A) are non conflict, so their positions are interchanged as seen in diagram 1. Now we will check for R(B) and R(A). Again non-conflict so interchange as seen on the next page
If for any schedule, it’s conflict pair exists then that schedule is known as serializable schedule
To check this, we have to look for the conflict pairs and have to create a precedence graph. The number of nodes in the graph will be equal to the number of transactions in the schedule that is why we have T1, T2 and T3. Check for the R(x) in T1 as it is the 1st operation acc to the time. Now in T2, check for W(x). So these is no W(x), that means no conflict pair. Now check R(y) in the T3 as it is the next operation. Conflict pair for R(y) will be W(y). Check is there any W(y) in T2 and T1. There is no W(y). So, now move to R(x) in T3 and check for W(x) in T2 and T1. W(x) is there in T1. So draw a line from T3 to pointing towards T1 as we found the first conflict pair.
Now next is R(y) in T2. So check the conflict of R(y) i.e. W(y) in T3 and T1. We got W(y) in T3, so draw a line from T2 to T3. Check R(z) in T2 and the conflict of this will be W(z). So search for W(z) in T1 and T3. We got W(z) in T1, so line from T2 to T1.
Now W(y) is the next operation. For Write operation, there are two conflicts: Read and Write. So now we will check R(y) and W(y) in T1 and T2. There is no conflict. So the next operation is W(z) that means two conflicts will be R(z) and W(z) in T1 and T3. We got both the conflicts in T1, so line will be from T2 to T1 that is already there in the graph. Now we are left with three operations of T1 only and we are left with no other operation in other transactions for the comparison
As we visited all the nodes, so now we need to check for any loop or circle in the graph. Loop pr circle means if we are starting from a node A, so it is possible for us to come back on the same node from that graph. In this example, it is not possible. That is why, this is the conflict serializable and that ultimately means it is serializable and consistent. So if it is serializable that means there exists a serial schedule wrt to this parallel example. So to find this we have to check the indegree
Next job is to find the indegree. Indegree is a node to which none of the other node is coming. So T2 is that node. Remove the T2 node and its vertex from the graph and that means firstly T2 will be done
Next indegree node is T3 as no node is coming to T3. And then we will left with T1. So the serial schedule for the given parallel schedule will be T2->T3->T1
In the given precedence graph, there is a loop between T1 and T2. It means it is not conflict serializable. But is it serializable or not, this answer is not available now. For such type of situations, view serializability is used.
Now we created another schedule with same number of transactions but with a minor change of writing W(A) of T1 before W(A) of T2. This makes second schedule as Serial schedule. Now we will check whether these two schedules are providing the same result or not.
Both the schedules are yielding the same result that is 0 only. So it means there exist a serial schedule for the given parallel schedule that makes the given parallel schedule as a serializable. This was not proved by conflict serializable. But with view serializability, we got the answer.