Advertisement
Advertisement

More Related Content

Advertisement

transaction_processing.ppt

  1. TransactionProcessing
  2. Transaction Processing Systems • Transaction Processing Systems are the systems with large databases and hundreds of concurrent users executing database transactions. • For example airline reservations, banking, stock markets, etc.
  3. Transaction • A transaction is an executing programthat formsalogical unitofdatabaseprocessing. • A transaction includes one or more database access operations- these can include insertion, deletion, modification, orretrieval operations. • Transaction is executed as a single unit. It is a program unit whose execution may or may not changethecontentsofadatabase.
  4. Example • A transfer of money from one bank account to another requires two changes to the database both must succeed or fail together. – Subtracting the money from the savings account balance. – Adding the money to the checking account balance.
  5. Example
  6. 5000 5000 ProcessesofTransaction • Read Operation: To read a database object, it is first brought into main memory from disk and thenits value is copiedintoa program variable. Read(2) A MainMemory (RAM) INPUT (1) a Program Variable 5000 A Disk Physical Block (X)
  7. 5000to7000 7000 ProcessesofTransaction • Write Operation: To write a database object, an in-memory copy of the object is first modifiedandthenwrittenbacktodisk. Write(1) A MainMemory (RAM) OUTPUT(2) a Program Variable 5000to7000 A Disk Physical Block (X)
  8. DesirablePropertiesofTransactions ACIDproperties: • Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performedatall. • Consistency preservation: A correct execution of the transaction must take the database from one consistentstatetoanother. • Isolation: A transaction should appear as though it is being executed in isolation from other transactions, even if many transactions are executing concurrently. That is, the execution of a transaction should not be interferedwithanyotherexecutingtransactions. • Durability or permanency: Once a transaction changes the database and the changes are committed, these changesmustneverbelostbecauseofanyfailure.
  9. TransactionProperties • Let T1 be a transaction that transfers Rs 50 from account A to account B. This transaction can be definedas: • SayvalueofApriortotransactionwas1000and that ofBwas2000
  10. • Atomicity: Suppose that during the execution of T1, a power failure has occurred that prevented the T1 to complete successfully. The point of failure may be after the completion of Write(A) and beforeWrite(B). It meansthat the changes in A are performed but not in B. Thus the values of account A and B are Rs.950 and Rs.2000 respectively. WehavelostRs.50asaresultofthisfailure. • Now, ourdatabaseisininconsistentstate. • Thereasonforthisinconsistentstateisthatourtransaction is completedpartially. • In order to maintain atomicity of transaction, the database system keeps track of the old values of any write and if the transaction does not complete its execution, the old values are restored to make it appear as the transaction never executed.
  11. • Consistency: The consistency requirement here is that the sum of A and B must be unchanged by the executionofthetransaction. Itcanbeverified easily that, if the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction. • Ensuring consistency for an individual transaction is the responsibility of the application programmerwhocodesthetransaction.
  12. • Isolation: If several transactions are executed concurrently (or in parallel), then each transaction must behave as if it was executed in isolation. It means that concurrent execution does not result an inconsistent state. • For example, consider another transaction T2, which hastodisplay thesumof accountAand B. Then, itsresultshouldbeRs.3000.
  13. TransactionProperties • Lets supposethat T1andT2performconcurrently, theirscheduleisshownbelow: T1 T2 Status Read(A, a) ValueofAi.e. 1000iscopiedtolocal variablea a: a-50 Local variablea=950 Write(A, a) Valueoflocal variablea950iscopiedtodatabaseitemA Read(A, a) ValueofdatabaseitemA950iscopiedtoa Read(B, b) ValueofdatabaseitemB2000iscopiedtob Display(a+b) 2950isdisplayedassumofaccountsAandB Read(B, b) ValueofdataitemB2000iscopiedtovocal variableb b: =b+50 Local variableb=2050 Write(B, b) Valueoflocal variableb2050iscopiedtodatabaseitemB
  14. • Durability: Once the execution of the transaction completes successfully, and the user who initiated the transaction has been notified that the transfer of funds has taken place, it must be the case that no system failure will result in a loss of data correspondingtothistransferoffunds.
  15. Transaction States • Active state– the initial state; the transaction stays in this state while it is executing . • Partially committed state– after the final statement has been executed. • Failed state -- after the discovery that normal execution can no longer proceed. • Aborted state– after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: – restart the transaction can be done only if no internal logical error – kill the transaction • Committed state – after successful completion
  16. StateTransitionDiagram
  17. ConcurrentExecutions • Multiple transactions are concurrentlyinthesystem. allowed to run • Advantages are: – increased processor and disk utilization, leading to better transaction throughput • E.g. one transaction can be using the CPU while another is reading from or writing to the disk – reduced average response time for transactions: short transactions need not wait behind long ones. • Concurrency control schemes – mechanisms to achieve isolation that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
  18. Why Concurrency Control is needed? • The Lost Update Problem – This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect. • The Temporary Update (or Dirty Read) Problem – This occurs when one transaction updates a database item and then the transaction fails for some reason – The updated item is accessed by another transaction before it is changed back to its original value. • The Incorrect Summary Problem – If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.
  19. Concurrentexecutionisuncontrolled: (a)Thelostupdateproblem.
  20. Concurrentexecutionisuncontrolled: (b)Thetemporaryupdateproblem.
  21. Concurrentexecutionisuncontrolled: (c)Theincorrectsummaryproblem.
  22. Schedules and Conflicts • In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. • Given a schedule S comprising of n transactions, say T1, T2, T3………..Tn; for any transaction Ti, the operations in Ti must execute as laid down in the schedule S.
  23. Types of Schedules • There are two types of schedules − • Serial Schedules − In a serial schedule, at any point of time, only one transaction is active, i.e. there is no overlapping of transactions. This is depicted in the following graph −
  24. • Parallel Schedules − In parallel schedules, more than one transactions are active simultaneously, i.e. the transactions contain operations that overlap at time. This is depicted in the following graph −
  25. Recoverability
  26. Cascading
  27. Cascadeless
  28. Conflicts in Schedules • In a schedule comprising of multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. • Two operations are said to be in conflict, when all of the following three conditions exists simultaneously − • The two operations are parts of different transactions. • Both the operations access the same data item. • At least one of the operations is a write_item() operation, i.e. it tries to modify the data item.
  29. Conflict Serializable Schedule • A schedule is called conflict serializability if after swapping of non-conflicting operations, it can transform into a serial schedule. • The schedule will be a conflict serializable if it is conflict equivalent to a serial schedule.
  30. Conflicting Operations • The two operations become conflicting if all conditions satisfy: 1. Both belong to separate transactions. 2. They have the same data item. 3. They contain at least one write operation.
  31. Example
  32. Conflict Equivalent • In the conflict equivalent, one can be transformed to another by swapping non-conflicting operations. In the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-conflicting operations). • Two schedules are said to be conflict equivalent if and only if: 1. They contain the same set of the transaction. 2. If each pair of conflict operations are ordered in the same way.
  33. • Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before starting any operation of T2. Schedule S1 can be transformed into a serial schedule by swapping non-conflicting operations of S1.
  34. Serializability • A serializable schedule of ‘n’ transactions is a parallel schedule which is equivalent to a serial schedule comprising of the same ‘n’ transactions. • A serializable schedule contains the correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.
  35. Serial schedules
  36. Parallel Schedules
  37. Another example of parallel schedule and it’s possible serial schedules
  38. Conflict Serializability
  39. Check the adjacent pairs for conflict. If these are non conflict pairs then interchange their positions
  40. Now S and S1 and conflict equivalent schedules
  41. No swap possible in this case
  42. Check a schedule is serializable or not using precedence graph
  43. View Serializability
  44. Equivalence of Schedules • Equivalence of two schedules can be of the following types − • Result equivalence − Two schedules producing identical results are said to be result equivalent. • View equivalence − Two schedules that perform similar action in a similar manner are said to be view equivalent. • Conflict equivalence − Two schedules are said to be conflict equivalent if both contain the same set of transactions and has the same order of conflicting pairs of operations.
  45. Locking Based Concurrency Control Protocols • Locking-based concurrency control protocols use the concept of locking data items. • A lock is a variable associated with a data item that determines whether read/write operations can be performed on that data item. • Generally, a lock compatibility matrix is used which states whether a data item can be locked by two transactions at the same time.
  46. • Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.
  47. Shared and Exclusive Locking Protocols
  48. May not achieve serializability In the above example, firstly T1 will perform the operation and release its exclusive lock on A, then shared lock will be given to T2 and it will perform operation and then again exclusive lock on T1. So, it is making a loop, making it non-serializability.
  49. May not free from irrecoverability In the given example, if T1 get failed in the end then it will be rolled back. But T2 has read a value that is written by T1 so it becomes dirty read that make the value committed by T2 as irrecoverable as it is already committed.
  50. May not free from deadlock In the given example, T1 has already acquired X(A) and after some time T2 is also asking for the exclusive lock but on B, so it will also be granted. Transaction T2 is still going on and after some time T1 has asked for X(B) that will not be granted so it will be in the waiting state. Then after some time, T2 has asked for X(A) and as it is still hold by the T1 so T2 will go to the waiting state. As both the transactions are in the waiting state, so it is known as a deadlock state (waiting for infinite time)
  51. May not free from starvation In this example, 4 transactions are there. T2 started at first and S(A) has been granted to it. After few minutes, T1 asked for X(A) but it will not be granted to T1 and T1 goes to the waiting state. Before T2 is going to release S(A), T3 asked for S(A) and it will be granted to it and similarly after some time, S(A) will be granted to T4. So, when all these 3 transactions will release S(A) then only X(A) will be granted to T1. This state is starvation (waiting for finite time)
  52. One-phase Locking Protocol • In this method, each transaction locks an item before use and releases the lock as soon as it has finished using it. This locking method provides for maximum concurrency but does not always enforce serializability.
  53. Two-phase Locking Protocol • In this method, all locking operations precede the first lock-release or unlock operation. The transaction comprise of two phases. • In the first phase, a transaction only acquires all the locks it needs and do not release any lock. This is called the expanding or the growing phase. • In the second phase, the transaction releases the locks and cannot request any new locks. This is called the shrinking phase.
  54. 2PL In T1 transaction, different operations are getting performed and T1 is asking for the different types of locks. In 2PL when any transaction is asking for the locks, it is known as growing phase. And during growing phase, locks are not released. When T1 started releasing locks, it become shrinking phase and during shrinking phase no locks are granted
  55. Advantage of 2PL As seen in the example, until T1 is in growing phase, it is not going to release any lock if even that lock has been requested by T2 transaction. And T2 transaction will only release all the locks when that lock is not needed. This makes the transaction T1 and T2 as serializable.
  56. This is another example of serializability in 2PL. As seen in this example, T1 initiated first and acquired S(A). After some time, T2 asked for S(A) and it has been granted with the same. Now both are in the growing phase. When T1 has unlocked any lock, that point is known as lock point.
  57. Problems in 2PL
  58. May not free from irrecoverability If T1 gets failed in the end, it will be rolled back. But as T2 is dependent on T1 because it is reading a value written by T1, it will not be rolled back as T2 is committed. So, it is irrecoverable schedule
  59. Not free from cascading rollback As T2, T3 and T4 is dependent on T1 so if T1 will get failed, it will be rolled back along with all the other transactions.
  60. Strict 2PL The example shown is for cascading rollback. When T1 will release the exclusive lock on A, it will be granted to T2 and T2 will read the data that is written by T1. If T1 will get failed, it will be rolled back and then the other transactions will also be rolled back.
  61. But with strict 2PL, it says a transaction will only release Exclusive lock after the commit. So, when T1 will be committed, other transactions will read the data from the database and transaction will be cascadeless. This will also be recoverable schedule. Advantages of Strict 2PL: 1. Cascadeless 2. Recoverable Disadvantages: 1. Deadlock 2. Starvation
  62. Timestamp Concurrency Control Algorithms • Timestamp-based concurrency control algorithms use a transaction’s timestamp to coordinate concurrent access to a data item to ensure serializability. A timestamp is a unique identifier given by DBMS to a transaction that represents the transaction’s start time. • These algorithms ensure that transactions commit in the order dictated by their timestamps. An older transaction should commit before a younger transaction, since the older transaction enters the system before the younger one. • Timestamp-based concurrency control techniques generate serializable schedules such that the equivalent serial schedule is arranged in order of the age of the participating transactions.
  63. • Some of timestamp based concurrency control algorithms are − • Basic timestamp ordering algorithm. • Conservative timestamp ordering algorithm. • Multi-version algorithm based upon timestamp ordering.
  64. • Timestamp based ordering follow three rules to enforce serializability − • Access Rule − When two transactions try to access the same data item simultaneously, for conflicting operations, priority is given to the older transaction. This causes the younger transaction to wait for the older transaction to commit first. • Late Transaction Rule − If a younger transaction has written a data item, then an older transaction is not allowed to read or write that data item. This rule prevents the older transaction from committing after the younger transaction has already committed. • Younger Transaction Rule − A younger transaction can read or write a data item that has already been written by an older transaction.
  65. Let suppose T1 entered the system at 10:00 and we need to assign some timestamp to it such as 100 for the given example. At 10:10 T2 entered the system, now we must assign a new timestamp to T2 that must be greater than previous timestamp. So, we assigned 200 timestamp to T2 that shows T2 transaction is the newest and T1 is the oldest. Similarly, 300 timestamp has been assigned to T3. TS(Ti) is the timestamp of Ti transaction. RTS is the read time of the newest transaction. WTS is the write time of the newest transaction. So, RTS of A is 30 and WTS of A is 20.
  66. In 1st case, T1 & T2 are two transactions, but T2 performed its operation at first. Check the rule 2. According to this T1 is asking for W(A), and if RTS(A) > TS(Ti) then T2 will be rolled back. In the 2nd case, T2 initiated its operation W(A) at first and then T1 asks for R(A), check rule 1. Using that T1 will be rolled back.

Editor's Notes

  1. Transaction meaning, atm example, then read, write and commit example with A and B
  2. Atomicity is either all or none. Consistency is before and after the transaction, sum of money should be same (before transaction 5000, after transaction of transferring again sum must be 5000). Isolation is converting parallelization to serializability. Durability is changes made in the database are permanent (money is 3000 when last updation is done so it will remain permanent until the next transaction)
  3. Schedule is the order of execution of transaction.
  4. At first, T1 will complete then T2 and T3. Advantage is consistent and no transaction is interfering with other transaction. But disadvantage is waiting time. Eg: People standing in a queue in atm. Another disadvantage is throughput will be decreased. Throughput is no. of transactions per unit time
  5. Multiple transactions will be started at a time. CPU will be shared among all. Eg: Online Banking. In parallel schedule, throughput is increased. So many online transactions at a same time.
  6. In this case T1 failed in the last, so T1 will be rolled back that again will change the value of A written by T2 transaction. So it means the value of A is not recoverable in this case that make this Transaction as irrecoverable
  7. T1, T2, T3 and T4 are concurrently executing. T1 has updated the value of A but this value has not been updated in the database yet. So T2, T3 and T4 will read the value updated by T1 i.e. 50.
  8. Due to some reason T1 got failed so T1 will be rolled back to the initial point. As T2, T3 and T4 are using the value 50 that is no longer existing now so we will forcefully abort T2, T3 and T4. The disadvantage of the cascading schedule is degrading the performance of CPU
  9. We will not allow any other transaction to perform Read operation related to A until T1 commit the ongoing transaction on A. T2 and T3 can perform any other operation instead of operation on A before commit from T1 on A. Cascadeless schedule says there must not be the Read Operation by the other transaction after Write operation in 1st transaction until that transaction is committed.
  10. This is the disadvantage of the cascadeless as it does not say anything about write operation after write operation in both the transactions. T1 has written something and after that T2 has written something on the value A. As same value 100 has been read by both the transactions, if T1 has been failed before commit then we need to rollback the T1 transaction that will again change the value of A to 100. So, change written by T2 will also get lost.
  11. T1 has read out A. But in the meantime, T2 has also initiated transaction by reading, writing and committing the A. So, a new value of A has been read by T1 after some time because of R-W conflict
  12. Both these schedules are serial schedules as firstly T1 will get complete and then T2. And for the 2nd one, T2 will get over and then T1
  13. This is an example of parallel schedule. Now to check the serializability in the schedule, we must find a schedule in which for the same transactions we need to find a serial schedule. So, for the given example, instead of loop in the graph we have to create a graph that will either show t1 -> t2 or t2-> t1. There are 2 methods using which we can find whether creating serializable schedule is possible for the given parallel schedule or not. These are: Conflict serializability and view serializability.
  14. In the given example, R(B) and W(A) are non conflict, so their positions are interchanged as seen in diagram 1. Now we will check for R(B) and R(A). Again non-conflict so interchange as seen on the next page
  15. If for any schedule, it’s conflict pair exists then that schedule is known as serializable schedule
  16. To check this, we have to look for the conflict pairs and have to create a precedence graph. The number of nodes in the graph will be equal to the number of transactions in the schedule that is why we have T1, T2 and T3. Check for the R(x) in T1 as it is the 1st operation acc to the time. Now in T2, check for W(x). So these is no W(x), that means no conflict pair. Now check R(y) in the T3 as it is the next operation. Conflict pair for R(y) will be W(y). Check is there any W(y) in T2 and T1. There is no W(y). So, now move to R(x) in T3 and check for W(x) in T2 and T1. W(x) is there in T1. So draw a line from T3 to pointing towards T1 as we found the first conflict pair.
  17. Now next is R(y) in T2. So check the conflict of R(y) i.e. W(y) in T3 and T1. We got W(y) in T3, so draw a line from T2 to T3. Check R(z) in T2 and the conflict of this will be W(z). So search for W(z) in T1 and T3. We got W(z) in T1, so line from T2 to T1.
  18. Now W(y) is the next operation. For Write operation, there are two conflicts: Read and Write. So now we will check R(y) and W(y) in T1 and T2. There is no conflict. So the next operation is W(z) that means two conflicts will be R(z) and W(z) in T1 and T3. We got both the conflicts in T1, so line will be from T2 to T1 that is already there in the graph. Now we are left with three operations of T1 only and we are left with no other operation in other transactions for the comparison
  19. As we visited all the nodes, so now we need to check for any loop or circle in the graph. Loop pr circle means if we are starting from a node A, so it is possible for us to come back on the same node from that graph. In this example, it is not possible. That is why, this is the conflict serializable and that ultimately means it is serializable and consistent. So if it is serializable that means there exists a serial schedule wrt to this parallel example. So to find this we have to check the indegree
  20. Next job is to find the indegree. Indegree is a node to which none of the other node is coming. So T2 is that node. Remove the T2 node and its vertex from the graph and that means firstly T2 will be done
  21. Next indegree node is T3 as no node is coming to T3. And then we will left with T1. So the serial schedule for the given parallel schedule will be T2->T3->T1
  22. In the given precedence graph, there is a loop between T1 and T2. It means it is not conflict serializable. But is it serializable or not, this answer is not available now. For such type of situations, view serializability is used.
  23. Now we created another schedule with same number of transactions but with a minor change of writing W(A) of T1 before W(A) of T2. This makes second schedule as Serial schedule. Now we will check whether these two schedules are providing the same result or not.
  24. Both the schedules are yielding the same result that is 0 only. So it means there exist a serial schedule for the given parallel schedule that makes the given parallel schedule as a serializable. This was not proved by conflict serializable. But with view serializability, we got the answer.
Advertisement