‹#›
CSE 480: Database Systems
Lecture 24: Concurrency Control
‹#›
Concurrency Control
 Reasons for concurrency control
– Although serial execution of a set of transactions may be correct,
concurrent (interleaved) transactions may be incorrect
 Lost Update Problem
 Temporary Update (or Dirty Read) Problem
 Incorrect Summary Problem
 Nonrepeatable read
– Database recovery from transaction failure or system crash
becomes more complicated if you don’t control the read/write
operations performed by the concurrent transactions
‹#›
Lost update problem
 Lost Update Problem
– When two transactions that update the same database items
have their operations interleaved in a way that makes the value
of some database item incorrect
‹#›
Temporary Update (Dirty Read) problem
 Temporary Update (or Dirty Read) Problem
– When one transaction updates a database item and then the
transaction fails for some reason
– The updated item could be accessed by another transaction
‹#›
Incorrect Summary Problem
 Incorrect Summary Problem
– If a transaction is calculating an aggregate function while others
are updating some of these records, the aggregate function may
calculate some values before they are updated and others after
they are updated
‹#›
Nonrepeatable Read Problem
 Nonrepeatable Read Problem
– If a transaction reads the same data item twice and the item is
changed by another transaction between the two reads
Value of X has
changed
‹#›
Transaction Schedule
 To analyze the problems with concurrent transactions, we
need to examine their transaction schedule
– A transaction schedule is an ordering of database operations
from various concurrently executing transactions
Sa: r1(X) r2(X) w1(X) r1(Y) w2(X) w1(Y) Sb: r1(X) w1(X) r2(X) w2(X) r1(Y)
‹#›
Why Study Transaction Schedule?
 Characteristics of a transaction schedule will determine
– whether it is “easy” to recover from transaction failures
(see slide 10)
– whether concurrent execution of transactions is “correct”
(see slide 11)
‹#›
Characterizing Schedules based on Recoverability
time
T1
T2
T3
T4
T5
T6
c1
c2
c3
c4
a6
What to do when transaction T6 aborts?
- Do we need to rollback transactions that have already committed (e.g., T1..T4)
- Do we need to rollback other uncommitted transactions beside T6?
x x
x
x
x
x
x
x
xx
x x
x
x x
x
x
x
x
x
x
x
xx
x
x
x
x
x
x
x
x
x xxx
x x
‹#›
Characterizing Schedules based on Serializability
Is the effect of executing transactions in
the order shown in schedule D
equivalent to executing transactions in
the order shown in schedule A or B?
‹#›
Example
 Consider the following transaction schedule
r1(X) w1(X) r2(X) r1(Y) r2(Y) c2 w1(Y) a1
– If T1 aborts, do we need to rollback the committed transaction T2?
– Answer: yes
– Schedule is “non-recoverable”
 This type of schedule makes the recovery process more cumbersome
because we have to rollback transactions that have committed
‹#›
Recoverable Schedule
 A schedule where no committed transactions need to be
rolled back
 A transaction T must not commit until all transactions T’
that have written an item that T reads have committed
 Examples:
– r1(X) w1(X) r2(X) r1(Y) w2(X) c2 a1
 Nonrecoverable (T2 must be rolled back when T1 aborts)
– r1(X) r2(X) w1(X) r1(Y) w2(X) c2 w1(Y) a1
 Recoverable (T2 does not have to be rolled back when T1 aborts)
– r2(X) w2(X) r1(X) r1(Y) w1(X) c2 w1(Y) a1
 Recoverable (T2 does not have to be rolled back when T1 aborts)
‹#›
Is Recoverable Schedule Sufficient?
 Example:
r1(X) w1(X) r2(X) w2(X) a1
– Recoverable because T2 has not committed before T1
– But the uncommitted transaction T2 must still be aborted when T1
aborts (cascading rollback)
 Cascadeless schedule
– A schedule with no cascading rollback, i.e., if a transaction T is
aborted, we only need to rollback T and no other transactions
– How do we ensure this?
‹#›
Cascadeless Schedules
 Every transaction in the schedule reads only items
that were written by committed transactions
 Examples:
– r1(X) w1(X) r2(X) r3(X) w2(X) c2 a1
 Must rollback T2 and T3 (Not recoverable, not cascadeless)
– r1(X) r2(X) w1(X) r3(X) w2(X) c2 a1
 Must rollback T3 only (Recoverable, not cascadeless)
– r1(X) r2(X) r3(X) w1(X) c2 a1
 No need to rollback T2 nor T3 (Recoverable, cascadeless)
 Cascadeless schedules are recoverable and avoid
cascading rollback
‹#›
Recovery using System Log
 From previous lecture, if the database system crashes,
we can recover to a consistent database state by
examining the log
 Example of entries in a log record (T: transaction ID)
1. [start_transaction,T4]
2. [read_item,T4.X]
3. [write_item,T4.X,4,11] (before image = 4, after image = 11)
4. [abort,T4]
– During recovery, we may undo the change in T4.X by using its
“before image” (i.e., replace new value 11 with old value 4)
‹#›
Strict Schedules
 But with concurrency, it is not always possible to restore
the database to its original state after abort using the
before image of data item
 Example:
r2(X) r1(X) w1(X) w2(X) a1
– Is the transaction recoverable?
– Is the transaction cascadeless?
– If the original value for X is 5, T1 modifies X to 10 and T2 modifies
it to 8. After we undo the changes of T1, will X returned to a
correct value?
‹#›
Strict Schedules
 A schedule in which we can restore the database to a
consistent state after abort using the before image of data
item
 A schedule in which a transaction can neither read nor
write an item X until the last transaction that wrote X has
committed or aborted
 Example:
r2(X) r1(X) w1(X) w2(X) a1
– Schedule is cascadeless but not strict
‹#›
Characterizing Schedules based on Recoverability
 Summary
– Recoverable schedules: no need to rollback committed
transactions
– Cascadeless schedules: no cascading rollback (rollback only the
aborted transaction)
– Strict schedules: undo changes by aborted transaction by
applying the before image of affected data items
 Cascadeless schedules are recoverable
 Strict schedules are cascadeless and recoverable
 More stringent condition means easier to do recovery
from failure but less concurrency
‹#›
Serial Schedules
 A schedule S is serial if all operations in transactions are
executed consecutively in the schedule
– Otherwise, it is called nonserial schedule
Serial: r1(X) w1(X) r1(Y) w1(Y) r2(X) w2(X) Nonserial: r1(X) w1(X) r2(X) w2(X) r1(Y) w1(Y)
‹#›
Serial Schedules
 Every serial schedule is correct, i.e., leads to a consistent
database state
Schedule A Schedule B
read_item(X) read_item(X)
X = X + 3 X = X – 2Y
write_item(X) write_item(X)
read_item(X) read_item(X)
X = X – 2Y X = X + 3
write_item(X) write_item(X)
– But executions of serial schedules are highly inefficient (because
there is no concurrency)
‹#›
Serializable Schedules
 A schedule S is serializable if its execution is equivalent
to some serial schedule of the same transactions
– Otherwise, S is non-serializable
 We consider a special type of serializable schedule called
conflict serializable schedule
‹#›
Conflict Serializable
 A schedule S is said to be conflict serializable if it is
conflict equivalent to some serial schedule of the same
transactions
– Two schedules are said to be conflict equivalent if the order of
any two conflicting operations is the same in both schedules
– Two operations in a transaction schedule are in conflict if
 They belong to different transactions
 They access the same data item
 At least one of them is a write operation
 If a schedule is conflict serializable, we can reorder the
nonconflicting operations until we form an equivalent
serial schedule
‹#›
Testing for Conflict Serializability
 Construct a precedence graph (serialization graph) where
– Nodes are the transactions
– A directed edge is created from Ti to Tj if one of the operations
in Ti appears before a conflicting operation in Tj
 Create edge Ti  Tj if schedule contains wi(X) rj(X)
 Create edge Ti  Tj if schedule contains ri(X) wj(X)
 Create edge Ti  Tj if schedule contains wi(X) wj(X)
 A schedule is conflict serializable if and only if the
precedence graph has no cycles.
‹#›
Example
‹#›
Example
This schedule is non-serializable because precedence graph has a cycle
‹#›
Example
This schedule is conflict serializable because precedence graph has no cycle
‹#›
Equivalent Serial Schedule
 If a schedule S is conflict serializable, we can create an
equivalent serial schedule S’ as follows:
– Whenever an edge exists in the precedence graph from Ti to Tj, Ti
must appear before Tj in the equivalent serial schedule
– Schedule A is the equivalent serial schedule for schedule D
Precedence graph for schedule D
‹#›
Example
Is it conflict serializable?
What is the equivalent
serial schedule?
‹#›
Example
Is it conflict serializable?
What is the equivalent
serial schedule?
‹#›
Characterizing Schedules based on Serializability
 Summary
– Serial schedule is inefficient (no parallelism)
– Serializable schedule gives benefits of concurrent executions
without giving up correctness
 Concurrency control subsystem of DBMS must use
certain protocol to ensure serializability of all schedules in
which the transactions participate
– 2-Phase locking protocol (Chapter 22)
– May cause deadlocks
– DBMS will automatically abort one of the transactions, releasing
the locks for other transactions to continue
‹#›
MySQL Example
Client 1: Client 2:
Consider two concurrent
transactions
‹#›
MySQL Example (Deadlock)
Client 1: Client 2:
‹#›
MySQL Example (Deadlock)
Client 1: Client 2:
Client 1 will be kept busy waiting
Deadlock detected by concurrency
control module of DBMS;
Transaction for client 2 is aborted,
allowing transaction for client 1 to
continue

concurrencycontrol_databasemanagement_system

  • 1.
    ‹#› CSE 480: DatabaseSystems Lecture 24: Concurrency Control
  • 2.
    ‹#› Concurrency Control  Reasonsfor concurrency control – Although serial execution of a set of transactions may be correct, concurrent (interleaved) transactions may be incorrect  Lost Update Problem  Temporary Update (or Dirty Read) Problem  Incorrect Summary Problem  Nonrepeatable read – Database recovery from transaction failure or system crash becomes more complicated if you don’t control the read/write operations performed by the concurrent transactions
  • 3.
    ‹#› Lost update problem Lost Update Problem – When two transactions that update the same database items have their operations interleaved in a way that makes the value of some database item incorrect
  • 4.
    ‹#› Temporary Update (DirtyRead) problem  Temporary Update (or Dirty Read) Problem – When one transaction updates a database item and then the transaction fails for some reason – The updated item could be accessed by another transaction
  • 5.
    ‹#› Incorrect Summary Problem Incorrect Summary Problem – If a transaction is calculating an aggregate function while others are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated
  • 6.
    ‹#› Nonrepeatable Read Problem Nonrepeatable Read Problem – If a transaction reads the same data item twice and the item is changed by another transaction between the two reads Value of X has changed
  • 7.
    ‹#› Transaction Schedule  Toanalyze the problems with concurrent transactions, we need to examine their transaction schedule – A transaction schedule is an ordering of database operations from various concurrently executing transactions Sa: r1(X) r2(X) w1(X) r1(Y) w2(X) w1(Y) Sb: r1(X) w1(X) r2(X) w2(X) r1(Y)
  • 8.
    ‹#› Why Study TransactionSchedule?  Characteristics of a transaction schedule will determine – whether it is “easy” to recover from transaction failures (see slide 10) – whether concurrent execution of transactions is “correct” (see slide 11)
  • 9.
    ‹#› Characterizing Schedules basedon Recoverability time T1 T2 T3 T4 T5 T6 c1 c2 c3 c4 a6 What to do when transaction T6 aborts? - Do we need to rollback transactions that have already committed (e.g., T1..T4) - Do we need to rollback other uncommitted transactions beside T6? x x x x x x x x xx x x x x x x x x x x x x xx x x x x x x x x x xxx x x
  • 10.
    ‹#› Characterizing Schedules basedon Serializability Is the effect of executing transactions in the order shown in schedule D equivalent to executing transactions in the order shown in schedule A or B?
  • 11.
    ‹#› Example  Consider thefollowing transaction schedule r1(X) w1(X) r2(X) r1(Y) r2(Y) c2 w1(Y) a1 – If T1 aborts, do we need to rollback the committed transaction T2? – Answer: yes – Schedule is “non-recoverable”  This type of schedule makes the recovery process more cumbersome because we have to rollback transactions that have committed
  • 12.
    ‹#› Recoverable Schedule  Aschedule where no committed transactions need to be rolled back  A transaction T must not commit until all transactions T’ that have written an item that T reads have committed  Examples: – r1(X) w1(X) r2(X) r1(Y) w2(X) c2 a1  Nonrecoverable (T2 must be rolled back when T1 aborts) – r1(X) r2(X) w1(X) r1(Y) w2(X) c2 w1(Y) a1  Recoverable (T2 does not have to be rolled back when T1 aborts) – r2(X) w2(X) r1(X) r1(Y) w1(X) c2 w1(Y) a1  Recoverable (T2 does not have to be rolled back when T1 aborts)
  • 13.
    ‹#› Is Recoverable ScheduleSufficient?  Example: r1(X) w1(X) r2(X) w2(X) a1 – Recoverable because T2 has not committed before T1 – But the uncommitted transaction T2 must still be aborted when T1 aborts (cascading rollback)  Cascadeless schedule – A schedule with no cascading rollback, i.e., if a transaction T is aborted, we only need to rollback T and no other transactions – How do we ensure this?
  • 14.
    ‹#› Cascadeless Schedules  Everytransaction in the schedule reads only items that were written by committed transactions  Examples: – r1(X) w1(X) r2(X) r3(X) w2(X) c2 a1  Must rollback T2 and T3 (Not recoverable, not cascadeless) – r1(X) r2(X) w1(X) r3(X) w2(X) c2 a1  Must rollback T3 only (Recoverable, not cascadeless) – r1(X) r2(X) r3(X) w1(X) c2 a1  No need to rollback T2 nor T3 (Recoverable, cascadeless)  Cascadeless schedules are recoverable and avoid cascading rollback
  • 15.
    ‹#› Recovery using SystemLog  From previous lecture, if the database system crashes, we can recover to a consistent database state by examining the log  Example of entries in a log record (T: transaction ID) 1. [start_transaction,T4] 2. [read_item,T4.X] 3. [write_item,T4.X,4,11] (before image = 4, after image = 11) 4. [abort,T4] – During recovery, we may undo the change in T4.X by using its “before image” (i.e., replace new value 11 with old value 4)
  • 16.
    ‹#› Strict Schedules  Butwith concurrency, it is not always possible to restore the database to its original state after abort using the before image of data item  Example: r2(X) r1(X) w1(X) w2(X) a1 – Is the transaction recoverable? – Is the transaction cascadeless? – If the original value for X is 5, T1 modifies X to 10 and T2 modifies it to 8. After we undo the changes of T1, will X returned to a correct value?
  • 17.
    ‹#› Strict Schedules  Aschedule in which we can restore the database to a consistent state after abort using the before image of data item  A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed or aborted  Example: r2(X) r1(X) w1(X) w2(X) a1 – Schedule is cascadeless but not strict
  • 18.
    ‹#› Characterizing Schedules basedon Recoverability  Summary – Recoverable schedules: no need to rollback committed transactions – Cascadeless schedules: no cascading rollback (rollback only the aborted transaction) – Strict schedules: undo changes by aborted transaction by applying the before image of affected data items  Cascadeless schedules are recoverable  Strict schedules are cascadeless and recoverable  More stringent condition means easier to do recovery from failure but less concurrency
  • 19.
    ‹#› Serial Schedules  Aschedule S is serial if all operations in transactions are executed consecutively in the schedule – Otherwise, it is called nonserial schedule Serial: r1(X) w1(X) r1(Y) w1(Y) r2(X) w2(X) Nonserial: r1(X) w1(X) r2(X) w2(X) r1(Y) w1(Y)
  • 20.
    ‹#› Serial Schedules  Everyserial schedule is correct, i.e., leads to a consistent database state Schedule A Schedule B read_item(X) read_item(X) X = X + 3 X = X – 2Y write_item(X) write_item(X) read_item(X) read_item(X) X = X – 2Y X = X + 3 write_item(X) write_item(X) – But executions of serial schedules are highly inefficient (because there is no concurrency)
  • 21.
    ‹#› Serializable Schedules  Aschedule S is serializable if its execution is equivalent to some serial schedule of the same transactions – Otherwise, S is non-serializable  We consider a special type of serializable schedule called conflict serializable schedule
  • 22.
    ‹#› Conflict Serializable  Aschedule S is said to be conflict serializable if it is conflict equivalent to some serial schedule of the same transactions – Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules – Two operations in a transaction schedule are in conflict if  They belong to different transactions  They access the same data item  At least one of them is a write operation  If a schedule is conflict serializable, we can reorder the nonconflicting operations until we form an equivalent serial schedule
  • 23.
    ‹#› Testing for ConflictSerializability  Construct a precedence graph (serialization graph) where – Nodes are the transactions – A directed edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj  Create edge Ti  Tj if schedule contains wi(X) rj(X)  Create edge Ti  Tj if schedule contains ri(X) wj(X)  Create edge Ti  Tj if schedule contains wi(X) wj(X)  A schedule is conflict serializable if and only if the precedence graph has no cycles.
  • 24.
  • 25.
    ‹#› Example This schedule isnon-serializable because precedence graph has a cycle
  • 26.
    ‹#› Example This schedule isconflict serializable because precedence graph has no cycle
  • 27.
    ‹#› Equivalent Serial Schedule If a schedule S is conflict serializable, we can create an equivalent serial schedule S’ as follows: – Whenever an edge exists in the precedence graph from Ti to Tj, Ti must appear before Tj in the equivalent serial schedule – Schedule A is the equivalent serial schedule for schedule D Precedence graph for schedule D
  • 28.
    ‹#› Example Is it conflictserializable? What is the equivalent serial schedule?
  • 29.
    ‹#› Example Is it conflictserializable? What is the equivalent serial schedule?
  • 30.
    ‹#› Characterizing Schedules basedon Serializability  Summary – Serial schedule is inefficient (no parallelism) – Serializable schedule gives benefits of concurrent executions without giving up correctness  Concurrency control subsystem of DBMS must use certain protocol to ensure serializability of all schedules in which the transactions participate – 2-Phase locking protocol (Chapter 22) – May cause deadlocks – DBMS will automatically abort one of the transactions, releasing the locks for other transactions to continue
  • 31.
    ‹#› MySQL Example Client 1:Client 2: Consider two concurrent transactions
  • 32.
  • 33.
    ‹#› MySQL Example (Deadlock) Client1: Client 2: Client 1 will be kept busy waiting Deadlock detected by concurrency control module of DBMS; Transaction for client 2 is aborted, allowing transaction for client 1 to continue