CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control Presentation Transcript

  • 1. CS 542 Database Management Systems
    Failure Recovery, Concurrency Control
    J Singh
    April 4, 2011
  • 2. Today’s meeting
    The D in ACID: Durability
    The ACI in ACID
Consistency is specified by the users, in how they define their transactions
    The Database is responsible for Atomicity and Isolation
  • 3. Types of Failures
    Potential sources of failures:
    Power loss, resulting in loss of main-memory state,
    Media failures, resulting in loss of disk state and
    Software errors, resulting in both
    Recovery is based on the concept of transactions.
  • 4. Transactions and Concurrency
    Users submit transactions, and think of each transaction as executing by itself.
    Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions.
Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. A transaction can end in two different ways:
    commit: successful end, all actions completed,
    abort: unsuccessful end, only some actions executed.
    Issues: effect of interleaving transactions on the database
    System failures (today’s lecture)
    Concurrent transactions (partly today, remainder next week)
  • 5. Transactions, Logging and Recovery
    We studied Query Processing in the last two lectures
    Now, Log Manager and Recovery Manager
    Second part today, Transaction Manager
  • 6. Reminder: Buffer Management
(diagram: page requests from higher levels; a disk page is read into a free frame of the buffer pool, with the choice of frame dictated by the replacement policy)
    Data must be in RAM for DBMS to operate on it!
  • 7. Primitive Buffer Operations
    Requests from Transactions
    Read (x,t):
    Input(x) if necessary
    Assign value of x in block to local variable t (in buffer)
    Write (x,t):
    Input(x) if necessary
    Assign value of local variable t (in buffer) to x
    Requests to Disk
    Input (x):
    Transfer block containing x from disk to memory (buffer)
    Output (x):
    Transfer block containing x from buffer to disk
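To make the four primitives concrete, here is a minimal Python sketch, assuming one database element per block and plain dicts standing in for the disk and the buffer pool (every name here is an illustrative stand-in, not a real DBMS API):

```python
# Illustrative stand-ins: one DB element per block, dicts as storage.
disk = {"A": 8, "B": 8}     # persistent storage
buffer_pool = {}            # in-memory buffer

def input_(x):
    """Input(x): transfer the block containing x from disk to buffer."""
    if x not in buffer_pool:
        buffer_pool[x] = disk[x]

def output(x):
    """Output(x): transfer the block containing x from buffer to disk."""
    disk[x] = buffer_pool[x]

def read(x, t):
    """Read(x,t): Input(x) if necessary, then copy x's buffered value
    into t (a one-element list simulating a by-reference local variable)."""
    input_(x)
    t[0] = buffer_pool[x]

def write(x, t):
    """Write(x,t): Input(x) if necessary, then copy t into x's buffer."""
    input_(x)
    buffer_pool[x] = t[0]
```

Note that Read and Write touch only the buffer; nothing reaches disk until Output, and that gap is exactly what the log must cover.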
  • 8. Failure Recovery Approaches
    All of the approaches rely on logging – storing a log of changes to the database so it is possible to restore its state. They differ in
    What information is logged,
    The timing of when to force that information to stable storage,
    What the procedure for recovery will be
    The approaches are named after the recovery procedure
    Undo Logging
    The log contains enough information to detect if the transaction was committed and to roll back the state if it was not.
    When recovering after a failure, walk back through the log and undo the effect of all txns that do not have a COMMIT entry in the log
    Other approaches described later
  • 9. Undo Logging
    When executing transactions
    Write the log before writing transaction data and force it to disk
    Make sure to preserve chronological order
    The log contains enough information to detect if the transaction was committed and to roll back the state if it was not.
    When restarting,
    Walk back through the log and undo the effect of all uncommitted txns in the log.
    Challenge: How far back do we need to look?
    Answer: Until the last checkpoint
    Define and implement checkpoints momentarily
  • 10. An Example Transaction
Initial state: A = 8, B = 8
Transaction T1:
A ← 2 × A
B ← 2 × B
As primitive operations:
Read(A,t); t ← t × 2
Write(A,t);
Read(B,t); t ← t × 2
Write(B,t);
Output(A);
Output(B);
State at the failure point (crash after Output(A), before Output(B)):
Memory: A = 16, B = 16
Disk: A = 16, B = 8
    Undo Log Entries
    <T1, start>
    <T1, A, 8>
    <T1, B, 8>
    <T1, Commit>
    Would have been written if the transaction had completed.
    Do we have the info to restore?
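Continuing the dict-based sketch from slide 7, this is one plausible way T1's actions interleave with its undo log records; the `log` list and `flush_log` helper are assumptions for illustration, and the two rules annotated below are the ones stated on the next slide:

```python
log, stable_log = [], []    # in-memory log tail vs. records already on disk

def flush_log():
    """Force the in-memory log tail to stable storage."""
    stable_log.extend(log)
    log.clear()

t = [None]
log.append(("T1", "START"))
read("A", t); log.append(("T1", "A", buffer_pool["A"]))   # old value: 8
t[0] *= 2;    write("A", t)
read("B", t); log.append(("T1", "B", buffer_pool["B"]))   # old value: 8
t[0] *= 2;    write("B", t)
flush_log()      # log records reach disk before any data does
output("A")
# -- a crash here leaves disk A=16, B=8 and no <T1, COMMIT> on disk --
output("B")
log.append(("T1", "COMMIT")); flush_log()   # COMMIT is written last
```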
  • 11. Execution with Undo Logging
Flushing the log forces all log records to disk
    Logging Rule:
    • If a transaction commits, the commit record must be written to disk after all data records have been written to disk
  • Recovery with Undo Logging
    Consider all uncommitted transactions, starting with the most recent one and going backward.
    Undo all actions of these transactions.
Why go backward rather than forward?
    Example: T1, T2 and T3 all write A
    T1 executed before T2 before T3
    T1 committed, T2 and T3 incomplete
    T1 write A
    T2 write A
    T3 write A
    T1 commit
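A sketch of that backward pass in Python, using the tuple-shaped records from the earlier sketch, where an update record <T, X, old> is a 3-tuple (illustrative, not the textbook's exact procedure):

```python
def undo_recover(stable_log, disk):
    """Scan the log backward; restore the old value of every write made
    by a transaction with no COMMIT (or ABORT) record on the stable log."""
    finished = {r[0] for r in stable_log
                if len(r) == 2 and r[1] in ("COMMIT", "ABORT")}
    for rec in reversed(stable_log):
        if len(rec) == 3 and rec[0] not in finished:
            txn, x, old = rec
            disk[x] = old                    # undo the uncommitted write
    for txn in {r[0] for r in stable_log} - finished:
        stable_log.append((txn, "ABORT"))    # mark the undone txns
```

In the T1/T2/T3 example, the backward scan first restores A to the value T3 overwrote (T2's write), then to the value T2 overwrote (T1's committed write), so A ends where the committed T1 left it; a forward scan would end at the wrong value.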
  • 12. More on Undo Logging
    Failure During Recovery
    Recovery algorithm is idempotent
    Just do it again!
    How much of the log file needs to be processed?
    In principle, we need to examine the entire log.
    Checkpointing limits the part of the log that needs to be considered during recovery up to a certain point (checkpoint).
  • 13. Quiescent Checkpointing
    Simple approach to introduce the concept
    Pause the database
    stop accepting new transactions,
    wait until all current transactions commit or abort and have written the corresponding log records,
    flush the log to disk,
    write a <CKPT> log record and flush the log,
    resume accepting new transactions.
    Once we encounter a checkpoint record, we know that there are no incomplete transactions.
    Do not need to go backward beyond checkpoint.
    Can afford to throw away any part of the log prior to the checkpoint
    Pausing the database may not be warranted for business reasons
  • 14. Non-quiescent Checkpointing
    Main idea: Start- and End-Checkpoints to bracket unfinished txns
    Write a <START CKPT (T1, T2, … Tk)> record into the log
    T1, T2, … Tk are the unfinished txns
    Wait till T1, T2, … Tk commit or abort, but allow other txns to begin
    Write a <END CKPT> record into the log
    Recovery method: scan the log backwards until a <CKPT> record is found
    If <END…>, scan backwards to the previous <START…>
    No need to look any further
    If <START…>, then crash must have occurred during checkpointing.
    The START record tells us unfinished txns and
    Scan back to the beginning of the oldest one of these.
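A sketch of the "how far back" decision, assuming checkpoint records shaped like ("START CKPT", ("T1", "T2")) and ("END CKPT",) mixed into the same tuple log as before:

```python
def undo_scan_limit(stable_log):
    """Return the index of the earliest record an undo-recovery backward
    scan must examine, given non-quiescent checkpoint records."""
    for i in range(len(stable_log) - 1, -1, -1):
        kind = stable_log[i][0]
        if kind == "END CKPT":
            # Checkpoint completed: stop at its matching START CKPT.
            return next(j for j in range(i - 1, -1, -1)
                        if stable_log[j][0] == "START CKPT")
        if kind == "START CKPT":
            # Crash during checkpointing: go back to the start of the
            # oldest transaction listed as unfinished in this record.
            pending = set(stable_log[i][1])
            return min(j for j, r in enumerate(stable_log)
                       if len(r) == 2 and r[1] == "START" and r[0] in pending)
    return 0    # no checkpoint on the log: examine all of it
```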
  • 15. Issues with Undo Logging
    Bottlenecks on I/O
All log records must be forced to disk before any data is written back
All data records must be forced to disk before the COMMIT record is written
    An alternative: Redo Logging
    Instead of scanning backward from the end
    Undoing all transactions that were not completed
    Scans the log forward
Reapplies all committed transactions whose changes may not yet have reached disk
  • 16. Logging with Redo Logs
    Creation of the Redo log
For every action, generate a redo log record.
<T, X, v> has a different meaning here: v is the new value, not the old one.
Flush the log at commit:
All log records of the transaction that modified X (including its COMMIT record) must be on disk before X is modified on disk.
Write an END log record after the DB modifications have been written to disk.
    Recovery algorithm.
    Redo the modifications by committed transactions not yet flushed to disk.
S = set of txns with <Ti, COMMIT> and no <Ti, END> in the log
For each <Ti, X, v> in the log, in forward order (from earliest to latest):
if Ti is in S, then Write(X, v)
Finally, write <Ti, END> for each Ti in S
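The same recovery algorithm as a runnable Python sketch over the tuple log used earlier; here the 3-tuple <Ti, X, v> carries the new value:

```python
def redo_recover(stable_log, disk):
    """Reapply, in forward order, every write of a committed transaction
    that has no <END> record (its data may not have reached disk)."""
    committed = {r[0] for r in stable_log if len(r) == 2 and r[1] == "COMMIT"}
    ended = {r[0] for r in stable_log if len(r) == 2 and r[1] == "END"}
    S = committed - ended
    for rec in stable_log:                   # forward: earliest to latest
        if len(rec) == 3 and rec[0] in S:
            txn, x, new = rec
            disk[x] = new                    # redo: v is the NEW value
    for txn in S:
        stable_log.append((txn, "END"))      # record that redo completed
```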
  • 17. Logging with Redo Logs
  • 18. Comments on Redo Logging
    Checkpoint algorithms similar to those for Undo Logging
    Quiescent as well as Non-quiescent algorithms
    Issues with Redo Logging
Data may not be written back to disk until the transaction's log records (through COMMIT) have been written out
This results in a large memory requirement for the buffer pool
    A flaw in the checkpointing algorithms (textbook, p869)
    Both undo and redo logs may put contradictory requirements on how buffers are handled during a checkpoint, unless the database elements are complete blocks or sets of blocks.
    For instance, if a buffer contains one database element A that was changed by a committed transaction and another database element B that was changed in the same buffer by a transaction that has not yet had its COMMIT record written to disk, then we are required to copy the buffer to disk because of A but also forbidden to do so, because rule R1 applies to B.
  • 19. Undo/Redo Logging (p1)
Undo logging requires writing modifications to disk as soon as the transaction finishes, before its COMMIT record, leading to an unnecessarily large number of I/Os.
Redo logging requires keeping all modified blocks in the buffer until the transaction commits and the log records have been flushed, increasing the buffer size requirement.
    Undo/redo logging combines undo and redo logging.
    It provides more flexibility in flushing modified blocks at the expense of maintaining more information in the log.
  • 20. Undo/Redo Logging (p2)
    Main idea: The log can be used to reconstruct the data
Update records <T, X, new, old> record both the new and the old value of X.
    The only undo/redo logging rule is:
    Log record must be flushed before corresponding modified block
Also known as write-ahead logging (WAL).
    Block of X can be flushed before or after T commits, i.e. before or after the COMMIT log record.
    Flush the log at commit.
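One common way to enforce the write-ahead rule is with log sequence numbers (LSNs); the bookkeeping below is an assumed mechanism for illustration, not something the slide prescribes:

```python
log_records = []    # in-memory log tail: (lsn, record)
flushed_lsn = -1    # highest LSN already on stable storage

def append_log(record):
    """Append a log record and return its log sequence number."""
    lsn = len(log_records)
    log_records.append((lsn, record))
    return lsn

def flush_log_upto(lsn):
    """Force the log to disk up to and including lsn (stand-in)."""
    global flushed_lsn
    flushed_lsn = max(flushed_lsn, lsn)

def flush_page(page_lsn, write_page):
    """WAL gate: a dirty page may reach disk only after every log
    record that changed it (up to page_lsn) is stable."""
    if page_lsn > flushed_lsn:
        flush_log_upto(page_lsn)    # force the log first
    write_page()                    # then the data block may go out
```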
  • 21. Undo/Redo Logging (p3)
    Because of the flexibility of flushing X before or after the COMMIT record, we can have uncommitted transactions with modifications on disk and committed transactions with modifications not yet on disk.
    The undo/redo recovery policy is as follows:
    Redo committed transactions.
    Undo uncommitted transactions.
  • 22. Undo/Redo Logging Recovery
    More details on the recovery procedure:
    Backward pass
    From end of log back to latest valid checkpoint, construct set S of committed transactions.
    Undo actions of transactions not in S.
    Forward pass
    From latest checkpoint forward to end of log,
    Or from the beginning of time, if there are no checkpoints
    redo actions of transactions in S.
    Alternatively, can also perform the redos before the undos.
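Both passes as a Python sketch, with 4-tuple update records <T, X, new, old> and checkpoint handling omitted (illustrative only):

```python
def undo_redo_recover(stable_log, disk):
    """Undo losers on a backward pass, then redo winners on a forward pass."""
    committed = {r[0] for r in stable_log
                 if len(r) == 2 and r[1] == "COMMIT"}
    # Backward pass: undo every update of an uncommitted transaction.
    for rec in reversed(stable_log):
        if len(rec) == 4 and rec[0] not in committed:
            txn, x, new, old = rec
            disk[x] = old
    # Forward pass: redo every update of a committed transaction.
    for rec in stable_log:
        if len(rec) == 4 and rec[0] in committed:
            txn, x, new, old = rec
            disk[x] = new
```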
  • 23. Undo/Redo Checkpointing
    Write "start checkpoint" listing all active transactions to log
    Flush log to disk
    Write to disk all dirty buffers (contain a changed DB element), whether or not transaction has committed
    Implies nothing should be written (not even to memory buffers) until we are sure the transaction will not abort
    Implies some log records may need to be written to disk (WAL)
    Write "end checkpoint" to log
    Flush log to disk
(diagram: log with a <START CKPT> record listing the active T's)
  • 24. Protecting Against Media Failure
Logging protects against local loss of main-memory and disk content, but not against global loss of secondary storage content (media failure).
    To protect against media failures, employ archiving: maintaining a copy of the database on a separate, secure storage device.
    Log also needs to be archived in the same manner.
    Two levels of archiving:
    full dump vs. incremental dump.
  • 25. Protecting Against Media Failure
Typically, the database cannot be shut down for the period of time needed to make a backup copy (dump).
    Need to perform nonquiescent archiving, i.e., create a dump while the DBMS continues to process transactions.
Goal is to make a copy of the database as of the time the dump began, but transactions may change the database content during the dumping.
    Logging continues during the dumping, and discrepancies can be corrected from the log.
  • 26. Protecting Against Media Failure
    We assume undo/redo (or redo) logging.
    The archiving procedure is as follows:
    Write a log record <START DUMP>.
    Perform a checkpoint for the log.
    Perform a (full / incremental) dump on the secure storage device.
Make sure that enough of the log has been copied to the secure storage device so that at least the log up to the checkpoint will survive a media failure.
    Write a log record <END DUMP>.
  • 27. Protecting Against Media Failure
    After a media failure, we can restore the DB from the archived DB and archived log as follows:
    Copy latest full dump (archive) back to DB.
    Starting with the earliest ones, make the modifications recorded in the incremental dump(s) in increasing order of time.
    Further modify DB using the archived log.
    Use the recovery method corresponding to the chosen type of logging.
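Sketched end to end, with dicts standing in for the dumps and `undo_redo_recover` reused from the slide 22 sketch (all shapes are assumptions):

```python
def restore_after_media_failure(full_dump, incremental_dumps, archived_log):
    """Rebuild the DB: full dump, then incremental dumps in chronological
    order, then finish recovery from the archived log."""
    disk = dict(full_dump)                   # 1. copy the latest full dump
    for dump in incremental_dumps:           # 2. apply earliest first
        disk.update(dump)
    undo_redo_recover(archived_log, disk)    # 3. recover from the log
    return disk
```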
  • 28. Summary
    Logging is an effective way to prepare for system failure
    Transactions provide a useful building block on which to base log entries
Three types of logs
    Undo Logs
    Redo Logs
    Undo/Redo logs
    Only Undo/Redo logs are used in practice. Why?
    Periodic checkpoints are necessary for keeping recovery times under control. Why?
    Database Dumps (archives) protect against media failure
    Great for making a “point in time” copy of the database.
  • 29. On the NoSQL Front…
    Google Datastore
    Recently (1/2011) added a “High Replication” option.
    Replicates the datastore synchronously across multiple data centers
    Does not use an append-only log
    Has performance and size impact
    Append-only log that’s actually a b-tree
    No provision for deleting part of the log
    Provision for ‘compacting the log’
    Recently (12/2010) added a --journal option
    Has performance impact, no measurements available
Common thread: a tradeoff between performance and durability!
  • 30. CS 542 Database Management Systems
    Concurrency Control
    J Singh
    April 4, 2011
  • 31. Concurrency Control
    Goal: Preserving Data Integrity
    Challenge: enforce ACID rules (while maintaining maximum traffic through the system)
    Committed transactions leave the system in a consistent state
    Rolled-back transactions behave as if they never happened!
    Historical Note
    Based on The Transaction Concept: Virtues and Limitations by Jim Gray, Tandem Computers, 1981
    ACM Turing Award, 1998
  • 32. Transactions
    Concurrent execution of user programs is essential for good DBMS performance.
Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
    A user’s program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database.
    A transaction is the DBMS’s abstract view of a user program: a sequence of reads and writes.
    Referred to as a Schedule
    Implemented by a Transaction Scheduler
  • 33. Scheduler
    Scheduler takes read/write requests from transactions
    Either executes them in buffers or delays them
    Scheduler must avoid Isolation Anomalies
  • 34. Isolation Anomalies (p1)
    Dirty Read – data of an uncommitted transaction visible to others
    Sometimes called WR Conflict
    Non-repeatable Read – some previously read data changes due to another transaction committing
    Sometimes called RW Conflict
    T1: R(A), W(A), R(B), W(B), C
    T2: R(A), W(A), R(B), W(B), C
    T1: R(A), W(A), C
    T2: R(A), W(A), C
  • 35. Isolation Anomalies (p2)
    Overwriting Uncommitted Data
    Sometimes called WW Conflicts
    We need a set of rules to prohibit such isolation anomalies
    The rules place constraints on the actions of concurrent transactions
    T1: W(A), W(B), C
    T2: W(A), W(B), C
  • 36. Serial Schedules
    Definition: A schedule is a list of actions, (i.e. reading, writing, aborting, committing), from a set of transactions.
    A schedule is serial if its transactions are not interleaved
    Serial schedules observe ACI properties
Schedule D consists of three transactions T1, T2, T3:
T1 reads and writes object X,
then T2 reads and writes object Y,
then T3 reads and writes object Z.
    D is an example of a serial schedule, because the 3 txns are not interleaved.
R1(X), W1(X), R2(Y), W2(Y), R3(Z), W3(Z)
  • 37. Serializable Schedules
A serializable schedule is one that is equivalent to a serial schedule.
    The Transaction Manager should defer some transactions if the current schedule is not serializable
    The order of transactions in E is not the same as in D,
    But E gives the same result.
E = R1(X); R2(Y); R3(Z); W1(X); W2(Y); W3(Z)
  • 38. Serializability
    Is G serializable?
    Equivalent to the serial schedule <T1,T2>
    But not <T2,T1>
    G is conflict-serializable
Conflict equivalence: schedules S1 and S2 are conflict-equivalent if the following conditions are satisfied:
Both schedules S1 and S2 involve the same set of transactions (including the ordering of actions within each transaction).
The order of each pair of conflicting actions in S1 and S2 is the same.
Conflict-serializability: a schedule is conflict-serializable when it is conflict-equivalent to one or more serial schedules.
  • 39. Serializability of Schedule G
    T1: R(A) W(B)
    T2: R(A) W(A)
    Precedence graph:
    a node for each transaction
an arc from Ti to Tj if an action in Ti precedes and conflicts with an action in Tj.
Equivalent to <T1, T2>? R1(A) W1(B) R2(A) W2(A): no conflicting pair changes order. Yes.
Equivalent to <T2, T1>? R2(A) W2(A) R1(A) W1(B): the conflicting pair R1(A), W2(A) changes order. No.
    Two actions conflict if
    The actions belong to different transactions.
    At least one of the actions is a write operation.
    The actions access the same object (read or write).
    Theorem: A schedule is conflict serializable if and only if its precedence graph is acyclic
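The theorem suggests a direct test; here is a Python sketch that builds the precedence graph from (txn, action, object) triples and checks it for cycles with a depth-first search, using one plausible interleaving of G's actions:

```python
def conflict_serializable(schedule):
    """Build the precedence graph of a schedule and test it for cycles."""
    edges = {}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.setdefault(ti, set()).add(tj)    # arc Ti -> Tj

    done, on_path = set(), set()
    def cyclic(t):
        if t in on_path:                # back edge: cycle found
            return True
        if t in done:
            return False
        on_path.add(t)
        found = any(cyclic(u) for u in edges.get(t, ()))
        on_path.discard(t)
        done.add(t)
        return found

    return not any(cyclic(t) for t in list(edges))

# One interleaving of G: the only conflict is R1(A) before W2(A),
# giving the single arc T1 -> T2, so the graph is acyclic.
G = [("T1", "R", "A"), ("T2", "R", "A"),
     ("T2", "W", "A"), ("T1", "W", "B")]
print(conflict_serializable(G))    # True: equivalent to <T1, T2>
```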
  • 40. Enforcing Serializable Schedules
    Prevent cycles in the Precedence Graph, P(S), from occurring
    Locking primitives:
    Lock (exclusive): li(A)
    Unlock: ui(A)
    Make transactions consistent
Ti: pi(A) becomes Ti: li(A) pi(A) ui(A)
pi(A) is either a READ or a WRITE
    Allow only one transaction to hold a lock on A at any time
    Two-phase locking for transactions
Ti: li(A) … pi(A) … ui(A)
Growing phase (no unlocks before the lock point), then shrinking phase (no new locks after it)
  • 41. Legal Schedules?
S1 = l1(A) l1(B) r1(A) w1(B) l2(B) u1(A) u1(B)
r2(B) w2(B) u2(B) l3(B) r3(B) u3(B)
S2 = l1(A) r1(A) w1(B) u1(A) u1(B)
l2(B) r2(B) w2(B) l3(B) r3(B) u3(B)
S3 = l1(A) r1(A) u1(A) l1(B) w1(B) u1(B)
l2(B) r2(B) w2(B) u2(B) l3(B) r3(B) u3(B)
  • 42. Locking Protocols for Serializable Schedules
    Strict Two-phase Locking (Strict 2PL) Protocol:
    Each transaction must obtain a S (shared) lock on object before reading, and an X (exclusive) lock on object before writing.
    All locks held by a transaction are released when the transaction completes
    Strict 2PL allows only serializable schedules
    Additionally, it simplifies transaction aborts
    (Non-strict) 2PL Variant: Release locks anytime, but cannot acquire locks after releasing any lock.
    If a txn holds an X lock on an object, no other txn can get a lock (S or X) on that object.
    (Non-strict) 2PL also allows only serializable schedules, but involves more complex abort processing
Why is “acquiring after releasing” disallowed? It is what guarantees serializability; holding locks until commit (strictness) is what avoids cascading aborts
    More in a minute
  • 43. Executing Locking Protocols
Begin with a serial schedule
    We know it won’t deadlock
    How do we know this?
    Beyond this simple 2PL protocol, it is all a matter of improving performance and allowing more concurrency….
    Shared locks
    Increment locks
    Multiple granularity
    Other types of concurrency control mechanisms
  • 44. Lock Management
    Lock and unlock requests are handled by the lock manager
    Lock Table Entry
    Number of transactions currently holding a lock
    Type of lock held (shared or exclusive)
    Pointer to queue of lock requests
    Locking and unlocking operations
Support upgrade: a transaction that holds a shared lock can upgrade it to an exclusive lock
    Any level of granularity can be locked
    Database, table, block, tuple
    Why is this necessary?
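A sketch of such a lock table in Python; the class shape, method names, and the simple FIFO policy are all assumptions for illustration:

```python
from collections import deque

class LockManager:
    """Lock table sketch: per-object mode, holder set, and waiter queue."""
    def __init__(self):
        self.table = {}   # obj -> {"mode", "holders", "queue"}

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn must wait."""
        e = self.table.setdefault(
            obj, {"mode": None, "holders": set(), "queue": deque()})
        if not e["holders"]:                        # free: grant outright
            e["mode"], e["holders"] = mode, {txn}
            return True
        if mode == "S" and e["mode"] == "S" and not e["queue"]:
            e["holders"].add(txn)                   # compatible share
            return True
        if mode == "X" and e["holders"] == {txn}:   # upgrade S -> X
            e["mode"] = "X"
            return True
        e["queue"].append((txn, mode))              # conflict: wait
        return False

    def release(self, txn, obj):
        self.table[obj]["holders"].discard(txn)
        # (waking and granting queued requests is omitted for brevity)
```

The `not e["queue"]` test keeps a new shared request from jumping ahead of a waiting exclusive one, a common fairness choice.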
  • 45. Multiple-Granularity Locks
    If a transaction needs to scan all records in a table, we don’t really want to have a lock on all tuples individually – significant locking overhead!
    Put a single lock on the table
A lock on a node implicitly locks all of its descendants.
  • 46. Aborting a Transaction
    If a transaction Ti is aborted, all its actions have to be undone.
If Tj reads an object last written by Ti, Tj must be aborted as well!
    Most systems avoid such cascading aborts by releasing a transaction’s locks only at commit time.
If Ti writes an object, Tj can read it only after Ti commits.
    In order to undo the actions of an aborted transaction, the DBMS maintains a log in which every write is recorded.
    The same mechanism is used to recover from system crashes; all active txns at the time of the crash are aborted when the system recovers
  • 47. Performance Considerations (Again!)
    2PL Protocol allows transactions to proceed with maximum parallelism
    Locking algorithm only delays actions that would cause conflicts
    But the locks are still a bottleneck
    Need to ensure lowest-possible level of locking granularity
    Classic memory-performance trade-off
Conflict-serializability is too conservative
But other notions of serializability are too complex
A use case that occurs quite often and should be optimized:
Besides scanning through the table, if we need to modify a few tuples, what kind of lock should we put on the table?
It has to be X (if we only have S or X).
But that blocks all other read requests!
Lock-based concurrency control is pessimistic: it acquires and releases locks
An alternative: Optimistic Concurrency Control
  • 48. Next Week
    Intention Locks
    Optimistic Concurrency Control
    Distributed Commit
    Please Read ahead of time
The End of an Architectural Era, Stonebraker et al., Proc. VLDB, 2007
OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al., Proc. ACM SIGMOD, 2008