CS 542 -- Concurrency Control, Distributed Commit
Transcript

  • 1. CS 542 Database Management Systems
    Concurrency Control
    Commit in Distributed Systems
    J Singh
    April 11, 2011
  • 2. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
    References (aside from textbook):
    Concurrency Control and Recovery in Database Systems, Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman, Microsoft Research.
    Concurrency Control: Methods, Performance, and Analysis, Alexander Thomasian, ACM Computing Surveys, March 1998.
    Paxos Commit, Gray & Lamport, Microsoft Research TechFest, 2004.
    OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al., Proc. ACM SIGMOD, 2008.
    The End of an Architectural Era, Stonebraker et al., Proc. VLDB, 2007.
  • 3. Motivation for intention locks
    Suppose that, besides scanning the table, we need to modify a few tuples. What kind of lock should we put on the table?
    It would have to be X (if we only have S and X).
    But that blocks all other read requests!
  • 4. Intention Locks
    Allow intention locks IS, IX.
    Before S locking an item, must IS lock the root.
    Before X locking an item, must IX lock the root.
    Should make sure:
    If Ti S locks a node, no Tj can X lock an ancestor.
    Achieved if S conflicts with IX
    If Tj X locks a node, no Ti can S or X lock an ancestor.
    Achieved if X conflicts with IS and IX.
  • 5. Allowed Lock Sharings
    Compatibility matrix (rows: lock held; columns: lock requested; ✓ = compatible):

                   IS    IX    S     SIX   X
    Lock    IS     ✓     ✓     ✓     ✓     –
    Holder  IX     ✓     ✓     –     –     –
            S      ✓     –     ✓     –     –
            SIX    ✓     –     –     –     –
            X      –     –     –     –     –
  • 6. Multiple Granularity Lock Protocol
    Each txn starts from the root of the hierarchy.
    To get a lock on any node, must hold an intention lock on its parent node!
    E.g. to get S lock on a node, must hold IS or IX on parent.
    E.g. to get X lock on a node, must hold IX or SIX on parent.
    Full table of rules: see the lock compatibility matrix on slide 5.
    Must release locks in bottom-up order.
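    To make the protocol concrete, here is a minimal sketch in Python (hypothetical classes and names, not from the slides) of a lock table that enforces the compatibility matrix of slide 5 and the parent-intention rule above.

      # Minimal sketch of multiple-granularity locking (hypothetical helper, not from the slides).
      # COMPAT[held][requested] is True when the two modes can coexist on one node (slide 5).
      COMPAT = {
          "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
          "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
          "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
          "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
          "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
      }
      # Which modes on the parent permit a given lock on a child (slide 6 rules).
      PARENT_ALLOWS = {"S": {"IS", "IX"}, "IS": {"IS", "IX"},
                       "X": {"IX", "SIX"}, "IX": {"IX", "SIX"}, "SIX": {"IX", "SIX"}}

      class Node:
          def __init__(self, name, parent=None):
              self.name, self.parent = name, parent
              self.locks = {}  # txn id -> mode currently held on this node

          def compatible(self, txn, mode):
              return all(COMPAT[held][mode] for t, held in self.locks.items() if t != txn)

      def acquire(txn, node, mode):
          """Grant `mode` on `node` to `txn`, checking the parent's intention lock first."""
          if node.parent is not None:
              if node.parent.locks.get(txn) not in PARENT_ALLOWS[mode]:
                  raise RuntimeError(f"{txn} needs an intention lock on {node.parent.name} first")
          if not node.compatible(txn, mode):
              raise RuntimeError(f"{txn}: {mode} on {node.name} conflicts with a held lock")
          node.locks[txn] = mode  # upgrades are simplified: the new mode replaces the old one

      # Example 1 (next slide): T1 takes IS on R1 then S on t2; T2 takes S on R1. No conflict.
      R1 = Node("R1"); t2 = Node("t2", parent=R1)
      acquire("T1", R1, "IS"); acquire("T1", t2, "S"); acquire("T2", R1, "S")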
  • 7. Example 1
    [Figure: lock hierarchy with root R1 over tuples t1, t2, t3, t4.]
    T1 needs a shared lock on t2: T1 takes IS on R1, then S on t2.
    T2 needs a shared lock on R1: T2 takes S on R1.
    T1's IS and T2's S are compatible on R1, so both requests are granted.
  • 8. Example 2
    [Figure: lock hierarchy with root R1 over tuples t1, t2, t3, t4.]
    T1 needs a shared lock on t2: T1 takes IS on R1, then S on t2.
    T2 needs an exclusive lock on t4: T2 takes IX on R1, then X on t4.
    IS and IX are compatible on R1, so there is no conflict.
  • 9. Examples 3, 4, 5
    T1 scans R, and updates a few tuples:
    T1 gets an SIX lock on R, and occasionally upgrades to X on the tuples.
    T2 uses an index to read only part of R:
    T2 gets an IS lock on R, and repeatedly gets an S lock on tuples of R.
    T3 reads all of R:
    T3 gets an S lock on R.
    OR, T3 could behave like T2; it can use lock escalation as it goes (per the compatibility matrix on slide 5).
  • 10. Insert and Delete
    Transactions
    T1:
    SELECT MAX(Price) WHERE Rating = 1;
    SELECT MAX(Price) WHERE Rating = 2;
    T2:
    INSERT <Apple, Arkansas Black, 1, 96>;
    DELETE WHERE Rating = 2
    AND Price = (SELECT MAX(Price) WHERE Rating = 2);
    Execution
    T1 locks all records w/Rating=1 and gets 80.
    T2 inserts <Arkansas Black, 96>
    T2 deletes <Fuji, 75>
    T1 locks all records w/Rating=2 and gets 65.
  • Insert and Delete Rules
    When T1 inserts t1 into R,
    Give X lock on t1 to T1
    When T2 deletes t2 from R,
    It must obtain an X lock on t2
    This will fix the Fuji delete problem (how so?)
    But there is still a problem: Phantom Reads.
    Seen with Arkansas Black in the example
    Solution: use multiple granularity tree
    Before inserting Q, obtain an X lock for parent(Q)
  • 15. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 16. Did Insert/Delete expose a flaw in 2PL?
    The flaw was with the assumption that by locking all tuples, T1 had locked the set!
    We needed to lock the set
    Would we bottleneck on the relation if the workload were insert- and delete-heavy?
    There is another way to solve the problem:
    Lock at the index (if one exists)
    Since B+ trees are not 100% full, we can maintain multiple locks in different sections of the tree.
    [Figure: an index on Rating; the lock is placed on the index entry for r = 1.]
  • 17. Index Locking (p1)
    Higher levels of the tree only direct searches for leaf pages.
    For inserts, a node on a path from root to modified leaf must be locked (in X mode, of course), only if a split can propagate up to it from the modified leaf. (Similar point holds w.r.t. deletes.)
    We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.
  • 18. Index Locking (p2)
    Search: Start at root and go down; repeatedly, S lock child then unlock parent.
    Insert/Delete: Start at root and go down, obtaining X locks as needed. Once child is locked, check if it is safe:
    If child is safe, release all locks on ancestors.
    Safe node: a node such that changes cannot propagate up beyond it.
    Inserts: the node is not full (adding an entry cannot cause a split).
    Deletes: the node is more than half full (removing an entry cannot cause an underflow or merge).
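    As a minimal sketch of this insert protocol (hypothetical node structure, not from the slides), the descent takes X locks top-down and releases every ancestor as soon as the current node is found to be safe:

      # Sketch of B+-tree lock coupling ("crabbing") for inserts, using a hypothetical Node
      # with per-node locks; searches would use S locks and release the parent immediately.
      import threading

      class Node:
          def __init__(self, capacity, is_leaf=False):
              self.keys, self.children = [], []
              self.capacity, self.is_leaf = capacity, is_leaf
              self.lock = threading.Lock()

          def is_safe_for_insert(self):
              # Safe: inserting here cannot split the node, so no change propagates above it.
              return len(self.keys) < self.capacity

          def child_for(self, key):
              # Choose the child subtree that should contain `key`.
              return self.children[sum(1 for k in self.keys if key >= k)]

      def descend_for_insert(root, key):
          """Return the target leaf, holding X locks only on nodes a split could still reach."""
          root.lock.acquire()
          held, node = [root], root
          while not node.is_leaf:
              child = node.child_for(key)
              child.lock.acquire()
              held.append(child)
              if child.is_safe_for_insert():
                  for ancestor in held[:-1]:   # child cannot split: ancestors are released
                      ancestor.lock.release()
                  held = [child]
              node = child
          return node, held  # caller performs the insert, then releases the locks in `held`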
  • 19. Example
    [Figure: a B+ tree rooted at node A, with internal nodes B, C, F (keys 20, 23, 35, 38, 44) and leaves D, E, G, H, I holding 20*, 22*, 23*, 24*, 35*, 36*, 38*, 41*, 44*.]
    Where to lock?
    1) Delete 38*
    2) Insert 45*
    3) Insert 25*
  • 20. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 21. Optimistic CC
    Locking is a conservative approach in which conflicts are prevented. Disadvantages:
    Lock management overhead.
    Deadlock detection/resolution.
    (Not covered in the CS 542 lectures; you are expected to be familiar with it.)
    If conflicts are rare, we may be able to gain performance by not locking, and instead checking for conflicts before txns commit.
    Two approaches
    Kung-Robinson Model
    Divides every transaction into three phases: read, validate, write
    Makes commit/abort decision based on what’s being read and written
    Timestamp Ordering Algorithms
    Clever use of timestamps to determine which operations are conflict-free and which must be aborted
  • 22. Kung-Robinson Model
    Key idea:
    Let transactions work in isolation
    Validate reads and writes when ready to commit
    Make Validation Atomic
    Validated ≡ Committed
    Transactions have three phases:
    READ:
    txns read from the database,
    make changes to private copies of objects.
    VALIDATE:
    Check if schedule so far is serializable.
    WRITE:
    Make local copies of changes public.
    [Figure: the ROOT pointer and the old (public) vs. new (private) copies of the modified objects; the WRITE phase makes the new copies public.]
  • 23. Validation
    Test conditions that are sufficient to ensure that no conflict occurred.
    Each txn is assigned a numeric id.
    Just use a timestamp.
    Transaction ids assigned at end of READ phase, just before validation begins.
    ReadSet(Ti): Set of objects read by txn Ti.
    WriteSet(Ti): Set of objects modified by Ti.
    Validation is atomic
    Done in a critical section
  • 24. Validation Tests
    For each already-validated transaction Ti that overlaps a validating transaction Tj, one of the following tests must hold:
    Test 1: FIN(Ti) < START(Tj).
    Ti finishes all three phases before Tj begins.
    Test 2: FIN(Ti) < VAL(Tj) AND WriteSet(Ti) ∩ ReadSet(Tj) is empty.
    Ti completes its WRITE phase before Tj starts validating; since Tj may have read while Ti was writing, Ti must not have written anything Tj read.
    Test 3: VAL(Ti) < VAL(Tj) AND WriteSet(Ti) ∩ ReadSet(Tj) is empty AND WriteSet(Ti) ∩ WriteSet(Tj) is empty.
    Ti validates before Tj, but Ti's WRITE phase may overlap Tj's READ and WRITE phases; so Ti's writes must not intersect Tj's reads or Tj's writes.
    [Figure: timelines of Ti and Tj showing the relative positions of their R (read), V (validate), and W (write) phases in each of the three situations.]
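    A minimal sketch of these tests in Python (hypothetical transaction record, not from the slides), applied when Tj reaches validation against every overlapping, already-validated Ti:

      # Sketch of the Kung-Robinson validation tests from slide 24.
      from dataclasses import dataclass, field

      @dataclass
      class Txn:
          start: int                     # when the READ phase began
          val: int = None                # when validation began
          fin: int = None                # when the WRITE phase finished (None if not finished)
          read_set: set = field(default_factory=set)
          write_set: set = field(default_factory=set)

      def no_conflict(ti: Txn, tj: Txn) -> bool:
          """True if the earlier-validated Ti cannot conflict with the validating Tj."""
          if ti.fin is not None and ti.fin < tj.start:
              return True                                            # Test 1
          if ti.fin is not None and ti.fin < tj.val and not (ti.write_set & tj.read_set):
              return True                                            # Test 2
          if ti.val < tj.val and not (ti.write_set & tj.read_set) \
                  and not (ti.write_set & tj.write_set):
              return True                                            # Test 3
          return False

      def validate(tj: Txn, validated: list) -> bool:
          # Tj may commit only if one of the tests holds against every earlier Ti.
          return all(no_conflict(ti, tj) for ti in validated)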
  • 25. Overheads in Kung-Robinson CC
    Must record read/write activity in ReadSet and WriteSet per txn.
    Must create and destroy these sets as needed.
    Must check for conflicts during validation, and must make validated writes “global”.
    Critical section can reduce concurrency.
    Scheme for making writes global can reduce clustering of objects.
    Optimistic CC restarts transactions that fail validation.
    Work done so far is wasted; requires clean-up.
  • 26. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    • Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 27. Timestamp Ordering CC
    Main idea:
    Put a timestamp on the last read and write action on every object
    Use this timestamp to detect if a transaction attempts an illegal operation
    Abort the offending transaction if it does
    Algorithm:
    Give each object a read-timestamp (RTS) and a write-timestamp (WTS),
    Give each txn a timestamp (TS) when it begins
    If action ai of txn Ti conflicts with action aj of txn Tj and TS(Ti) < TS(Tj), then ai must occur before aj.
    Otherwise, restart the violating txn.
  • 28. Rules for Timestamps-Based scheduling
    Algorithm setup
    RT(X): the read time of X, the highest timestamp of any transaction that has read X.
    WT(X): the write time of X, the highest timestamp of any transaction that has written X.
    C(X): the commit bit for X, true if and only if the most recent transaction to write X has already committed.
    When the scheduler receives a request from T to operate on X, the request is physically realizable under some conditions and not under others.
  • 29. Physically Unrealizable
    Read too late
    A transaction U that started after transaction T wrote a value for X before T got a chance to read X.
    In other words, if TS(T) < WT(X), then the read is physically unrealizable, and T must be rolled back.
    [Timeline: T starts, U starts later, U writes X, then T tries to read X.]
  • 30. Physically Unrealizable
    Write too late
    A transaction U that started after T, but read X before T got a chance to write X.
    In other words, if TS(T) < RT(X), then the write is physically unrealizable, and T must be rolled back.
    [Timeline: T starts, U starts later, U reads X, then T tries to write X.]
  • 31. Dirty Read
    After T reads the value of X written by U, U could abort
    In other words, the read is physically realizable (TS(T) ≥ WT(X)), but the value T reads may be dirty:
    If C(X) is true, the previous writer of X has committed, and all is good.
    If C(X) is false, we must delay T until C(X) becomes true or the writer of X aborts.
    [Timeline: U starts, T starts, U writes X, T reads X, then U aborts.]
  • 32. Write after Write
    T tries to write X after a later transaction (U) has written it
    It is OK to ignore the write by T, because it would be overwritten anyway,
    except if U aborts: U's value is discarded, and T's value has been lost forever.
    Solve this problem by introducing the concept of a “tentative write”: keep T's value and promote it only if U aborts.
    [Timeline: T starts, U starts, U writes X, T writes X (ignored), T commits, then U aborts.]
  • 33. Rules for Timestamps-based Scheduling
    When the scheduler receives a request to commit T:
    It must find all the database elements X written by T and set C(X) = true.
    Any transactions waiting for X to be committed are then allowed to proceed.
    When the scheduler receives a request to abort T, or decides to roll T back:
    Any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write.
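    Pulling the rules of slides 28 through 33 together, here is a minimal sketch (hypothetical structures, not from the slides); delays are modeled as exceptions, and tentative writes are omitted:

      # Sketch of basic timestamp-ordering scheduling using RT(X), WT(X), and C(X).
      class Rollback(Exception): pass   # the requesting txn must be restarted
      class Delay(Exception): pass      # the requesting txn must wait

      class Element:
          def __init__(self, value=None):
              self.rt, self.wt = 0, 0   # RT(X), WT(X)
              self.c = True             # C(X): has the most recent writer committed?
              self.value = value

      def read(ts, x):
          if ts < x.wt:
              raise Rollback("read too late: a younger txn already wrote X")
          if not x.c:
              raise Delay("dirty read: wait until the writer of X commits or aborts")
          x.rt = max(x.rt, ts)
          return x.value

      def write(ts, x, value):
          if ts < x.rt:
              raise Rollback("write too late: a younger txn already read X")
          if ts < x.wt:
              if x.c:
                  return                # a committed, later value exists: skip this write
              raise Delay("wait: the later writer of X has not committed yet")
          x.value, x.wt, x.c = value, ts, False   # commit processing later sets C(X) = true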
  • 34. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    • Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 35. Multiversion Timestamps
    Multiversion schemes keep old versions of data items to increase concurrency.
    Each successful write results in the creation of a new version of the data item written.
    Use timestamps to label versions.
    When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction, and return the value of the selected version.
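    A minimal sketch of that read path (hypothetical version list, not from the slides): the reader with timestamp TS(T) gets the newest version whose write timestamp does not exceed TS(T).

      # Sketch of multiversion reads: each write creates a (timestamp, value) version.
      class MVItem:
          def __init__(self, initial_value):
              self.versions = [(0, initial_value)]      # kept sorted by write timestamp

          def write(self, ts, value):
              self.versions.append((ts, value))
              self.versions.sort(key=lambda v: v[0])

          def read(self, ts):
              # Newest version written at or before the reader's timestamp.
              return [v for wt, v in self.versions if wt <= ts][-1]

      x = MVItem("v0")
      x.write(10, "v10"); x.write(20, "v20")
      assert x.read(15) == "v10" and x.read(25) == "v20"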
  • 36. Timestamps vs Locking
    Generally, timestamping performs better than locking in situations where:
    Most transactions are read-only.
    It is rare that concurrent transactions try to read and write the same element.
    This is generally the case for web applications.
    In high-conflict situations, locking performs better than timestamps.
  • 37. Practical Use
    2-Phase Locks (or variants)
    Used by most relational databases
    Multi-level granularity
    Support for table, page and tuple-level locks
    Used by most relational databases
    Multi-version concurrency control
    Oracle 8 forward: Divide transactions into read-only and read-write
    Read-only transactions use multi-version concurrency and never wait
    Read-write transactions use 2PL
    Postgres and others offer some level of MVCC as well.
  • 38. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 39. Distributed Commit Motivation
    FruitCo has
    Its main Sales office in Oregon
    Farms and Warehouse are in Washington
    Finance is in Utah
    All three sites have local data centers with their own systems
    When an order is placed, the Sales system must send the billing information to Utah and shipping information to Washington.
    When an order is placed, all three databases must be updated, or none should be.
  • 40. Two Phase Commit
    The Basic Idea
  • 41. Two-Phase Commit (2PC)
    Phase 1 : The TM gets the RMs ready to write the results into the database
    Phase 2 : Everybody writes the results into the database
    TM: the process at the site where the transaction originates, which controls the execution.
    RM: the processes at the other sites that participate in executing the transaction.
    Global Commit Rule:
    The TM aborts a transaction if and only if at least one RM votes to abort it.
    The TM commits a transaction if and only if all of the RMs vote to commit it.
  • 42. Centralized 2PC
    [Figure: centralized 2PC message flow. Phase 1: the coordinator sends "ready?" to every participant, and each participant replies yes/no. Phase 2: the coordinator sends commit/abort, and each participant replies committed/aborted.]
  • 43. State Transitions in 2PC
    TM: INITIAL --(Commit command / send Prepare)--> WAIT --(Vote-commit from all / send Global-commit)--> COMMIT, or --(any Vote-abort / send Global-abort)--> ABORT.
    RMs: INITIAL --(Prepare / send Vote-commit)--> READY --(Global-commit / send Ack)--> COMMIT, or --(Global-abort / send Ack)--> ABORT; an RM that votes no goes directly INITIAL --(Prepare / send Vote-abort)--> ABORT.
  • 44. When TM Fails…
    Timeout in INITIAL
    Who cares
    Timeout in WAIT
    Cannot unilaterally commit
    Can unilaterally abort
    Timeout in ABORT or COMMIT
    Stay blocked and wait for the acks
  • 45. When an RM Fails…
    Timeout in INITIAL
    TM must have failed in INITIAL state
    Unilaterally abort
    Timeout in READY
    Stay blocked
  • 46. When TM Recovers…
    Failure in INITIAL
    Start the commit process upon recovery
    Failure in WAIT
    Restart the commit process upon recovery
    Failure in ABORT or COMMIT
    Nothing special if all the acks have been received
    Otherwise the termination protocol is involved
  • 47. When an RM Recovers…
    Failure in INITIAL
    Unilaterally abort upon recovery
    Failure in READY
    The TM has been informed about the local decision
    Treat as timeout in READY state and invoke the termination protocol
    Failure in ABORT or COMMIT
    Nothing special needs to be done
  • 48. 2PC Protocol Actions
    [Flowchart: coordinated actions of the TM and the RMs.]
    TM: write begin_commit in the log, send PREPARE to all RMs, and enter the WAIT state.
    Each RM: decide whether it is ready to commit. If no, write abort in the log and send VOTE-ABORT; if yes, write ready in the log, send VOTE-COMMIT, and enter the READY state.
    TM (in WAIT): if any vote is no, write abort in the log and send GLOBAL-ABORT; if all votes are yes, write commit in the log and send GLOBAL-COMMIT.
    Each RM (in READY): depending on the type of message received, write abort or commit in the log, move to ABORT or COMMIT, and send an ACK.
    TM: after collecting the ACKs, write end_of_transaction in the log.
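    A minimal, single-process sketch of this flow in Python (hypothetical classes; logging is simulated with in-memory lists and messages with direct calls). A real implementation adds force-writes, timeouts, and the termination protocol.

      # Sketch of the 2PC decision logic from slide 48.
      class RM:
          def __init__(self, name, will_commit=True):
              self.name, self.will_commit, self.log = name, will_commit, []

          def prepare(self):
              # Phase 1: log the local decision, then vote.
              self.log.append("ready" if self.will_commit else "abort")
              return "VOTE-COMMIT" if self.will_commit else "VOTE-ABORT"

          def decide(self, decision):
              # Phase 2: obey the global decision and acknowledge.
              self.log.append(decision)
              return "ACK"

      class TM:
          def __init__(self, rms):
              self.rms, self.log = rms, []

          def run(self):
              self.log.append("begin_commit")
              votes = [rm.prepare() for rm in self.rms]                  # Phase 1
              decision = "commit" if all(v == "VOTE-COMMIT" for v in votes) else "abort"
              self.log.append(decision)
              if all(rm.decide(decision) == "ACK" for rm in self.rms):   # Phase 2
                  self.log.append("end_of_transaction")
              return decision

      # Global commit rule: a single "no" vote aborts the whole transaction.
      print(TM([RM("WA"), RM("UT", will_commit=False)]).run())           # -> abort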
  • 49. Two-phase commit commentary
    Two-phase commit protocol limitation: it is a blocking protocol.
    The failure of the TM can cause the protocol to block until the TM is repaired.
    If the TM fails right after every RM has sent a Prepared message, then the other RMs have no way of knowing whether the TM committed or aborted.
    RMs block, holding resources, while waiting for a message from the TM.
    The TM also blocks while waiting for replies from RMs, and can block indefinitely if no acknowledgement is received from an RM.
    “Federated” two-phase commit protocols, aka three-phase protocols, have been proposed but are still unproven.
    Paxos Consensus Algorithm.
    Consensus on Transaction Commit, Jim Gray and Leslie Lamport, Microsoft Research, 2005, MSR-TR-2003-96
  • 50. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 51. Fault-Tolerant Two Phase Commit
    [Figure: normal 2PC message flow. The client sends RequestCommit to the TM; the TM sends Prepare to each RM; each RM replies Prepared.]
    If the 2PC Transaction Manager (TM) Fails, transaction blocks.
    Solution: Add a “spare” transaction manager (non blocking commit, 3 phase commit)
  • 52. Fault-Tolerant Two Phase Commit
    [Figure: 2PC with a spare TM. The original TM and the spare TM can reach different decisions for the same transaction (one sends commit, the other abort): Inconsistent! Now what?]
    If the 2PC Transaction Manager (TM) Fails, transaction blocks.
    Solution: Add a “spare” transaction manager (non blocking commit, 3 phase commit)
    The complexity is a mess.
    But… What if….?
  • 53. Fault Tolerant 2PC
    Several workarounds proposed in database community:
    Often called "3-phase" or "non-blocking" commit.
    None with complete algorithm and correctness proof.
  • 54. Consensus
    [Figure: clients send proposals (Propose X, Propose W) to a consensus box; one value (W) is chosen, and every client learns "W Chosen".]
    Consensus
    collects proposed values,
    picks one proposed value,
    remembers it forever.
  • 55. Consensus for Commit – The Obvious Approach
    [Figure: the client sends RequestCommit to the TM; the TM sends Prepare to the RMs; the RMs reply Prepared; the TM proposes "Prepared" to the consensus box; once "Prepared Chosen" is learned, Commit is sent to the RMs.]
    Get consensus on the TM's decision.
    The TM just learns the consensus value.
    The TM is "stateless".
  • 56. Consensus for Commit – The Paxos Commit Approach
    [Figure: the client sends RequestCommit to the TM; the TM sends Prepare to each RM; each RM proposes "RMi Prepared" to its consensus box; once "RMi Prepared Chosen" has been learned for every RM, Commit is sent to the RMs.]
    Get consensus on each RM's choice.
    The TM just combines the consensus values.
    The TM is "stateless".
  • 57. The Obvious Approach vs. Paxos Commit
    [Figure: side-by-side timelines. In the obvious approach the RMs send Prepared to the TM, which then proposes "Prepared" to the consensus box; in Paxos Commit each RM proposes "RMi Prepared" directly.]
    Paxos Commit has one fewer message delay.
  • 58. Consensus in Action
    [Figure: the RM sends "Propose RM Prepared" to each acceptor in the consensus box; each acceptor sends "Vote RM Prepared" to the TMs; once a majority of acceptors has voted, "RM Prepared" is chosen.]
    The normal (failure-free) case.
    Two message delays.
    Can optimize.
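    As a rough illustration of the majority rule only (this is not the full Paxos algorithm, which also needs ballot numbers and a two-phase exchange to stay safe under contention), a value counts as chosen once F+1 of the 2F+1 acceptors have accepted it:

      # Simplified majority-acceptance sketch (illustrative assumption, not full Paxos).
      class Acceptor:
          def __init__(self):
              self.accepted = None

          def accept(self, value):
              if self.accepted is None:     # accept the first value proposed to this acceptor
                  self.accepted = value
              return self.accepted

      def chosen_value(acceptors, f):
          """A value is chosen once F+1 of the 2F+1 acceptors have accepted it."""
          counts = {}
          for a in acceptors:
              if a.accepted is not None:
                  counts[a.accepted] = counts.get(a.accepted, 0) + 1
          return next((v for v, n in counts.items() if n >= f + 1), None)

      f = 1
      acceptors = [Acceptor() for _ in range(2 * f + 1)]
      for a in acceptors[:f + 1]:            # a majority accepts "RM1 Prepared"
          a.accept("RM1 Prepared")
      print(chosen_value(acceptors, f))      # -> RM1 Prepared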
  • 59. Consensus in Action (continued)
    [Figure: the TMs query the acceptors in the consensus box.]
    The TM can always learn what was chosen, or get "Aborted" chosen if nothing has been chosen yet, provided a majority of acceptors is working.
  • 60. The Complete Algorithm
    Subtle.
    More weird cases than most people imagine.
    Proved correct.
  • 61. Paxos Commit in a Nutshell
    [Figure: the client sends "request commit" to the TM; the TM sends "prepare" to RM1…N; each RM sends "prepared" to acceptors 0…2F; when the acceptors have seen all RMs prepared, "commit" is sent to the RMs.]
    N RMs.
    2F+1 acceptors (acting as ~2F+1 TMs).
    If F+1 acceptors see all RMs prepared, then the transaction is committed.
    2F(N+1) + 3N + 1 messages; 5 message delays; 2 stable-write delays.
  • 62. Paxos Commit Evaluation
    Two-Phase Commit:
    3N+1 messages; N+1 stable writes; 4 message delays; 2 stable-write delays; availability is compromised.
    Paxos Commit:
    3N + 2F(N+1) + 1 messages; N+2F+1 stable writes; 5 message delays; 2 stable-write delays; tolerates F faults.
    Paxos Commit ≡ 2PC for F = 0.
    • The Paxos algorithm is the basis of Google's global distributed lock manager, Chubby, which uses F = 2 (5 acceptors).
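    For concreteness, a quick evaluation of the message-count formulas above for a hypothetical configuration of N = 5 RMs:

      # Message counts from slide 62, evaluated for N = 5 RMs.
      def msgs_2pc(n):
          return 3 * n + 1

      def msgs_paxos_commit(n, f):
          return 3 * n + 2 * f * (n + 1) + 1

      n = 5
      print("2PC:", msgs_2pc(n))                                   # 16
      for f in (0, 1, 2):
          print(f"Paxos Commit (F={f}):", msgs_paxos_commit(n, f)) # 16, 28, 40
      # F = 0 reproduces the 2PC count, matching "Paxos Commit ≡ 2PC for F = 0".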
  • 63. Today’s Meeting
    Concurrency Control
    Intention Locks
    Index Locking
    Optimistic CC
    Validation
    Timestamp Ordering
    Multi-version CC
    Commit in Distributed Databases
    Two Phase Commit
    Paxos Algorithm
    Concluding thoughts
  • 64. OLTP Through the Looking Glass (p1)
    Workload
    TPC-C Benchmark
    Quote:
    Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. …
    Substantial time is spent in logging, latching, locking, Btree, and buffer management.
    • OLTP Through the Looking Glass, and What We Found There, Harizopoulos et al, Proc ACM SIGMOD, 2008
    The authors removed components of a DBMS one at a time and measured the performance impact of each.
  • 65. OLTP Through the Looking Glass (p2)
    Concurrency Control
    Look for applications where it can be turned off
    Some sort of optimistic concurrency control
    Multi-core Support
    Latching (inter-thread communication) remains a significant bottleneck
    Cache-conscious B-Trees
    Replication Management
    Loss of transactional consistency when using log shipping.
    Recovery is not instantaneous
    Maintaining transactional consistency
    Weak Consistency
    Starbucks doesn’t need two phase commit
    How to achieve eventual consistency without transactional consistency
    Areas for Research that may yield dividends
  • 66. End of an Era?
    The Relational Model is not necessarily the answer
    It was excellent for data processing
    Not a natural fit for
    Data Warehouses
    Web-oriented search
    Real-time analytics, and
    Semi-structured data
    e.g., the Semantic Web
    SQL is not the answer
    The coupling between modern programming languages and SQL is "ugly beyond belief".
    Programming languages have evolved while SQL has remained static
    Pascal
    C/C++
    Java
    The little languages: Python, Perl, PHP, Ruby
    • The end of an Architectural Era, Stonebraker et al, Proc. VLDB, 2007
    A critique of the “one size fits all” assumption in DBMS
  • 67. What’s so fun about databases?
    From our January 13 Lecture…
    Traditional database courses talked about
    Employee records
    Bank records
    Now we talk about
    Web search
    Data mining
    The collective intelligence of tweets
    Scientific and medical databases
    From a personal viewpoint,
    I have enjoyed learning this material with you
    Thank you.
  • 68. About CS 542
    CS 542 will
    Build on database concepts you already know
    Provide you tools for separating hype from reality
    Help you develop skills in evaluating the tradeoffs involved in using and/or creating a database
    CS 542 may
    Train you to read technical journals and apply them
    CS 542 will not
    Cover the intricacies of SQL programming
    Spend much effort in
    Dynamic SQL
    Stored Procedures
    Interfaces with application programming languages
    Connectors, e.g., JDBC, ODBC
    From our January 13 Lecture…
  • 69. Thanks
    Contact Information:
    President, Early Stage IT – a cloud-based consulting firm
    Email: J [dot] Singh [at] EarlyStageIT [dot] com
    Phone: 978-760-2055
    Co-chair of Software and Services SIG at TiE-Boston
    Founder, SQLnix.org, a local resource for NoSQL databases
    My WPI email will be good through the summer.