
the Paxos Commit algorithm


This is the presentation I used to give a seminar about the "Paxos Commit" algorithm, one of Leslie Lamport's works (in this case, a joint work between him and Jim Gray). You can find the original paper here:

Feel free to post comments ;)



Agenda

  • Paxos Commit Algorithm: Overview
  • The participating processes
      • The resource managers
      • The leader
      • The acceptors
  • Paxos Commit Algorithm: the base version
  • Failure scenarios
  • Optimizations for Paxos Commit
  • Performance
  • Paxos Commit vs. Two-Phase Commit
  • Using a dynamic set of resource managers
Paxos Commit Algorithm: Overview

  • Paxos was applied to Transaction Commit by L. Lamport and Jim Gray in "Consensus on Transaction Commit"
  • One instance of Paxos (the consensus algorithm) is executed for each resource manager, in order to agree upon the value (Prepared/Aborted) proposed by it
  • "Not-synchronous" commit algorithm
  • Fault-tolerant (unlike 2PC)
      • Intended to be used in systems where failures are fail-stop only, for both processes and network
  • Safety is guaranteed (unlike 3PC)
  • Formally specified and checked
  • Can be optimized to the theoretically best performance
Participants: the resource managers

  • N resource managers ("RM") execute the distributed transaction, then each chooses a value ("locally chosen value" or "LCV"; 'p' for prepared iff it is willing to commit)
  • Every RM tries to get its LCV accepted by a majority set of acceptors ("MS": any subset with a cardinality strictly greater than half of the total)
  • Each RM is the first proposer in its own instance of Paxos

Participants: the leader

  • Coordinates the commit algorithm
  • All the instances of Paxos share the same leader
  • It is not a single point of failure (unlike 2PC)
  • Assumed always defined (true: many leader-(s)election algorithms exist) and unique (not necessarily true, but unlike 3PC safety does not rely on it)
Participants: the acceptors

  • A denotes the set of acceptors
  • All the instances of Paxos share the same set A of acceptors
  • 2F+1 acceptors are involved in order to achieve tolerance to F failures
  • We will consider only F+1 acceptors, leaving F more for "spare" purposes (less communication overhead)
  • Each acceptor keeps track of its own progress in an Nx1 vector
  • Vectors need to be merged into an Nx|MS| table, called aState, in order to take the global decision (we want "many" p's)

[Figure: five acceptors Acc1-Acc5, each holding its own Nx1 vector; the vectors of the majority set are merged into the Nx|MS| aState table. In the example, RM1 proposes 'a' while RM2 and RM3 propose 'p', each in its own Paxos instance (1st, 2nd, 3rd), to the "consensus box" formed by the majority set.]
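The merging of the acceptors' per-RM vectors into the aState table can be sketched in a few lines of Python. This is a minimal illustration under assumed encodings; the function and acceptor names (`make_a_state`, `Acc1`, ...) are mine, not from the paper.

```python
# Minimal sketch: merge acceptors' Nx1 progress vectors into aState.
# Each vector has one entry per RM: 'p', 'a', or None (nothing accepted yet).

def make_a_state(acceptor_vectors, n_rms):
    """Build the N x |MS| table indexed as a_state[rm][acceptor]."""
    return {rm: {acc: vec[rm] for acc, vec in acceptor_vectors.items()}
            for rm in range(n_rms)}

# Example inspired by the slide: N = 3 RMs, a majority set of 3 acceptors
# out of 2F+1 = 5; RM1 chose 'a', RM2 and RM3 chose 'p'.
vectors = {
    "Acc1": ["a", "p", "p"],
    "Acc2": ["a", "p", "p"],
    "Acc3": ["a", "p", "p"],
}
a_state = make_a_state(vectors, n_rms=3)
```

Each column of aState is one acceptor's vector; each row collects what the majority set knows about one RM's instance.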
Paxos Commit (base)

[Figure: message flow for N = 5, F = 2 (RMs RM0-RM4, leader L, acceptors AC0-AC2; writes to stable storage marked on the timelines). RM0 sends one BeginCommit message to the leader; the leader sends prepare to the other N-1 RMs; each RM sends its phase 2a message <p2a, rm, 0, v(rm)> to the acceptors (N(F+1)-1 messages in total); each acceptor replies with its phase 2b vector <p2b, acc, rm, 0, v(rm)> to the leader (plus, optionally, F more); the leader then sends N phase 3 messages: if the Global Commit condition holds, commit, else abort. The protocol is not blocked iff F acceptors respond. T1 and T2 mark the failure scenarios discussed below.]
Global Commit Condition

  • That is: there must be one and only one row for each RM involved in the commitment; in each of those rows there must be at least F+1 entries that have 'p' as a value and refer to the same ballot

Phase 2b messages have the form <p2b, acc, rm, b, p>.
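The condition above can be made concrete with a short Python sketch. It assumes each aState entry is a (ballot, value) pair; the encoding and the name `global_commit` are illustrative, not the paper's.

```python
# Sketch of the Global Commit condition: commit iff every RM's row of
# aState has at least F+1 entries with value 'p' on the same ballot.
from collections import Counter

def global_commit(a_state, F):
    for row in a_state.values():          # one row per RM
        per_ballot = Counter(ballot for (ballot, value) in row.values()
                             if value == "p")
        if not per_ballot or max(per_ballot.values()) < F + 1:
            return False
    return True

# F = 1 (need 2 'p' entries on one ballot per row):
committed = global_commit(
    {1: {"Acc1": (0, "p"), "Acc2": (0, "p"), "Acc3": (0, "p")},
     2: {"Acc1": (0, "p"), "Acc2": (0, "p"), "Acc3": (1, "a")}}, F=1)
aborted = global_commit(
    {1: {"Acc1": (0, "p"), "Acc2": (0, "a"), "Acc3": (1, "p")}}, F=1)
```

In the second call the two 'p' entries refer to different ballots, so the condition fails and the transaction aborts.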
[T1] What if some RMs do not submit their LCV?

The leader runs phase 1 of Paxos with one majority of acceptors in the missing RM's instance, using a ballot b_L1 > 0:

  • Leader ("prepare?", p1a): "Has resource manager j ever proposed you a value?" (acceptors promise not to answer any b_L2 < b_L1)
  • Acceptor i ("promise", p1b), case (1): "Yes, in my last session (ballot) b_i with it I accepted its proposal v_i"
  • Acceptor i ("promise", p1b), case (2): "No, never"
  • If at least |MS| acceptors answered:
      • if case (2) holds for ALL of them, then V = 'a' [FREE]
      • else V = v(maximum({b_i})) [FORCED]
  • Leader ("accept?", p2a): "I am j, I propose V"
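The leader's free/forced choice after phase 1b can be sketched as follows. The reply encoding is an assumption of this sketch: `None` stands for case (2) ("No, never") and a (ballot, value) pair for case (1).

```python
# Sketch of the leader's value choice for one RM's Paxos instance.
def choose_value(p1b_replies, ms_size):
    """Return the value to propose in phase 2a, or None if fewer than
    |MS| acceptors have answered (no progress possible yet)."""
    if len(p1b_replies) < ms_size:
        return None
    accepted = [r for r in p1b_replies if r is not None]
    if not accepted:
        return "a"                  # FREE: the RM never proposed, abort is safe
    return max(accepted)[1]         # FORCED: value of the maximum ballot
```

Aborting in the free case is always safe: no majority can have accepted 'p' for that RM, so no process can ever have decided commit.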
[T2] What if the leader fails?

  • If the leader fails, some leader-(s)election algorithm is executed. A faulty election (2+ leaders) doesn't preclude safety (unlike 3PC), but can impede progress...
  • Non-terminating example: an infinite sequence of p1a-p1b-p2a messages from 2 leaders, L1 and L2, each starting a higher ballot (b_1 > 0, b_2 > b_1, b_3 > b_2, b_4 > b_3, ...); the majority set alternately trusts one leader and ignores the other
  • Not really likely to happen
  • It can be avoided (random T?)
Optimizations for Paxos Commit (1)

  • Co-location: each acceptor is on the same node as an RM, and the initiating RM is on the same node as the initial leader
      • -1 message phase (BeginCommit), -(F+2) messages
  • "Real-time assumptions": RMs can prepare spontaneously. The prepare phase is not needed anymore; RMs just "know" they have to prepare in some amount of time
      • -1 message phase (Prepare), -(N-1) messages

[Figure: with co-location, each node hosts an RM/acceptor pair (RM0/AC0 together with the leader L, RM1/AC1, RM2/AC2) and the BeginCommit message disappears; with the real-time assumption, the (N-1) prepare messages are not needed anymore.]
Optimizations for Paxos Commit (2)

  • Phase 3 elimination: the acceptors send their phase 2b messages (the columns of aState) directly to the RMs, which evaluate the global commit condition themselves
      • Paxos Commit + Phase 3 Elimination = Faster Paxos Commit (FPC)
      • FPC + Co-location + R.T.A. = Optimal Consensus Algorithm

[Figure: the acceptors' p2b messages go directly to RM0-RM4, replacing the leader's p3 messages.]
Performance

  • If we deploy only one acceptor for Paxos Commit (F=0), its fault tolerance and cost are the same as 2PC's. Are they exactly the same protocol in that case?

                                  2PC                    Paxos Commit              Faster Paxos Commit
                                  Coloc.     No coloc.   Coloc.      No coloc.     Coloc.        No coloc.
  Message delays*                 3          4           4           5             3             4
  Messages*                       3N-3       3N-1        NF+3N-3     NF+F+3N-1     2FN-2F+3N-3   2NF+3N-1
  Stable storage write delays**   2                      2                         2
  Stable storage writes**         N+1                    N+F+1                     N+F+1

  *Not assuming RMs' concurrent preparation (slides-like scenario)
  **Assuming RMs' concurrent preparation (r.t. constraints needed)
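The message counts in the table can be checked with a small helper; the formulas are the table's, while the function and its (no co-location, co-location) pair encoding are mine.

```python
# Message counts per protocol, as (no co-location, co-location) pairs,
# taken from the comparison table above.
def message_counts(N, F):
    return {
        "2PC":                 (3*N - 1,           3*N - 3),
        "Paxos Commit":        (N*F + F + 3*N - 1, N*F + 3*N - 3),
        "Faster Paxos Commit": (2*N*F + 3*N - 1,   2*F*N - 2*F + 3*N - 3),
    }

# With F = 0 every F-term vanishes, so Paxos Commit (and FPC) cost
# exactly as many messages as 2PC:
counts_f0 = message_counts(N=5, F=0)
```

This makes the slide's question concrete: at F = 0 the message complexity of Paxos Commit collapses to 2PC's.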
Paxos Commit vs. 2PC

  • Yes, but...
  • ...two slightly different versions of 2PC!

[Figure: for the failure scenarios T1 and T2, the 2PC from Lamport and Gray's paper (a TM exchanging messages with RM1 and the other RMs) is compared with the 2PC from the slides of the course.]
Using a dynamic set of RMs

  • You add one process, the registrar, that acts just like another resource manager, despite the following:
      • [...]
      • [...]
  • RMs can join the transaction until the Commit Protocol begins
  • The global commit condition now holds on the set of resource managers proposed by the registrar and decided in its own instance of Paxos

[Figure: RM1, RM2 and RM3 send join messages to the registrar REG; the registrar gets the set RM1;RM2;RM3 accepted by the majority set in its own Paxos instance, while the RMs run theirs (RM1 proposing 'a', RM2 and RM3 proposing 'p'). Phase 2b messages still have the form <p2b, acc, rm, b, p>.]
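The adjusted condition can be sketched by quantifying over the registrar-decided set instead of a fixed one. As before, the (ballot, value) row encoding and all names are assumptions of this sketch.

```python
# Sketch: with a dynamic RM set, the commit condition ranges only over
# the set of RMs decided in the registrar's own Paxos instance.
from collections import Counter

def global_commit_dynamic(a_state, decided_rms, F):
    """True iff every RM in the registrar-decided set has a row in
    aState with at least F+1 'p' entries on the same ballot."""
    for rm in decided_rms:
        row = a_state.get(rm)
        if row is None:
            return False                     # a decided RM never got a row
        per_ballot = Counter(b for (b, v) in row.values() if v == "p")
        if not per_ballot or max(per_ballot.values()) < F + 1:
            return False
    return True

decided = {"RM2", "RM3"}                     # set decided via the registrar
state = {"RM2": {"Acc1": (0, "p"), "Acc2": (0, "p")},
         "RM3": {"Acc1": (0, "p"), "Acc2": (0, "p")}}
```

An RM that joined too late (after the Commit Protocol began) simply never appears in the decided set, so its row is irrelevant to the outcome.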
Thank You! Questions?