Zab dsn-2011
 

Talk given at the DSN conference.

    Presentation Transcript

    • Zab: High-performance broadcast for primary-backup systems
      Flavio Junqueira, Benjamin Reed, Marco Serafini (Yahoo! Research), June 2011
    • Setting up the stage
      - Background: ZooKeeper
      - Coordination service
        - Web-scale applications
        - Intensive use (high performance)
        - Source of truth for many applications
    • ZooKeeper
      - Open source Apache project
      - Used in production: Yahoo!, Facebook, Rackspace, ...
      - http://zookeeper.apache.org
    • ZooKeeper
      - ... is a leader-based, replicated service; processes crash and recover
      - Leader: executes requests and propagates state updates
      - Follower: applies state updates
      (Diagram: the leader broadcasts and the followers deliver, via atomic broadcast)
    • ZooKeeper
      - Client: submits operations to a server
      - If that server is a follower, it forwards the request to the leader
      - The leader executes the operation and propagates the state update
      (Diagram: client request to a follower, forwarded to the leader, then atomic broadcast to all followers)
    • ZooKeeper
      - State updates
        - All followers apply the same updates
        - All followers apply them in the same order
        - Atomic broadcast
      - Performance requirements
        - Multiple outstanding operations
        - Low latency and high throughput
    • ZooKeeper
      - Update the configuration, then create ready
      - If ready exists, then the configuration is consistent
      (Diagram: the leader broadcasts 1: del /cfg/ready, 2: setData /cfg/client, 3: setData /cfg/server, 4: create /cfg/ready)
      - If 1 doesn't commit, then 2 and 3 can't commit
      - If 2 and 3 don't commit, then 4 must not commit
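A minimal sketch of this ready-znode update pattern with the ZooKeeper Java client; the connection string, payloads, and the assumption that the /cfg znodes already exist are placeholders, not part of the talk:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ReadyZnodeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string and session timeout; assumes the
        // /cfg, /cfg/client, /cfg/server, and /cfg/ready znodes already exist.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});

        // 1. Remove /cfg/ready so readers know the configuration is in flux.
        zk.delete("/cfg/ready", -1);

        // 2-3. Update the configuration znodes (version -1 = unconditional).
        zk.setData("/cfg/client", "client-config".getBytes(), -1);
        zk.setData("/cfg/server", "server-config".getBytes(), -1);

        // 4. Recreate /cfg/ready; readers that see it may trust the config.
        zk.create("/cfg/ready", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // A reader only uses the configuration if /cfg/ready exists.
        if (zk.exists("/cfg/ready", false) != null) {
            byte[] clientCfg = zk.getData("/cfg/client", false, null);
            System.out.println("config: " + new String(clientCfg));
        }
        zk.close();
    }
}
```

The ordering guarantees of the broadcast layer are what make this pattern safe: the create of /cfg/ready can only become visible after the earlier updates it depends on.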
    • ZooKeeper
      - Exploring Paxos
        - Efficient consensus protocol
        - State-machine replication
        - Multiple consecutive instances
      - Why is it not suitable out of the box?
        - Does not guarantee order
        - Multiple outstanding operations
    • Paxos at a glance
      - Phase 1: selects a value to propose (1b: acceptor promises not to accept lower ballots)
      - Phase 2: proposes a value (2b: if a quorum accepts, the value is chosen)
      - Phase 3: value learned
      (Diagram: three acceptor/learner processes, one also acting as proposer, exchanging messages 1a, 1b, 2a, 2b, 3a)
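A toy illustration of the acceptor rules the slide names (1b: promise not to accept lower ballots; 2b: value chosen once a quorum accepts). It is a generic single-instance Paxos acceptor, not code from the talk, and the class and field names are illustrative:

```java
// Single-instance Paxos acceptor: promises not to accept lower ballots (1b),
// and accepts a value only for a ballot at least as high as its promise (2b).
public class PaxosAcceptor {
    private long promisedBallot = -1;   // highest ballot promised in Phase 1
    private long acceptedBallot = -1;   // ballot of the last accepted value
    private Object acceptedValue = null;

    // Phase 1: on <1a, ballot>, promise not to accept lower ballots and
    // report any previously accepted (ballot, value) pair.
    public synchronized Promise onPrepare(long ballot) {
        if (ballot > promisedBallot) {
            promisedBallot = ballot;
            return new Promise(true, acceptedBallot, acceptedValue);
        }
        return new Promise(false, acceptedBallot, acceptedValue);
    }

    // Phase 2: on <2a, ballot, value>, accept unless a higher ballot was promised.
    // If a quorum of acceptors accepts, the value is chosen (Phase 3: learned).
    public synchronized boolean onAccept(long ballot, Object value) {
        if (ballot >= promisedBallot) {
            promisedBallot = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
        return false;
    }

    public record Promise(boolean ok, long acceptedBallot, Object acceptedValue) {}
}
```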
    • Paxos run
      (Message diagram, instances 27-29: acceptor A1 has accepted A for 27 and B for 28 from P1 with ballot 1; acceptor A3 has accepted C for 27 from P2 with ballot 2; a new proposer P3 with ballot 3 runs Phase 1, learns C for 27 and B for 28, and proposes C, B, D for instances 27-29, so the resulting sequence interleaves the operations of P1, P2, and P3)
    • ZooKeeper
      - Another requirement
        - Minimize downtime
        - Efficient recovery
      - Reduce the amount of state transferred
      - Zab
        - One identifier
        - Missing values for each process
    • Zab and PO Broadcast
    • Definitions
      - Processes: lead or follow
      - Followers maintain a history of transactions (updates)
      - Transaction identifiers ⟨e, c⟩
        - e: epoch number of the leader
        - c: epoch counter
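ZooKeeper packs these ⟨e, c⟩ identifiers into a single 64-bit zxid, with the epoch in the high 32 bits and the per-epoch counter in the low 32 bits, so ordering by epoch first and counter second is a plain integer comparison. An illustrative encoding (along the lines of ZooKeeper's ZxidUtils helper):

```java
// Illustrative encoding of a Zab transaction id <e, c> as one 64-bit zxid:
// epoch in the high 32 bits, per-epoch counter in the low 32 bits.
public final class Zxid {
    public static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static long epochOf(long zxid)   { return zxid >>> 32; }
    public static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long a = make(0, 3);  // <0, 3>
        long b = make(1, 1);  // <1, 1>
        // Epoch-then-counter ordering is just numeric comparison on the zxid.
        assert a < b;
        System.out.printf("<%d,%d> < <%d,%d>%n",
                epochOf(a), counterOf(a), epochOf(b), counterOf(b));
    }
}
```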
    • Properties of PO Broadcast
      - Integrity
        - Only broadcast transactions are delivered
        - The leader recovers before broadcasting new transactions
      - Total order and agreement
        - Followers deliver the same transactions and in the same order
    • Primary order
      - Local: transactions of a primary are accepted in order
      - Global: transactions in a history respect the order of epochs
      (Diagrams: a leader abcasts ⟨e,10⟩, ⟨e,11⟩, ⟨e,12⟩ and a follower accepts them in that order; once a new leader abcasts ⟨e',1⟩, the follower must not accept transactions of the earlier epoch e after it)
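Reading the two rules together: a follower only accepts transactions of the epoch it currently follows, and only in zxid order, so its history never interleaves epochs. A hypothetical follower-side check along those lines (a sketch; the class and method names are illustrative, not the paper's pseudocode or ZooKeeper's code):

```java
// Follower-side acceptance check sketched from the two primary-order rules:
// local  - transactions from the current leader are accepted in order;
// global - the history never goes back to an earlier epoch.
public class FollowerHistory {
    private long currentEpoch = 0;   // epoch of the leader we currently follow
    private long lastZxid = 0;       // last accepted <epoch, counter>, packed as a zxid

    public synchronized boolean accept(long epoch, long counter) {
        long zxid = (epoch << 32) | counter;
        if (epoch != currentEpoch) {
            return false;            // global: never accept from another epoch directly
        }
        if (zxid <= lastZxid) {
            return false;            // local: accept the leader's transactions in order
        }
        lastZxid = zxid;
        return true;                 // append to the history
    }

    // Called when a new leader's epoch is adopted (after discovery/synchronization).
    public synchronized void newEpoch(long epoch) {
        if (epoch > currentEpoch) {
            currentEpoch = epoch;
        }
    }
}
```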
    • Zab in Phases
      - Phase 0 - Leader election
        - A prospective leader is elected
      - Phase 1 - Discovery
        - Followers promise not to go back to previous epochs
        - Followers send the leader their last epoch and history
        - The leader selects the longest history of the latest epoch
    • Zab in Phases
      - Phase 2 - Synchronization
        - The leader sends the new history to followers
        - Followers confirm leadership
      - Phase 3 - Broadcast
        - The leader proposes new transactions
        - The leader commits if a quorum acknowledges
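A rough sketch of how a leader could drive these three phases; the interface and method names are hypothetical (not Zab's wire protocol), and quorum accounting, timeouts, and failure handling are left out:

```java
import java.util.List;

// Illustrative leader-side outline of Zab's discovery, synchronization,
// and broadcast phases, under the simplifications noted above.
public class ZabLeaderSketch {
    interface Follower {
        void promiseEpoch(long newEpoch);     // Phase 1: promise not to go back to older epochs
        long acceptedEpoch();                 // epoch of the follower's accepted transactions
        List<long[]> history();               // accepted <epoch, counter> transactions
        boolean syncHistory(List<long[]> h);  // Phase 2: adopt the leader's history, ack leadership
        void propose(long[] txn);             // Phase 3: proposal
        void commit(long[] txn);              // Phase 3: commit after a quorum of acks
    }

    // Phase 1 (discovery): collect epochs/histories from a quorum and keep
    // the longest history of the latest epoch as the initial history.
    List<long[]> discover(List<Follower> quorum, long newEpoch) {
        List<long[]> best = List.of();
        long bestEpoch = -1;
        for (Follower f : quorum) {
            f.promiseEpoch(newEpoch);
            if (f.acceptedEpoch() > bestEpoch
                    || (f.acceptedEpoch() == bestEpoch && f.history().size() > best.size())) {
                bestEpoch = f.acceptedEpoch();
                best = f.history();
            }
        }
        return best;
    }

    // Phase 2 (synchronization): send the chosen history; followers confirm leadership.
    void synchronize(List<Follower> quorum, List<long[]> history) {
        for (Follower f : quorum) {
            f.syncHistory(history);
        }
    }

    // Phase 3 (broadcast): propose, and commit once a quorum acknowledges.
    void broadcast(List<Follower> quorum, long[] txn) {
        quorum.forEach(f -> f.propose(txn));
        // ... wait for a quorum of acknowledgements, then:
        quorum.forEach(f -> f.commit(txn));
    }
}
```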
    • Zab in Phases
      - Phases 1 and 2: Recovery
        - Critical to guarantee order with multiple outstanding transactions
      - Phase 3: Broadcast
        - Just like Phases 2 and 3 of Paxos
    • Zab: Sample run f1 f2 f3 !0,1" !0,1" !0,1" !0,2" !0,2" !0,3"New epoch f1.a = 0, f2.a = 0, f3.a = 0, !0,3" !0,2" !0,1" Initial history of new epoch June 2011 23
    • Zab: Sample run f1 f2 f3 !0,1" !0,1" !0,1" !0,2" !0,2" !0,2" Chosen! !1,1" !1,1" !1,2"New epoch f1.a = 1, f2.a = 1, f3.a = 2, !1,2" !1,1" !0,2" Can’t happen! June 2011 24
    • Paxos run (revisited)
      (Table, one column per Phase 3 of epochs 1, 2, 3: leader L1 starts with an empty history; L2's history after Phases 1 and 2 of epoch 2 is empty; L3's history after Phases 1 and 2 of epoch 3 is ⟨2,1⟩,C. Follower 1 accepts ⟨1,1⟩,A and ⟨1,2⟩,B in epoch 1 and ends epoch 3 with ⟨2,1⟩,C and ⟨3,1⟩,D; Followers 2 and 3 accept ⟨2,1⟩,C in epoch 2, and Follower 3 ends epoch 3 with ⟨2,1⟩,C and ⟨3,1⟩,D)
    • Notes on implementation
      - Use of TCP
        - Ordered delivery, retransmissions, etc.
        - Notion of session
      - Elect the leader with the most committed txns
        - No follower-to-leader copies
      - Recovery
        - The last zxid is sufficient
        - In Phase 2, the leader commands followers to add or truncate transactions
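The Phase 2 recovery decision mentioned above can be sketched from the follower's last zxid alone; this is a simplification of the DIFF/TRUNC synchronization that ZooKeeper performs, and the names below are illustrative:

```java
// Phase 2 recovery decision, driven only by the follower's last zxid:
// either send the follower the transactions it is missing, or tell it to
// truncate transactions the new leader does not have.
enum SyncAction { NOTHING, DIFF, TRUNC }

final class RecoverySketch {
    static SyncAction decide(long followerLastZxid, long leaderLastZxid) {
        if (followerLastZxid == leaderLastZxid) {
            return SyncAction.NOTHING;   // already in sync
        } else if (followerLastZxid < leaderLastZxid) {
            return SyncAction.DIFF;      // leader streams the missing transactions
        } else {
            return SyncAction.TRUNC;     // follower drops transactions beyond the leader's history
        }
    }
}
```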
    • Performance
    • Experimental setup
      - Implementation in Java
      - 13 identical servers: Xeon 2.50GHz, Gigabit interface, two SATA disks
      - http://zookeeper.apache.org
    • Throughput
      (Plot: continuous saturated throughput, operations per second (0-70,000) vs. number of servers in the ensemble (2-14), for Net only, Net + Disk, Net + Disk (no write cache), and the net cap)
    • Latency
    • Wrap up
    • Conclusion
      - ZooKeeper
        - Multiple outstanding operations
        - Dependencies between consecutive updates
      - Zab
        - Primary Order Broadcast
        - Synchronization phase
        - Efficient recovery
    • Questions?
      http://zookeeper.apache.org