Zab: High-performance broadcast
  for primary-backup systems
  Flavio Junqueira, Benjamin Reed, Marco Serafini

                Yahoo! Research
                     June 2011
Setting the stage


•   Background: ZooKeeper
•   Coordination service
    - Web-scale applications
    - Intensive use (high performance)
    - Source of truth for many applications

ZooKeeper

•   Open source Apache project
•   Used in production
    - Yahoo!
    - Facebook
    - Rackspace
    - ...

http://zookeeper.apache.org

ZooKeeper

•   ... is a leader-based, replicated service
    - Processes crash and recover

•   Leader
    - Executes requests
    - Propagates state updates

•   Follower
    - Applies state updates

[Figure: the Leader broadcasts state updates and two Followers deliver them (atomic broadcast)]

ZooKeeper

•   Client
    - Submits operations to a server
    - If that server is a follower, it forwards the operation to the leader
    - Leader executes and propagates state update

[Figure: a Client sends a Request to a server; the Leader broadcasts and the Followers deliver (atomic broadcast)]

ZooKeeper

•   State updates
    - All followers apply the same updates
    - All followers apply them in the same order
    - Atomic broadcast

•   Performance requirements
    - Multiple outstanding operations
    - Low latency and high throughput

ZooKeeper
•   Update configuration and create ready
•   If ready exists, then configuration is consistent

[Figure: the Leader pipelines four updates to two Followers, in order: (1) del /cfg/ready, (2+3) setData /cfg/server and setData /cfg/client, (4) create /cfg/ready]

•   If 1 doesn’t commit, then 2+3 can’t commit
•   If 2+3 don’t commit, then 4 must not commit

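The dependency between these updates can be illustrated with a small in-memory model. This is only a sketch: the dictionary state, the values "old" and "new", and the helper names `apply_updates`, `safe_to_read`, and `consistent` are assumptions made for illustration, not ZooKeeper API calls. It shows that any prefix of the four updates keeps the rule "if ready exists, the configuration is consistent", while committing update 4 without update 3 breaks it.

```python
# Hypothetical model: znodes are dictionary keys; this is not ZooKeeper's API.
INITIAL = {"/cfg/ready": "", "/cfg/server": "old", "/cfg/client": "old"}

# The four updates from the slide, in the order the leader issues them.
UPDATES = [
    ("del", "/cfg/ready", None),
    ("setData", "/cfg/server", "new"),
    ("setData", "/cfg/client", "new"),
    ("create", "/cfg/ready", ""),
]


def apply_updates(updates, initial=INITIAL):
    """Apply the given updates, in order, to a copy of the initial state."""
    state = dict(initial)
    for op, path, data in updates:
        if op == "del":
            state.pop(path, None)
        else:  # "setData" and "create" both store data under the path
            state[path] = data
    return state


def safe_to_read(state):
    """A reader trusts the configuration only while /cfg/ready exists."""
    return "/cfg/ready" in state


def consistent(state):
    """Consistent here means both znodes carry the same configuration version."""
    return state["/cfg/server"] == state["/cfg/client"]


# Every prefix of the update sequence preserves the invariant.
for n in range(len(UPDATES) + 1):
    s = apply_updates(UPDATES[:n])
    assert not safe_to_read(s) or consistent(s)

# Skipping update 3 but committing update 4 exposes a half-written configuration.
broken = apply_updates([UPDATES[0], UPDATES[1], UPDATES[3]])
print(safe_to_read(broken), consistent(broken))  # True False
```

The model only covers ordering; it says nothing about how the updates are replicated.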
ZooKeeper

•   Exploring Paxos
    - Efficient consensus protocol
    - State-machine replication
    - Multiple consecutive instances

•   Why is it not suitable out of the box?
    - Does not guarantee order
    - Multiple outstanding operations

Paxos at a glance
[Figure: message flow among three processes (Acceptor + Learner; Acceptor + Proposer + Learner; Acceptor + Learner) exchanging messages 1a, 1b, 2a, 2b, 3a]

•   Phase 1: Selects value to propose
    - 1b: Acceptor promises not to accept lower ballots
•   Phase 2: Proposes a value
    - 2b: If quorum, value is chosen
•   Phase 3: Value learned

Paxos run

•   P3 starts Phase 1 with ballot 3 for instances 27 to 29
    - 27: <1a,3>, 28: <1a,3>, 29: <1a,3>
•   A1 has accepted A and B from P1
    - Accepted: 27: <1, A>, 28: <1, B>
    - Replies: 27: <1b, 1, A>, 28: <1b, 1, B>, 29: <1b, _, _>
•   A2 and A3 have accepted C from P2
    - Accepted: 27: <2, C>
    - A3 replies: 27: <1b, 2, C>, 28: <1b, _, _>, 29: <1b, _, _>
•   P3 proposes in Phase 2
    - 27: <2a, 3, C>, 28: <2a, 3, B>, 29: <2a, 3, D>
•   A2 and A3 accept
    - 27: <3, C>, 28: <3, B>, 29: <3, D>
•   Result: interleaves operations of P1, P2, and P3

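The interleaving in this run follows from the usual Paxos value-selection rule applied to each instance independently. The sketch below replays the 1b replies from A1 and A3 shown on the slide. The function name `choose_values` and the data layout are assumptions made for illustration.

```python
def choose_values(replies, instances, own_values):
    """For each instance, adopt the value accepted with the highest ballot.

    replies: acceptor name -> {instance: (ballot, value) or None}
    own_values: values the new proposer wants to propose in free instances
    """
    pending = list(own_values)
    chosen = {}
    for inst in instances:
        accepted = [r[inst] for r in replies.values() if r.get(inst) is not None]
        if accepted:
            # Constrained instance: take the value with the highest ballot.
            chosen[inst] = max(accepted, key=lambda bv: bv[0])[1]
        elif pending:
            # Free instance: the proposer may use one of its own values.
            chosen[inst] = pending.pop(0)
    return chosen


# 1b replies received by P3 (ballot 3) from A1 and A3, as on the slide.
replies = {
    "A1": {27: (1, "A"), 28: (1, "B"), 29: None},
    "A3": {27: (2, "C"), 28: None, 29: None},
}
print(choose_values(replies, [27, 28, 29], ["D"]))  # {27: 'C', 28: 'B', 29: 'D'}
```

P1's value B survives in instance 28 even though P1's earlier value A was replaced by C in instance 27. A state update B that depends on A would therefore be applied without its predecessor.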
ZooKeeper

•   Another requirement
    - Minimize downtime
    - Efficient recovery

•   Reduce the amount of state transferred
•   Zab
    - One identifier
    - Missing values for each process

Zab and PO Broadcast
Definitions

•   Processes: Lead or Follow
•   Followers
    - Maintain a history of transactions (updates)

•   Transaction identifiers: ⟨e,c⟩
    - e: epoch number of the leader
    - c: counter within the epoch

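A transaction identifier of this form can be modeled as a pair compared lexicographically, epoch first and counter second. The sketch below assumes a `Zxid` named tuple and the helpers `pack` and `unpack`. These names are illustrative. The 64-bit layout, with the epoch in the high 32 bits and the counter in the low 32 bits, follows the description in the ZooKeeper documentation.

```python
from typing import NamedTuple


class Zxid(NamedTuple):
    """Transaction identifier <e, c>: epoch first, then counter within the epoch."""
    epoch: int
    counter: int

    def next_in_epoch(self) -> "Zxid":
        # The leader of an epoch increments only the counter.
        return Zxid(self.epoch, self.counter + 1)


def pack(z: Zxid) -> int:
    """Encode as one 64-bit integer: epoch in the high 32 bits, counter in the low 32."""
    return (z.epoch << 32) | z.counter


def unpack(value: int) -> Zxid:
    return Zxid(value >> 32, value & 0xFFFFFFFF)


# Tuples compare lexicographically, so a new epoch always orders after
# every transaction of an older epoch, regardless of the counter.
assert Zxid(1, 1) > Zxid(0, 3)
assert Zxid(0, 2).next_in_epoch() == Zxid(0, 3)
assert pack(Zxid(1, 1)) > pack(Zxid(0, 3))
```

Packing preserves the ordering, so comparing two packed integers gives the same result as comparing the pairs.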
Properties of PO Broadcast


•   Integrity
    - Only broadcast transactions are delivered
    - Leader recovers before broadcasting new transactions

•   Total order and agreement
    - Followers deliver the same transactions and in the same order

Primary order

•   Local: Transactions of a leader accepted in
    order
•   Global: Transactions in history respect the
    order of epochs




Primary order

•    Local: Transactions of a primary accepted in
     order
•    Global: Transactions in history respect the
     order of epochs

[Figure: timelines for Leader and Follower; the Leader calls abcast(⟨e,10⟩), abcast(⟨e,11⟩), abcast(⟨e,12⟩) in that order]

Primary order

•     Local: Transactions of a primary accepted in
      order
•     Global: Transactions in history respect the
      order of epochs

[Figure: timelines for Leader, Leader’, and Follower; the Leader calls abcast(⟨e,10⟩) and abcast(⟨e,11⟩), and Leader’ later calls abcast(⟨e’,1⟩)]

Primary order

•    Local: Transactions of a primary accepted in
     order
•    Global: Transactions in history respect the
     order of epochs

[Figure: timelines for Leader, Leader’, and Follower; abcast(⟨e’,1⟩) by Leader’ is drawn between the Leader’s abcast(⟨e,10⟩) and abcast(⟨e,11⟩)]

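Both ordering rules can be checked on a follower's history with a single pass. This is a simplified sketch: `respects_primary_order` is a hypothetical helper, identifiers are (epoch, counter) pairs, and the check covers ordering only, not whether transactions are missing from the history.

```python
def respects_primary_order(history):
    """Return True if the history obeys local and global primary order.

    history: list of (epoch, counter) pairs in the order they were accepted.
    """
    for (e_prev, c_prev), (e_next, c_next) in zip(history, history[1:]):
        if e_next < e_prev:
            return False  # global order: never go back to an older epoch
        if e_next == e_prev and c_next <= c_prev:
            return False  # local order: counters grow within one epoch
    return True


# Transactions of epoch 0 in order, followed by the first one of epoch 1.
print(respects_primary_order([(0, 10), (0, 11), (1, 1)]))  # True
# A transaction of the old epoch accepted after one of the new epoch.
print(respects_primary_order([(0, 10), (1, 1), (0, 11)]))  # False
```

The second call corresponds to a follower that accepts a transaction of the new leader and afterwards one from the old leader, which global primary order rules out.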
Zab in Phases

•   Phase 0 - Leader election
    - Prospective leader elected

•   Phase 1 - Discovery
    - Followers promise not to go back to previous epochs
    - Followers send the prospective leader their last epoch and history
    - Prospective leader selects longest history of latest epoch

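The selection rule in Phase 1 can be sketched as follows. The function name `select_initial_history` and the report format, a pair of last epoch and history per follower, are assumptions for illustration. The quorum check reflects that the prospective leader needs reports from a majority before it can choose.

```python
def select_initial_history(reports, ensemble_size):
    """Pick the longest history among followers that report the latest epoch.

    reports: follower name -> (last_epoch, history), where history is a
             list of (epoch, counter) pairs.
    """
    if len(reports) <= ensemble_size // 2:
        raise ValueError("need reports from a quorum of followers")
    latest_epoch = max(epoch for epoch, _ in reports.values())
    candidates = [hist for epoch, hist in reports.values() if epoch == latest_epoch]
    return max(candidates, key=len)


# All three followers report epoch 0, with histories of different lengths.
reports = {
    "f1": (0, [(0, 1), (0, 2), (0, 3)]),
    "f2": (0, [(0, 1), (0, 2)]),
    "f3": (0, [(0, 1)]),
}
print(select_initial_history(reports, ensemble_size=3))  # [(0, 1), (0, 2), (0, 3)]
```

The epoch takes precedence over length: a shorter history from a later epoch is preferred to a longer one from an earlier epoch.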
Zab in Phases

•   Phase 2 - Synchronization
    - Leader sends new history to followers
    - Followers confirm leadership

•   Phase 3 - Broadcast
    - Leader proposes new transactions
    - Leader commits if quorum acknowledges

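The commit rule of Phase 3 can be sketched as a leader that tracks acknowledgements per proposal and commits strictly in identifier order. The class name `BroadcastLeader`, its methods, and the choice to count the leader's own acknowledgement toward the quorum are assumptions for illustration.

```python
class BroadcastLeader:
    """Tracks acknowledgements and commits proposals in zxid order."""

    def __init__(self, ensemble_size, leader_id="leader"):
        self.quorum = ensemble_size // 2 + 1
        self.leader_id = leader_id
        self.pending = {}    # zxid -> set of servers that acknowledged it
        self.committed = []  # zxids in commit order

    def propose(self, zxid):
        # Assumption: the leader counts its own acknowledgement.
        self.pending[zxid] = {self.leader_id}

    def on_ack(self, zxid, follower_id):
        """Record an acknowledgement and return the zxids committed by it."""
        if zxid in self.pending:
            self.pending[zxid].add(follower_id)
        newly = []
        # Commit from the oldest outstanding proposal and stop at the first
        # one without a quorum, so commits never skip ahead.
        for z in sorted(self.pending):
            if len(self.pending[z]) < self.quorum:
                break
            newly.append(z)
        for z in newly:
            del self.pending[z]
        self.committed.extend(newly)
        return newly


leader = BroadcastLeader(ensemble_size=3)
leader.propose((1, 1))
leader.propose((1, 2))
print(leader.on_ack((1, 2), "f1"))  # []
print(leader.on_ack((1, 1), "f1"))  # [(1, 1), (1, 2)]
```

Several proposals can be outstanding at once, yet the second one is held back until the first has its quorum.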
Zab in Phases


•   Phases 1 and 2: Recovery
    - Critical to guarantee order with multiple outstanding transactions

•   Phase 3: Broadcast
    - Just like Phases 2 and 3 of Paxos

Zab: Sample run

            f1              f2              f3
            ⟨0,1⟩           ⟨0,1⟩           ⟨0,1⟩
            ⟨0,2⟩           ⟨0,2⟩
            ⟨0,3⟩

New epoch
            f1.a = 0,       f2.a = 0,       f3.a = 0,
            ⟨0,3⟩           ⟨0,2⟩           ⟨0,1⟩
            Initial history
            of new epoch

Zab: Sample run

            f1              f2              f3
            ⟨0,1⟩           ⟨0,1⟩           ⟨0,1⟩
            ⟨0,2⟩           ⟨0,2⟩           ⟨0,2⟩
   Chosen!  ⟨1,1⟩           ⟨1,1⟩
            ⟨1,2⟩

New epoch
            f1.a = 1,       f2.a = 1,       f3.a = 2,
            ⟨1,2⟩           ⟨1,1⟩           ⟨0,2⟩

            Can’t happen!

Paxos run (revisited)
             Epoch 1, Phase 3      Epoch 2, Phase 3      Epoch 3, Phase 3
Leader       L1 History: ∅         L2 History: ∅         L3 History: ⟨2,1⟩,C

Follower 1   Epoch: 1              Epoch: 1              Epoch: 3
             ⟨1,1⟩,A               ⟨1,1⟩,A               ⟨2,1⟩,C
             ⟨1,2⟩,B               ⟨1,2⟩,B               ⟨3,1⟩,D

Follower 2   Epoch: 1              Epoch: 2              Epoch: 2
             ∅                     ⟨2,1⟩,C               ⟨2,1⟩,C

Follower 3   Epoch: 1              Epoch: 2              Epoch: 3
             ∅                     ⟨2,1⟩,C               ⟨2,1⟩,C
                                                         ⟨3,1⟩,D

Phases 1 and 2 of Epoch 2 run between the first and second columns; Phases 1 and 2 of Epoch 3 run between the second and third.

Notes on implementation

•   Use of TCP
    - Ordered delivery, retransmissions, etc.
    - Notion of session

•   Elect leader with most committed txns
    - No follower → leader copies

•   Recovery
    - Last zxid is sufficient
    - In Phase 2, leader commands followers to add or truncate transactions

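The recovery note can be sketched as a function that turns a follower's last zxid into an add-or-truncate plan. This is a hypothetical illustration, not ZooKeeper's implementation: `plan_sync`, `SyncPlan`, and the `(0, 0)` marker for "truncate everything" are assumed names. It also assumes that equal zxids always denote the same transaction, so comparing the last zxid against the leader's sorted history is enough.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple

Zxid = Tuple[int, int]  # (epoch, counter)


class SyncPlan(NamedTuple):
    truncate_to: Optional[Zxid]  # None means the follower keeps everything
    add: List[Zxid]              # transactions the leader must send


def plan_sync(leader_history: List[Zxid], follower_last: Optional[Zxid]) -> SyncPlan:
    """Decide what a follower must truncate and add, given only its last zxid.

    leader_history must be sorted in zxid order.
    """
    if follower_last is None:  # empty follower: send the whole history
        return SyncPlan(None, list(leader_history))
    i = bisect_right(leader_history, follower_last)
    if i > 0 and leader_history[i - 1] == follower_last:
        # Follower is a prefix of the leader: only missing transactions are added.
        return SyncPlan(None, leader_history[i:])
    # Follower holds transactions the leader does not have: cut back to the
    # last zxid not greater than the follower's, then add what follows.
    common = leader_history[i - 1] if i > 0 else (0, 0)
    return SyncPlan(common, leader_history[i:])


leader_history = [(0, 1), (0, 2), (1, 1)]
print(plan_sync(leader_history, (0, 1)))  # add (0, 2) and (1, 1)
print(plan_sync(leader_history, (0, 3)))  # truncate to (0, 2), then add (1, 1)
```

Only the identifier crosses the wire in this exchange, not the follower's whole history.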
Performance
Experimental setup


•   Implementation in Java
•   13 identical servers
    - Xeon 2.50 GHz, Gigabit interface, two SATA disks

http://zookeeper.apache.org

Throughput

[Figure: Continuous saturated throughput. x-axis: Number of servers in ensemble (2 to 14); y-axis: Operations per second (0 to 70000); series: Net only, Net + Disk, Net + Disk (no write cache), Net cap]

Latency

[Figure: latency plot]

Wrap up
Conclusion

•   ZooKeeper
    - Multiple outstanding operations
    - Dependencies between consecutive updates

•   Zab
    - Primary Order Broadcast
    - Synchronization phase
    - Efficient recovery

Questions?


http://zookeeper.apache.org

High-performance broadcast for primary-backup systems

  • 1. Zab: High-performance broadcast for primary-backup systems Flavio Junqueira, Benjamin Reed, Marco Serafini Yahoo! Research June 2011
  • 2. Setting up the stage • Background: ZooKeeper • Coordination service ! Web-scale applications ! Intensive use (high performance) ! Source of truth for many applications June 2011 2
  • 3. ZooKeeper • Open source Apache project • Used in production ! Yahoo! ! Facebook ! Rackspace ! ... http://zookeeper.apache.org June 2011 3
  • 4. ZooKeeper • ... is a leader-based, replicated service ! Processes crash and recover • Leader ! Executes requests Leader Follower Follower ! Propagates state updates Broadcast Deliver Deliver • Follower Atomic broadcast ! Applies state updates June 2011 4
  • 5. ZooKeeper • Client Client ! Submits operations to a server Request ! If follower, forwards to Leader Follower Follower leader Broadcast Deliver Deliver ! Leader executes and propagates state update Atomic broadcast June 2011 5
  • 6. ZooKeeper • State updates ! All followers apply the same updates ! All followers apply them in the same order ! Atomic broadcast • Performance requirements ! Multiple outstanding operations ! Low latency and high throughput June 2011 6
  • 7. ZooKeeper • Update configuration and create ready • If ready exists, then configuration is consistent setData del setData /cfg/client /cfg/ready /cfg/server create B B Follower /cfg/ready Leader create /cfg/ready setData Follower /cfg/server setData B /cfg/client del B /cfg/ready • If 1 doesn’t commit, then 2+3 can’t • If 2+3 don’t commit, then 4 must not commit commit June 2011 7
  • 8. ZooKeeper • Exploring Paxos ! Efficient consensus protocol ! State-machine replication ! Multiple consecutive instances • Why is it not suitable out of the box? ! Does not guarantee order ! Multiple outstanding operations June 2011 8
  • 9. Paxos at a glance 1b: Acceptor promises 2b: If quorum, value not to accept lower is chosen ballots Acceptor + Learner 1a 1b 2a 2b 3a Acceptor + Proposer + Learner 1a 1b 2a 2b 3a Acceptor + Learner Phase 1: Phase 2: Phase 3: Selects Proposes Value value to a value learned propose June 2011 9
  • 10. Paxos run • Interleaves operations of P1, P2, and P3 • P3 sends 27: <1a, 3>, 28: <1a, 3>, 29: <1a, 3>; then 27: <2a, 3, C>, 28: <2a, 3, B>, 29: <2a, 3, D> • A1 (has accepted A and B from P1): 27: <1, A>, 28: <1, B>; replies 27: <1b, 1, A>, 28: <1b, 1, B>, 29: <1b, _, _> • A2 (has accepted C from P2): 27: <2, C>; 27: <3, C>, 28: <3, B>, 29: <3, D> • A3: 27: <2, C>; replies 27: <1b, 2, C>, 28: <1b, _, _>, 29: <1b, _, _>; 27: <3, C>, 28: <3, B>, 29: <3, D>
  • 11. ZooKeeper • Another requirement - Minimize downtime - Efficient recovery • Reduce the amount of state transferred • Zab - One identifier - Missing values for each process
  • 12. Zab and PO Broadcast
  • 13. Definitions • Processes: Lead or Follow • Followers - Maintain a history of transactions (updates) • Transaction identifiers: <e, c> - e: epoch number of the leader - c: epoch counter
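A transaction identifier can be modeled as an (epoch, counter) pair compared epoch first. The following is a minimal sketch, not the ZooKeeper implementation: the `Zxid` class, `next_in_epoch`, and `pack` are hypothetical names, and the 64-bit layout with the epoch in the high 32 bits is an assumption made for illustration.

```python
from typing import NamedTuple

class Zxid(NamedTuple):
    epoch: int    # e: epoch number of the leader
    counter: int  # c: counter within the epoch

    def next_in_epoch(self):
        """Identifier of the next transaction proposed in the same epoch."""
        return Zxid(self.epoch, self.counter + 1)

def pack(z):
    """Pack into one integer: epoch in the high 32 bits, counter in the low 32
    (assumed layout, chosen so integer order matches tuple order)."""
    return (z.epoch << 32) | z.counter

# Tuples compare lexicographically, so a later epoch always orders after
# an earlier one, whatever the counters are.
later_epoch_wins = Zxid(1, 1) > Zxid(0, 3)
```

Because the epoch is compared first, any transaction of a new leader orders after every transaction of earlier epochs.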
  • 14. Properties of PO Broadcast • Integrity - Only broadcast transactions are delivered - Leader recovers before broadcasting new transactions • Total order and agreement - Followers deliver the same transactions and in the same order
  • 15. Primary order • Local: Transactions of a leader accepted in order • Global: Transactions in history respect the order of epochs
  • 16. Primary order • Local: Transactions of a primary accepted in order • Global: Transactions in history respect the order of epochs [Diagram: Leader sends abcast(<e,10>), abcast(<e,11>), abcast(<e,12>) to Follower]
  • 18. Primary order • Local: Transactions of a primary accepted in order • Global: Transactions in history respect the order of epochs [Diagram: Leader sends abcast(<e,10>), abcast(<e,11>) and Leader’ sends abcast(<e’,1>) to Follower]
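The two primary-order conditions can be expressed as a check over a follower's history. This is a sketch under assumptions: `respects_primary_order` is a hypothetical helper, histories are lists of (epoch, counter) pairs, and "accepted in order" is read here as "no gaps within an epoch".

```python
def respects_primary_order(history):
    """history: (epoch, counter) pairs in the order a follower accepted them."""
    for (e1, c1), (e2, c2) in zip(history, history[1:]):
        if e2 < e1:
            return False  # global: an older epoch appears after a newer one
        if e2 == e1 and c2 != c1 + 1:
            return False  # local: reordered or skipped within one epoch
    return True

in_order = respects_primary_order([(0, 10), (0, 11), (0, 12)])
new_epoch = respects_primary_order([(0, 10), (0, 11), (1, 1)])
skipped = respects_primary_order([(0, 10), (0, 12)])
stale = respects_primary_order([(0, 10), (1, 1), (0, 11)])
```

The last case corresponds to a follower accepting a transaction of the old leader after one of the new leader, which the global condition rules out.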
  • 20. Zab in Phases • Phase 0 - Leader election - Prospective leader elected • Phase 1 - Discovery - Followers promise not to go back to previous epochs - Followers send their last epoch and history to the prospective leader - Prospective leader selects the longest history of the latest epoch
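The selection rule of the discovery phase can be sketched as follows. The function name `select_initial_history` and the shape of `replies` are hypothetical; the sample values mirror three followers f1, f2, f3 that all last acknowledged epoch 0. A real implementation would act only on replies from a quorum, which this sketch does not model.

```python
def select_initial_history(replies):
    """replies: dict follower -> (last_epoch, history), where history is a
    list of (epoch, counter) pairs. Returns (follower, history) holding the
    longest history among followers that report the latest epoch."""
    latest = max(last_epoch for last_epoch, _ in replies.values())
    candidates = {f: h for f, (e, h) in replies.items() if e == latest}
    chosen = max(candidates, key=lambda f: len(candidates[f]))
    return chosen, candidates[chosen]

replies = {
    "f1": (0, [(0, 1), (0, 2), (0, 3)]),
    "f2": (0, [(0, 1), (0, 2)]),
    "f3": (0, [(0, 1)]),
}
chosen, initial_history = select_initial_history(replies)
```

Filtering on the latest epoch first matters: a long history from a stale epoch must not win over a shorter one from the most recent epoch.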
  • 21. Zab in Phases • Phase 2 - Synchronization - Prospective leader sends new history to followers - Followers confirm leadership • Phase 3 - Broadcast - Leader proposes new transactions - Leader commits if a quorum acknowledges
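The broadcast phase amounts to numbering proposals within the epoch and committing each once a quorum has acknowledged it. The class below is a simplified, single-process sketch: the `Leader` class and its methods are hypothetical, networking is omitted, and counting the leader's own acknowledgement toward a majority quorum is an assumption.

```python
class Leader:
    def __init__(self, epoch, ensemble_size):
        self.epoch = epoch
        self.quorum = ensemble_size // 2 + 1  # majority (assumed quorum rule)
        self.counter = 0
        self.acks = {}       # zxid -> set of servers that acknowledged it
        self.committed = []  # zxids committed so far, in order

    def propose(self):
        """Assign the next zxid of this epoch; the leader acks its own proposal."""
        self.counter += 1
        zxid = (self.epoch, self.counter)
        self.acks[zxid] = {"leader"}
        return zxid

    def on_ack(self, zxid, follower):
        self.acks[zxid].add(follower)
        self._commit_ready()

    def _commit_ready(self):
        # Commit strictly in zxid order: stop at the first pending
        # proposal that has not reached a quorum yet.
        for zxid in sorted(self.acks):
            if len(self.acks[zxid]) < self.quorum:
                break
            self.committed.append(zxid)
            del self.acks[zxid]

leader = Leader(epoch=1, ensemble_size=3)
a = leader.propose()
b = leader.propose()
leader.on_ack(b, "f1")            # b has a quorum, but a is still pending
after_first_ack = list(leader.committed)
leader.on_ack(a, "f2")            # a commits, then b right behind it
```

Several proposals can be outstanding at once, yet none commits ahead of an earlier one.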
  • 22. Zab in Phases • Phases 1 and 2: Recovery - Critical to guarantee order with multiple outstanding transactions • Phase 3: Broadcast - Just like Phases 2 and 3 of Paxos
  • 23. Zab: Sample run • f1: <0,1>, <0,2>, <0,3> • f2: <0,1>, <0,2> • f3: <0,1> • New epoch: f1.a = 0, last <0,3>; f2.a = 0, last <0,2>; f3.a = 0, last <0,1> • Initial history of new epoch
  • 24. Zab: Sample run • f1: <0,1>, <0,2>, <1,1>, <1,2> • f2: <0,1>, <0,2>, <1,1> • f3: <0,1>, <0,2> • <0,2>: Chosen! • New epoch: f1.a = 1, last <1,2>; f2.a = 1, last <1,1>; f3.a = 2, last <0,2> • Can’t happen!
  • 25. Paxos run (revisited) • Columns: Epoch 1, Phase 3; Phases 1 and 2 of Epoch 2; Epoch 2, Phase 3; Phases 1 and 2 of Epoch 3; Epoch 3, Phase 3 • L1 History: ∅ • L2 History: ∅ • L3 History: <2,1>,C • Follower 1: Epoch: 1 with <1,1>,A, <1,2>,B; Epoch: 1 with <1,1>,A, <1,2>,B; Epoch: 3 with <2,1>,C, <3,1>,D • Follower 2: Epoch: 1 with ∅; Epoch: 2 with <2,1>,C; Epoch: 2 with <2,1>,C • Follower 3: Epoch: 1 with ∅; Epoch: 2 with <2,1>,C; Epoch: 3 with <2,1>,C, <3,1>,D
  • 26. Notes on implementation • Use of TCP - Ordered delivery, retransmissions, etc. - Notion of session • Elect leader with most committed txns - No follower-to-leader copies • Recovery - Last zxid is sufficient - In Phase 2, leader commands to add or truncate
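The add-or-truncate decision can be sketched from the follower's last zxid alone. Everything here is illustrative: `sync_plan` and the `EMPTY` marker are hypothetical, zxids are (epoch, counter) pairs, and the sketch assumes that the follower and the leader agree on every transaction below the point where their histories diverge.

```python
EMPTY = (0, 0)  # hypothetical marker meaning "no transactions yet"

def sync_plan(leader_history, follower_last):
    """Return (truncate_to, to_add) for one follower.
    truncate_to is None when the follower keeps everything it has; otherwise
    the follower drops every transaction after truncate_to.
    to_add lists the leader transactions the follower must then append."""
    if follower_last == EMPTY or follower_last in leader_history:
        truncate_to = None
        anchor = follower_last
    else:
        # The follower holds a transaction the leader never adopted:
        # cut back to the last leader transaction that precedes it.
        shared = [z for z in leader_history if z < follower_last]
        anchor = truncate_to = shared[-1] if shared else EMPTY
    to_add = [z for z in leader_history if z > anchor]
    return truncate_to, to_add

leader_history = [(0, 1), (0, 2), (1, 1)]
behind = sync_plan(leader_history, (0, 2))    # only needs the missing suffix
diverged = sync_plan(leader_history, (0, 3))  # must drop <0,3> first
```

A follower that is merely behind receives only the suffix it lacks, which is what keeps the amount of transferred state small.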
  • 28. Experimental setup • Implementation in Java • 13 identical servers - Xeon 2.50GHz, Gigabit interface, two SATA disks http://zookeeper.apache.org
  • 29. Throughput [Plot: Continuous saturated throughput; y-axis: Operations per second (0 to 70000); x-axis: Number of servers in ensemble (2 to 14); series: Net only, Net + Disk, Net + Disk (no write cache), Net cap]
  • 30. Latency
  • 32. Conclusion • ZooKeeper - Multiple outstanding operations - Dependencies between consecutive updates • Zab - Primary Order Broadcast - Synchronization phase - Efficient recovery