Consensus Algorithms:
An Introduction & Analysis
Zak Cole
CTO, Whiteblock
Consensus
• “General agreement”
• Multiple nodes within a distributed system need to work together and store the
same data.
• Every correct process/node needs to agree on the same value of this data.
• Consensus occurs when all nodes come to an agreement concerning state.
• Consensus must be achieved whether or not these nodes are faulty/corrupt.
• A consensus algorithm is the process by which nodes arrive at consensus.
• Consensus is a fundamental issue in distributed systems.
Distributed Systems
• A group of independent processes or nodes.
• Nodes pass messages to one another and coordinate.
• Although separate from one another, the nodes appear as a single coherent system.
• Used within large scale services like Google, Facebook, or AWS.
• The Internet, Blockchain, DNS, BitTorrent, SMTP
• Distributed systems can be more scalable.
• More fault tolerant, but architecture is much more complex.
Distributed System Models
• Two types of system models
• Synchronous
• Multiprocessing computers, VLSI chips with thousands of transistors, etc.
• All processes run using the same clock.
• Defined bounds on the clock drift and execution time of each process.
• Consensus is solvable.
• Asynchronous
• Blockchain, Internet, sensor networks, etc.
• All processes rely on their own independent clocks; drift between them is unbounded.
• Undefined bounds for process execution.
• Consensus is impossible to solve deterministically (see FLP below).
[Diagram: in the synchronous model, Process A waits for Process B's response before continuing; in the asynchronous model, Process A continues working and handles the response whenever it arrives.]
FLP Impossibility
• Deterministic consensus in an asynchronous system is impossible to solve.
• Fischer, Lynch, and Paterson formalized this notion in 1985.
• Impossibility of Distributed Consensus with One Faulty Process
• It is impossible to tell the difference between a faulty process and one that is just taking a really long time to complete.
• In an asynchronous system, there is no distributed algorithm that solves consensus in a deterministic manner.
Consensus Algorithms
• Each process P starts with an initial input value x of either 0 or 1.
• Each P finishes with an output value y of either 0 or 1.
• The decided value y must depend on the inputs: different x values can lead to different y values (non-triviality).
• Each P proposes a value.
• Once a value y is decided, it can't be changed.
• Consensus algorithms should result in each P presenting the same value y (see the sketch below).
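As a rough illustration, the agreement and validity properties can be written as simple checks. A Python sketch follows; the helper names and process IDs are illustrative, not from any standard library:

# Sketch of the classic binary-consensus properties.
def agreement(decisions):
    # Agreement: every correct process decides the same value y.
    return len(set(decisions.values())) == 1

def validity(inputs, decisions):
    # Validity: any decided value y must be some process's initial input x.
    return all(y in inputs.values() for y in decisions.values())

inputs = {"P1": 0, "P2": 1, "P3": 1}
decisions = {"P1": 1, "P2": 1, "P3": 1}
print(agreement(decisions), validity(inputs, decisions))  # True True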
Byzantine Fault Tolerance
• In spite of FLP impossibility, we don't really need absolute certainty if something can be shown to hold with 99.999% probability.
• A system that remains correct in the presence of a certain number of components failing or broadcasting incorrect information is described as Byzantine Fault Tolerant.
• A node in the system exhibiting adversarial behavior is considered Byzantine.
• BFT systems are unaffected by these sorts of failures so long as more than 2/3 of the nodes aren't Byzantine, i.e., n ≥ 3f + 1 (sketched below).
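A minimal sketch of the arithmetic, assuming the usual n = 3f + 1 sizing:

# Sketch: BFT sizing under the n = 3f + 1 assumption.
def max_byzantine(n):
    return (n - 1) // 3          # f: largest tolerable number of Byzantine nodes

def quorum(n):
    f = max_byzantine(n)
    return 2 * f + 1             # votes needed so any two quorums overlap on an honest node

for n in (4, 7, 10):
    print(n, max_byzantine(n), quorum(n))   # e.g. n=4 tolerates f=1 with quorum 3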
Paxos
• Proposed by Leslie Lamport in 1989.
• Finally published as a journal article entitled The Part-Time Parliament in 1998.
• Assumes a partially synchronous system.
• Provides safety and eventual liveness.
• Safety means that consensus is not violated.
• Eventual liveness means that if everything within the system operates according
to plan, consensus might be reached.
• FLP impossibility still applies and Paxos does not offer Byzantine fault
tolerance.
How Paxos Works
• Nodes fill 3 roles:
• Proposers – Propose some sort of value. Issue prepare requests to acceptors and wait.
• Acceptors – Respond to prepare requests, promising to ignore proposals with lower ballot numbers (see the sketch below).
• Learners – Find out that a proposal has been accepted by a majority of acceptors.
• Operates in asynchronous rounds.
• Each round is assigned a ballot ID.
• Rounds consist of 3 phases:
• Phase 1 – Election
• Phase 2 - Bill
• Phase 3 - Law
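A minimal single-decree sketch of the acceptor's side in Python. This is illustrative only; real Paxos adds learners, retries, networking, and stable storage:

# Minimal single-decree Paxos acceptor (sketch; no persistence or networking).
class Acceptor:
    def __init__(self):
        self.promised = -1            # highest ballot promised so far
        self.accepted = (-1, None)    # (ballot, value) last accepted

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots; report any prior acceptance.
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot was promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return ("accepted", ballot)
        return ("reject", None)

a = Acceptor()
print(a.prepare(1))        # ('promise', (-1, None))
print(a.accept(1, "law"))  # ('accepted', 1)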
Practical Byzantine Fault Tolerance
• Presented by Miguel Castro and Barbara Liskov in 1999.
• Assumes the presence of Byzantine nodes.
• The number of malicious nodes in the network must be less than 1/3 of the total number of nodes (n).
• Provides both liveness and safety as long as there exist no more than (n − 1)/3 malicious nodes simultaneously.
• The higher the number of nodes in the system, the less likely it is that this threshold will be exceeded.
How pBFT Works
• All nodes are ordered in sequence with one node being the leader.
• Nodes that aren’t the leader are referred to as backup nodes.
• Nodes must validate that messages originated from specific peer nodes and that messages were not altered or manipulated in transit.
• Each round (view) operates in 4 phases:
1. Client sends request to start an operation to leader.
2. Leader broadcasts request to backup nodes.
3. Nodes execute request and send response to client.
4. Client waits to receive f + 1 replies with the same result (sketched below).
• The leader changes in a round-robin manner with every view change.
• The leader can be replaced if it remains dormant for a period of time.
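The client's f + 1 rule from step 4 in a Python sketch (illustrative names): at most f replies can come from faulty nodes, so f + 1 matching replies guarantee at least one honest source.

# Sketch: pBFT client accepts a result once f + 1 identical replies arrive.
from collections import Counter

def accept_result(replies, f):
    counts = Counter(replies)
    for result, votes in counts.items():
        if votes >= f + 1:
            return result
    return None  # not enough matching replies yet

print(accept_result(["ok", "ok", "bad", "ok"], f=1))  # 'ok'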
Public Blockchain Networks
• Anyone is free to participate in a public blockchain network.
• Degree of participation needs to be dictated by mechanisms that can ensure security and provide some degree of Sybil resistance.
• PoW and PoS are more like Sybil control mechanisms that replace permissioned membership with “other ways of entering the system explicitly (stake) or opaquely/probabilistically (work).”
• These protocols are often paired with existing algorithms like pBFT to establish
consensus.
• This helps establish security and “eliminate” the need for “trust” in public
networks.
Proof of Work
• Basis for Nakamoto style consensus presented in Bitcoin.
• Brute-forces SHA-256 hashing to identify a nonce that produces a result satisfying the difficulty threshold established by the network (sketched below).
• The process of performing these hash computations is referred to as mining.
• Mining is computationally exhaustive and incurs a high degree of cost.
• This high cost can be worth it because of the possibility of reward.
• The application of game theory implies an incentive to deter bad actors.
• Following the rules can result in a high reward.
• Breaking the rules comes at a high cost that makes cheating impractical.
• It pays more to contribute to the security of the network than it does to cheat.
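A toy mining loop in Python. Note that Bitcoin actually double-hashes the block header and encodes difficulty differently; this sketch single-hashes for brevity:

# Sketch: brute-force PoW mining, finding a nonce whose SHA-256 hash meets the target.
import hashlib

def mine(block_data: bytes, difficulty_bits: int) -> int:
    target = 2 ** (256 - difficulty_bits)      # smaller target means a harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce                       # this nonce satisfies the difficulty threshold
        nonce += 1

print(mine(b"block header", difficulty_bits=16))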
Proof of Stake
• Relies on a set of validators to propose and vote on blocks.
• In order to participate in the network as a validator, a node must lock a certain amount of assets or tokens in a deposit.
• The weight of a validator's vote is determined by the size of its stake (sketched below).
• There are different implementations of PoS and a variety of systems which
currently implement some variation of this consensus mechanism.
• Existing systems include Cosmos, Tendermint, Dash, PIVX, Qtum, etc.
• Variations of PoS include Delegated Proof of Stake (dPoS) as implemented in
systems like EOS.
• Ethereum’s PoS implementation (Casper) is currently in development.
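A stake-weighted selection sketch in Python. The seed stands in for whatever shared randomness a real protocol uses, and the validator names are illustrative:

# Sketch: stake-weighted proposer selection, with vote weight proportional to deposit.
import random

stakes = {"validatorA": 50, "validatorB": 30, "validatorC": 20}  # locked deposits

def pick_proposer(stakes, seed):
    rng = random.Random(seed)                       # stand-in for a shared randomness beacon
    validators, weights = zip(*stakes.items())
    return rng.choices(validators, weights=weights, k=1)[0]

print(pick_proposer(stakes, seed=42))  # validatorA is picked ~50% of the time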
Private/Permissioned Blockchain Networks
• A certain degree of inherent security and control can be assumed.
• Eliminates costly overhead of public consensus algorithms which require more
robust security mechanisms.
• Performance can be optimized according to use case.
• PoA-based consensus algorithms present a desirable degree of performance and scalability.
• Maintain an essential level of security required by business-critical networks
without sacrificing the operational benefits of blockchain systems.
Proof of Authority
• PoA is a family of consensus algorithms used within permissioned blockchain
systems.
• Compromise between decentralized, public consensus and more efficient
centralized system models.
• Ethereum implements two PoA engines called Clique and Aura.
• Relies on a set of authorities, or trusted nodes, identified by a unique ID.
• At least N/2 + 1 authorities are assumed to be honest.
• A mining rotation schema fairly distributes block creation among authorities (both assumptions are sketched below).
• Blocks may only be minted by trusted signers.
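Both assumptions in a short Python sketch (illustrative names):

# Sketch: PoA assumptions, an honest majority of authorities plus fair rotation.
authorities = ["auth0", "auth1", "auth2", "auth3", "auth4"]

def honest_majority(n_honest):
    return n_honest >= len(authorities) // 2 + 1         # at least N/2 + 1 honest

def in_turn_authority(block_number):
    return authorities[block_number % len(authorities)]  # round-robin rotation schema

print(honest_majority(3), in_turn_authority(7))  # True auth2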
Istanbul BFT Consensus (IBFT)
• Similar to Clique
• Presents optimizations to pBFT to accommodate blockchain systems.
• Can be implemented within Quorum.
• Presented in 2017 within EIP-650.
• Offers instant finality with higher throughput.
• Blocks are final and self-verifiable.
• Each validator maintains a state machine replica to reach consensus.
• Can tolerate F faulty nodes in a network of N validator nodes, where N = 3F + 1.
How IBFT Works
• Before each round, a validator is picked by the group as the proposer, according to one of two policies.
• Round robin – Proposer changes with every block and round change.
• Sticky proposer – Proposer changes when a round change occurs.
• Proposer broadcasts a new block proposal along with PRE-PREPARE message.
• Validators receive message, enter PRE-PREPARED state, then broadcast
PREPARE message.
• When 2F + 1 PREPARE messages are received, validator enters PREPARED
state, then broadcasts COMMIT message.
• Validators wait for 2F + 1 COMMIT messages to enter the COMMITTED state and then insert the block into the chain (see the sketch below).
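The 2F + 1 thresholds as a Python state-machine sketch (networking, signatures, and message validation omitted):

# Sketch: a validator advances PRE-PREPARED -> PREPARED -> COMMITTED
# as 2F + 1 matching messages arrive, with N = 3F + 1.
class IBFTValidator:
    def __init__(self, n):
        self.f = (n - 1) // 3
        self.state = "PRE-PREPARED"     # entered after receiving the block proposal
        self.prepares = 0
        self.commits = 0

    def on_prepare(self):
        self.prepares += 1
        if self.state == "PRE-PREPARED" and self.prepares >= 2 * self.f + 1:
            self.state = "PREPARED"     # a real validator broadcasts COMMIT here

    def on_commit(self):
        self.commits += 1
        if self.state == "PREPARED" and self.commits >= 2 * self.f + 1:
            self.state = "COMMITTED"    # block can be inserted into the chain

v = IBFTValidator(n=4)
for _ in range(3): v.on_prepare()
for _ in range(3): v.on_commit()
print(v.state)  # COMMITTED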
Raft
• Also offered by Quorum.
• Equivalent to Paxos in fault-tolerance and performance.
• Difference is that it consists of independent subproblems.
• Meant to be a simpler alternative.
• A node in a Raft cluster is either a leader or a follower.
• The leader is responsible for log replication to the followers.
• Regularly informs the followers of its existence by sending heartbeat messages (sketched below).
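A heartbeat-timeout sketch in Python (timers reduced to plain counters; term numbers and vote handling omitted):

# Sketch: a follower becomes a candidate if no leader heartbeat arrives
# within its randomized election timeout.
import random

class Follower:
    def __init__(self):
        self.timeout = random.randint(150, 300)   # ms, randomized per Raft
        self.elapsed = 0

    def on_heartbeat(self):
        self.elapsed = 0                          # leader is alive; reset the timer

    def tick(self, ms):
        self.elapsed += ms
        return "candidate" if self.elapsed >= self.timeout else "follower"

f = Follower()
f.on_heartbeat()
print(f.tick(500))  # 'candidate': no heartbeat arrived in time, so start an election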
Clique
• Implemented by Geth client.
• Adheres to the rules of the Ethereum protocol.
• Based on epochs which are defined by a set number of blocks.
• Each epoch starts with the broadcast of a transition block which refers to the set
of authorities, referred to as signers, for that particular epoch.
• To authorize a block, the signer signs the block's hash using the secp256k1 curve, and the resulting 65-byte signature is embedded into the extraData field as the trailing 65-byte suffix (see the sketch below).
• As long as these rules are followed, signers can sign blocks as they see fit.
• Relies on a scoring mechanism, with the GHOST protocol resolving forks.
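A sketch of the extraData layout described above (Python; the bytes here are dummy placeholders, not a real signature):

# Sketch: Clique embeds the sealer's 65-byte secp256k1 signature as the
# trailing suffix of the header's extraData field.
def split_extra_data(extra_data: bytes):
    assert len(extra_data) >= 65, "extraData must carry the 65-byte signature suffix"
    return extra_data[:-65], extra_data[-65:]   # (vanity/signer data, signature)

extra = b"\x00" * 32 + b"\x01" * 65             # toy header extraData
vanity, signature = split_extra_data(extra)
print(len(vanity), len(signature))              # 32 65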
Aura
• Authority Round, implemented by Parity.
• Capable of tolerating up to 50% of malicious nodes.
• Network is assumed to be synchronous.
• Requires a group of strictly specified validators to act as authorities who are
allowed to seal blocks.
• Authority sets are defined by an array of accounts.
• Blocks are sealed in rounds that consist of a number of steps.
• Uses scoring system.
How Aura Works
• Each authority is assigned a time slot where they can release a block.
• Time slots are determined by the system clock of each validator.
• All authorities are assumed to be synchronized to the same UNIX time t (the slot assignment is sketched below).
• Finality is established by majority vote.
• Each authority seals one block in each step per round.
• Steps occur chronologically, so block sealing relies on the step order shared by all authorities.
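The slot assignment in a Python sketch. The step counter is derived from the shared clock as step = t / step_duration, with the in-turn authority at step mod N; the 5-second step duration is an arbitrary example, not a fixed Aura value:

# Sketch: Aura slot assignment from the shared UNIX clock.
import time

def current_sealer(authorities, step_duration=5):
    step = int(time.time()) // step_duration          # steps advance chronologically
    return step, authorities[step % len(authorities)] # in-turn authority for this slot

print(current_sealer(["auth0", "auth1", "auth2"]))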
How Raft Works
• Imposes a 1:1 correspondence between Raft and Ethereum nodes.
• Each Ethereum node is also a Raft node.
• Raft nodes consist of leaders and followers.
• Ethereum nodes consist of miners and verifiers.
• Transaction is submitted to a node in the network and broadcast to all nodes.
• Once it reaches the minter, the next block is created.
• Block is written to database and considered the new head of the chain once the
block has flown through Raft.
• All nodes extend chain in lock-step and update logs.
Conclusion
• There are a variety of consensus flavors that can be implemented in Ethereum.
• Each should be reviewed with use case in mind.
• For business-critical operations within a private or permissioned network, PoA
is highly scalable, offers an adequate amount of security, and can provide
significant levels of throughput.
• New implementations are currently being proposed, such as Pantheon from PegaSys.
• This presentation is the first in a series. Next month, Whiteblock and PegaSys
will be presenting a practical demonstration of several consensus algorithms.
• To sign up for part 2, send an email to zak@whiteblock.io.
Questions?
Zak Cole
www.whiteblock.io
Email: zak@whiteblock.io
Twitter: @0xzak
LinkedIn: www.linkedin.com/in/zak-cole
