Bizur is a consensus algorithm invented by Elastifile to address problems with log-based algorithms such as Paxos. It is optimized for a strongly consistent distributed key-value store: keys are hashed into independent buckets, and each bucket is replicated and governed by leader election. Reads and writes succeed only with majority acknowledgement. Bizur includes recovery mechanisms, such as bringing buckets up-to-date after a leader change, and it can dynamically reconfigure the cluster membership or the number of buckets per shard using SMART-style migrations.
2. What’s Bizur?
● A consensus algorithm that Elastifile [1] invented while developing their distributed file system
● Aims to solve problems with Paxos-like log-based algorithms
● As of 14 Feb. 2017 it is available only as a preprint on arXiv [2]
● Blog posts are also available [3, 4]
3. The drawbacks of Paxos-like algorithms
● a) Reading an object requires having all related preceding log entries available
● b) A slow operation delays unrelated succeeding operations
● c) Detecting a failure takes a long time
● d) Once a failure is detected, recovery can take a long time
● e) The complex flow of log compaction must be handled
4. Bizur against Paxos-like algorithms
● Paxos-like algorithms try to solve a more general problem, which produces drawbacks a) - e)
● Bizur, on the other hand, is optimized for a specific use case: a strongly consistent distributed key-value store
● Bizur could be emulated with Paxos-like algorithms by giving each key its own distributed log, but that would be very inefficient
5. Data structures
● Bucket: where k-v pairs are packed
○ hash function :: k -> bucket_index
○ Buckets are all independent, but operations on the same bucket are serialized (e.g. protected by a mutex)
○ Version = (elect_id, counter), similar to Raft’s (term, index) pair
○ Each bucket can have a different leader. For example, the leader of bucket 0 can be Node A and the leader of bucket 1 can be Node B
■ elect_id = the term the leader claims
■ voted_elect_id = the highest term each node has voted for
○ Leader election and client interaction are the same as in Raft
● Node = [Bucket]
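As a concrete illustration, here is a minimal Go sketch of these data structures. Field and helper names follow the slide’s terminology; everything else (the concrete types, the FNV hash, the Peers field) is an assumption, not Elastifile’s code.

```go
package bizur

import "hash/fnv"

// Version orders bucket contents, similar to Raft's (term, index) pair.
type Version struct {
	ElectID uint64 // elect_id: term claimed by the bucket's leader
	Counter uint64 // incremented on every write to the bucket
}

// Bucket packs independent k-v pairs; in a real implementation,
// operations on the same bucket would be serialized (e.g. by a mutex).
type Bucket struct {
	Ver Version
	KV  map[string]string
}

// Node = [Bucket]; each bucket may have a different leader.
type Node struct {
	ElectID      uint64   // term this node claims when it is a leader
	VotedElectID uint64   // voted_elect_id: highest term this node voted for
	Peers        []string // other nodes in the replica group (assumed)
	Buckets      []Bucket
}

// bucketIndex implements "hash function :: k -> bucket_index".
func bucketIndex(k string, nBuckets int) int {
	h := fnv.New32a()
	h.Write([]byte(k))
	return int(h.Sum32()) % nBuckets
}
```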
6. API
● READ and WRITE are majority-aware operations
● The decode and encode_xxx operations read/write a local bucket
● A READ is required before updates (set or delete), as the sketch below shows
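Continuing the Go sketch above, an update could look like this. Set is a hypothetical client-facing helper, and Read/Write are the majority-aware operations sketched under the WRITE and READ slides below.

```go
// Set illustrates why a READ must precede updates: first fetch a
// majority-confirmed copy of the bucket, apply the change locally
// (the encode_set step), then WRITE the whole bucket back.
func (n *Node) Set(key, value string) error {
	idx := bucketIndex(key, len(n.Buckets))
	b, err := n.Read(idx) // majority-aware READ, sketched below
	if err != nil {
		return err
	}
	b.KV[key] = value      // encode_set applied to the local bucket
	return n.Write(idx, b) // majority-aware WRITE, sketched below
}
```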
7. WRITE
● Updates the bucket’s version
● Succeeds only if acked by a majority
● Leader
○ Claims that the bucket’s elect_id is its own elect_id
● Replica
○ Nacks if the claimed elect_id is older than the voted_elect_id it knows
○ Otherwise replaces the local bucket and acks
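A Go sketch of this flow, continuing the one above. sendReplicaWrite is a hypothetical RPC stub, not a message name from the paper.

```go
// Hypothetical RPC stub: deliver the bucket to a replica; true on ack.
var sendReplicaWrite func(peer string, idx int, b Bucket) bool

// Returned when fewer than a majority acked. (Requires: import "errors".)
var ErrNoMajority = errors.New("bizur: no majority")

// Write is the leader side: bump the version, claiming our elect_id,
// and succeed only if a majority (including ourselves) acks.
func (n *Node) Write(idx int, b Bucket) error {
	b.Ver = Version{ElectID: n.ElectID, Counter: b.Ver.Counter + 1}
	acks := 1 // the leader's own copy counts
	for _, peer := range n.Peers {
		if sendReplicaWrite(peer, idx, b) {
			acks++
		}
	}
	if acks*2 <= len(n.Peers)+1 {
		return ErrNoMajority
	}
	n.Buckets[idx] = b
	return nil
}

// onReplicaWrite is the replica side: nack stale terms, otherwise
// replace the local bucket and ack.
func (n *Node) onReplicaWrite(idx int, b Bucket) bool {
	if b.Ver.ElectID < n.VotedElectID {
		return false // nack: claimed elect_id older than voted_elect_id
	}
	n.Buckets[idx] = b
	return true // ack
}
```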
8. READ
● Returns the local bucket if a majority confirms that the bucket is the latest
● Running ENSURE_RECOVERY beforehand keeps the bucket up-to-date
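Continuing the sketch, the READ path might look like this. sendReplicaConfirm is again a hypothetical stub that asks a replica whether the given version is still the latest it knows of.

```go
// Hypothetical RPC stub: true if the replica agrees this version is
// the latest it has for the bucket.
var sendReplicaConfirm func(peer string, idx int, ver Version) bool

// Read returns the leader's local bucket only after a majority
// (including the leader) confirms that its version is the latest.
func (n *Node) Read(idx int) (Bucket, error) {
	b := n.Buckets[idx]
	confirms := 1 // the leader's own copy counts
	for _, peer := range n.Peers {
		if sendReplicaConfirm(peer, idx, b.Ver) {
			confirms++
		}
	}
	if confirms*2 <= len(n.Peers)+1 {
		return Bucket{}, ErrNoMajority
	}
	return b, nil
}
```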
9. ENSURE_RECOVERY
● When the local bucket is older than the current elect_id (term), the leader should repair it
● Processed in case of a leader change
● Uses REPLICA_READ and takes the most up-to-date replica of the bucket
○ Reading from a majority guarantees that at least one of the replicas is up-to-date
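A sketch of the repair step, continuing the code above. sendReplicaRead stands in for REPLICA_READ, and re-writing the winner under the current term is our reading of how the bucket is kept up-to-date.

```go
// Hypothetical RPC stub for REPLICA_READ: fetch a replica's copy of
// the bucket; ok is false if the replica did not answer.
var sendReplicaRead func(peer string, idx int) (b Bucket, ok bool)

// ensureRecovery repairs a bucket whose elect_id is older than the
// current term: read a majority of replicas, adopt the most
// up-to-date copy, and commit it under the new elect_id via Write.
func (n *Node) ensureRecovery(idx int) error {
	best := n.Buckets[idx]
	reads := 1 // the leader's own copy counts
	for _, peer := range n.Peers {
		b, ok := sendReplicaRead(peer, idx)
		if !ok {
			continue
		}
		reads++
		// Versions compare lexicographically: (elect_id, counter).
		if b.Ver.ElectID > best.Ver.ElectID ||
			(b.Ver.ElectID == best.Ver.ElectID && b.Ver.Counter > best.Ver.Counter) {
			best = b
		}
	}
	if reads*2 <= len(n.Peers)+1 {
		return ErrNoMajority
	}
	// A majority was read, so at least one copy was up-to-date;
	// Write re-replicates it under the current elect_id.
	return n.Write(idx, best)
}
```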
10. Extension: Shard
● Intermediate data structure between Node and Bucket
○ Node = N * Shard, Shard = M * Bucket
○ N: static (e.g. 256)
○ M: dynamic (e.g. 100k)
● Introduced as an optimization
○ Theoretically, we could maintain leadership on a per-bucket basis, but that is too expensive
○ Instead, we lift it up to the shard level so the buckets in a shard can share it: elect_id and voted_elect_id are maintained at the shard level and the buckets only refer to them (see the sketch below)
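In the Go sketch’s terms, the lift looks like this (illustrative only; the paper may arrange the state differently).

```go
// Shard-level leadership: elect_id and voted_elect_id move from the
// per-bucket/node level to the shard, and its M buckets share them.
type Shard struct {
	ElectID      uint64   // term the shard's leader claims
	VotedElectID uint64   // highest term this shard has voted for
	Buckets      []Bucket // M buckets (dynamic, e.g. 100k)
}

// ShardedNode = N * Shard, with N static (e.g. 256).
type ShardedNode struct {
	Shards []Shard
}
```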
11. Configuration change
● Based on SMART [5]
● Configuration
○ The cluster membership (X -> Y)
○ M (#buckets in a shard)
● High-level description
○ 1) Create a new Bizur instance running on membership Y
○ 2) Notify the old instance to return RECONFIG_ERROR to all requests
○ 3) On that error, clients switch over to the new instance
○ 4) The new instance migrates buckets from the old instance
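A client-side sketch of step 3, assuming RECONFIG_ERROR carries the new membership; the slide does not specify how clients learn it, and Instance, Get, and connect are all hypothetical stand-ins.

```go
// Hypothetical client handle; Get performs a majority READ plus decode.
type Instance struct{ addrs []string }

func (i *Instance) Get(key string) (string, error) { panic("elided") }

func connect(membership []string) *Instance { return &Instance{addrs: membership} }

// Hypothetical error the old instance returns after step 2, assumed
// to carry the new membership Y.
type ReconfigError struct{ NewMembership []string }

func (e *ReconfigError) Error() string { return "bizur: reconfigured" }

// get follows step 3: on RECONFIG_ERROR, switch to the new instance
// and retry the request there. (Requires: import "errors".)
func get(inst *Instance, key string) (string, error) {
	for {
		v, err := inst.Get(key)
		var re *ReconfigError
		if errors.As(err, &re) {
			inst = connect(re.NewMembership) // switch to membership Y
			continue
		}
		return v, err
	}
}
```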
12. Bucket migration on changing M
● As k-v pairs accumulate, the buckets become huge; it’s time to increase M
● The easiest way is to increase it by a power of two; that way, we can maintain the bucket boundaries
● It’s something like caching, where the old instance is the backing store
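The boundary-preserving property is simple modular arithmetic: if index = hash(k) mod M, then after doubling M every key from old bucket i lands in new bucket i or i+M. A tiny illustration (splitTargets is a hypothetical helper):

```go
// splitTargets returns the only two new bucket indexes that keys from
// old bucket i can map to after M doubles: hash mod 2M equals either
// (hash mod M) or (hash mod M) + M.
func splitTargets(i, oldM int) (int, int) {
	return i, i + oldM
}

// Example: with M = 8, keys in old bucket 3 end up in new bucket 3 or
// 11, so each old bucket lazily back-fills exactly two new buckets.
```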
13. Links
● [1] Elastifile
● [2] Bizur: A Key-Value Consensus Algorithm for Scalable File-systems
● [3] Bizur: A New Key-value Consensus Algorithm
● [4] Log-less Consensus
● [5] The SMART way to migrate replicated stateful services