SlideShare a Scribd company logo
1 of 13
Download to read offline
Introduction to Bizur
Akira Hayakawa (Github: akiradeveloper)
What’s Bizur?
● A consensus algorithm Elastifile [1] invented to develop their distributed file
system
● Aim to solve problems regarding Paxos-like log-based algorithms
● As of 14 Feb. 2017 it’s only preprint available in Arxiv [2]
● Blog posts are also available [3, 4]
The drawback of Paxos-like algorithms
● a) Reading an object requires having all related preceding entries available
● b) A slow operation will affect unrelated succeeding operations
● c) Time to detect a failure is longer
● d) Once a failure is detected, recovery can take a long time
● e) It requires handling the complex flow of log-compaction
Bizur against Paxos-like algorithms
● Paxos-like algorithms tries to solve a more general problem. Thus produces
the drawbacks a) - e)
● Bizur, on the other hand, is optimized for a specific use-case: a
strongly-consistent distributed key-value store
● With Paxos-like algorithms, it can be thought of as if each key has its own
distributed log consensus algorithm. However, that would be very inefficient
Data structures
● Bucket: where k-v pairs are packed
○ hash function :: k -> bucket_index
○ Buckets are all independent but operations on the same bucket are serialized (e.g. protected
by mutex)
○ Version = (elect_id, counter). Similar to Raft’s (term, index) pair
○ Each bucket can have different leadership. For example, the leader of bucket 0 can be Node A
and the leader of bucket 1 can be Node B
■ elect_id = the term that leader claims
■ voted_elect_id = the term each node knows
○ Leader election and client interaction are the same as Raft
● Node = [Bucket]
API
● READ, WRITE are majority-aware
operations
● Operations decode, encode_xxx
are to read/write a local bucket
● READ is required preceding to
updates (set or delete)
WRITE
● Update the bucket’s version
● Succeed only if acked by majority
● Leader
○ Claim the bucket’s elect_id is its
elect_id
● Replica
○ Nack if the claimed elect_id is older
than the voted_elect_id it knows
○ Otherwise replace the local bucket and
ack
READ
● Return the local bucket if majority
admit the bucket is the latest
● Processing ENSURERECOVERY
in prior can keep the bucket
up-to-date
ENSURERECOVERY
● When the local bucket is older than
the elect_id (current term) it should
repair the local bucket
● Processed in case of leader
change
● Use REPLICAREAD and take the
most up-to-date replica of the
bucket
○ Reading from majority guarantees that
at least one of the replicas are
up-to-date
Extension: Shard
● Intermediate data structure between Node and Bucket
○ Node = N * Shard, Shard = M * Bucket
○ N: static (e.g. 256)
○ M: dynamic (e.g. 100k)
● Introduced as an optimization
○ Theoretically, we can maintain leadership on bucket basis but that’s too expensive
○ Instead, we can lift it up to shard level so the buckets in shard can share that: elect_id and
voted_elect_id can be maintain in the shard level and buckets only refer to it
Configuration change
● Based on SMART [5]
● Configuration
○ The cluster membership (X -> Y)
○ M (#buckets in a shard)
● High level description
○ 1) Create a new Bizur instance running
on membership Y
○ 2) Notifies the old instance to return
RECONFIGERROR to all requests
○ 3) On the error, client can access to the
new instance
○ 4) New instance migrates buckets from
the old instance
Bizur
Bizur
Bucket migration on changing M
● k-v pairs increase and the buckets
become huge. It’s time to Increase
M
● The easiest way is to increase it in
power of two. That way, we can
maintain the bucket boundaries
● It’s something like caching where
the old instance is backing storeold
new
node
shard
bucket
Links
● [1] Elastifile
● [2] Bizur: A Key-Value Consensus Algorithm for Scalable File-systems
● [3] Bizur: A New Key-value Consensus Algorithm
● [4] Log-less Consensus
● [5] The SMART way to migrate replicated stateful services

More Related Content

What's hot

Container Listing Update (Liberty Swift Design Summit)
Container Listing Update (Liberty Swift Design Summit)Container Listing Update (Liberty Swift Design Summit)
Container Listing Update (Liberty Swift Design Summit)Kota Tsuyuzaki
 
Rook: Storage for Containers in Containers – data://disrupted® 2020
Rook: Storage for Containers in Containers  – data://disrupted® 2020Rook: Storage for Containers in Containers  – data://disrupted® 2020
Rook: Storage for Containers in Containers – data://disrupted® 2020data://disrupted®
 
Arbiter volumes in gluster
Arbiter volumes in glusterArbiter volumes in gluster
Arbiter volumes in glusteritisravi
 
Monitoring your shiny new docker environment
Monitoring your shiny new docker environmentMonitoring your shiny new docker environment
Monitoring your shiny new docker environmentSamuel Vandamme
 
Cassandra 2.1 boot camp, exercise
Cassandra 2.1 boot camp, exerciseCassandra 2.1 boot camp, exercise
Cassandra 2.1 boot camp, exerciseJoshua McKenzie
 
OpenShift.io on Gluster
OpenShift.io on GlusterOpenShift.io on Gluster
OpenShift.io on Glustermountpoint.io
 
Seastar @ NYCC++UG
Seastar @ NYCC++UGSeastar @ NYCC++UG
Seastar @ NYCC++UGAvi Kivity
 
Leveraging AWS
Leveraging AWSLeveraging AWS
Leveraging AWSForum One
 
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)Minh Dao
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Scaling Islandora
Scaling IslandoraScaling Islandora
Scaling IslandoraErin Tripp
 
Cassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, CompactionCassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, CompactionJoshua McKenzie
 
Comparing Orchestration
Comparing OrchestrationComparing Orchestration
Comparing OrchestrationKnoldus Inc.
 
LOFAR - finding transients in the radio spectrum
LOFAR - finding transients in the radio spectrumLOFAR - finding transients in the radio spectrum
LOFAR - finding transients in the radio spectrumGijs Molenaar
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache CassandraEric Evans
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyMinio
 

What's hot (20)

Etcd terraform by Alex Somesan
Etcd terraform by Alex SomesanEtcd terraform by Alex Somesan
Etcd terraform by Alex Somesan
 
Container Listing Update (Liberty Swift Design Summit)
Container Listing Update (Liberty Swift Design Summit)Container Listing Update (Liberty Swift Design Summit)
Container Listing Update (Liberty Swift Design Summit)
 
Rook: Storage for Containers in Containers – data://disrupted® 2020
Rook: Storage for Containers in Containers  – data://disrupted® 2020Rook: Storage for Containers in Containers  – data://disrupted® 2020
Rook: Storage for Containers in Containers – data://disrupted® 2020
 
Arbiter volumes in gluster
Arbiter volumes in glusterArbiter volumes in gluster
Arbiter volumes in gluster
 
Monitoring your shiny new docker environment
Monitoring your shiny new docker environmentMonitoring your shiny new docker environment
Monitoring your shiny new docker environment
 
Cassandra 2.1 boot camp, exercise
Cassandra 2.1 boot camp, exerciseCassandra 2.1 boot camp, exercise
Cassandra 2.1 boot camp, exercise
 
OpenShift.io on Gluster
OpenShift.io on GlusterOpenShift.io on Gluster
OpenShift.io on Gluster
 
Seastar @ NYCC++UG
Seastar @ NYCC++UGSeastar @ NYCC++UG
Seastar @ NYCC++UG
 
Leveraging AWS
Leveraging AWSLeveraging AWS
Leveraging AWS
 
Apache Spark - Aram Mkrtchyan
Apache Spark - Aram MkrtchyanApache Spark - Aram Mkrtchyan
Apache Spark - Aram Mkrtchyan
 
Questions
QuestionsQuestions
Questions
 
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Scaling Islandora
Scaling IslandoraScaling Islandora
Scaling Islandora
 
Cassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, CompactionCassandra 2.1 boot camp, Compaction
Cassandra 2.1 boot camp, Compaction
 
Comparing Orchestration
Comparing OrchestrationComparing Orchestration
Comparing Orchestration
 
LOFAR - finding transients in the radio spectrum
LOFAR - finding transients in the radio spectrumLOFAR - finding transients in the radio spectrum
LOFAR - finding transients in the radio spectrum
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
High Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go AssemblyHigh Performance Scaling Techniques in Golang Using Go Assembly
High Performance Scaling Techniques in Golang Using Go Assembly
 

Viewers also liked

Rust v1.0 release celebration party
Rust v1.0 release celebration partyRust v1.0 release celebration party
Rust v1.0 release celebration partyAkira Hayakawa
 
dm-writeboost-kernelvm
dm-writeboost-kernelvmdm-writeboost-kernelvm
dm-writeboost-kernelvmAkira Hayakawa
 
アルゴリズム取引のシステムを開発・運用してみて分かったこと
アルゴリズム取引のシステムを開発・運用してみて分かったことアルゴリズム取引のシステムを開発・運用してみて分かったこと
アルゴリズム取引のシステムを開発・運用してみて分かったことSatoshi KOBAYASHI
 
新分野に飛び入って半年で業績を作るには
新分野に飛び入って半年で業績を作るには新分野に飛び入って半年で業績を作るには
新分野に飛び入って半年で業績を作るにはAsai Masataro
 
En Ch03 Figs
En Ch03 FigsEn Ch03 Figs
En Ch03 Figsgourab87
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤTakashi Hoshino
 
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知Amazon Web Services Japan
 
Unity Cloud Buildの使い方
Unity Cloud Buildの使い方Unity Cloud Buildの使い方
Unity Cloud Buildの使い方Makoto Ito
 
Nyandoc: Scaladoc/Javadoc to markdown converter
Nyandoc: Scaladoc/Javadoc to markdown converterNyandoc: Scaladoc/Javadoc to markdown converter
Nyandoc: Scaladoc/Javadoc to markdown convertertod esking
 
Amazon SNS Mobile Push を使ってみる
Amazon SNS Mobile Push を使ってみるAmazon SNS Mobile Push を使ってみる
Amazon SNS Mobile Push を使ってみる崇之 清水
 
PFIセミナー 2013/02/28 「プログラミング言語の今」
PFIセミナー 2013/02/28 「プログラミング言語の今」PFIセミナー 2013/02/28 「プログラミング言語の今」
PFIセミナー 2013/02/28 「プログラミング言語の今」Preferred Networks
 
関数プログラミング入門
関数プログラミング入門関数プログラミング入門
関数プログラミング入門Hideyuki Tanaka
 
Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測JAVA DM
 

Viewers also liked (20)

Raft
RaftRaft
Raft
 
Rust v1.0 release celebration party
Rust v1.0 release celebration partyRust v1.0 release celebration party
Rust v1.0 release celebration party
 
dm-writeboost-kernelvm
dm-writeboost-kernelvmdm-writeboost-kernelvm
dm-writeboost-kernelvm
 
アルゴリズム取引のシステムを開発・運用してみて分かったこと
アルゴリズム取引のシステムを開発・運用してみて分かったことアルゴリズム取引のシステムを開発・運用してみて分かったこと
アルゴリズム取引のシステムを開発・運用してみて分かったこと
 
Chapter16
Chapter16Chapter16
Chapter16
 
dm-thin-internal-ja
dm-thin-internal-jadm-thin-internal-ja
dm-thin-internal-ja
 
新分野に飛び入って半年で業績を作るには
新分野に飛び入って半年で業績を作るには新分野に飛び入って半年で業績を作るには
新分野に飛び入って半年で業績を作るには
 
Chapter23
Chapter23Chapter23
Chapter23
 
En Ch03 Figs
En Ch03 FigsEn Ch03 Figs
En Ch03 Figs
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
 
Chapter17
Chapter17Chapter17
Chapter17
 
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知
[AWSマイスターシリーズ]Amazon SNSモバイルプッシュ通知
 
Unity Cloud Buildの使い方
Unity Cloud Buildの使い方Unity Cloud Buildの使い方
Unity Cloud Buildの使い方
 
Nyandoc: Scaladoc/Javadoc to markdown converter
Nyandoc: Scaladoc/Javadoc to markdown converterNyandoc: Scaladoc/Javadoc to markdown converter
Nyandoc: Scaladoc/Javadoc to markdown converter
 
Amazon SNS Mobile Push を使ってみる
Amazon SNS Mobile Push を使ってみるAmazon SNS Mobile Push を使ってみる
Amazon SNS Mobile Push を使ってみる
 
PFIセミナー 2013/02/28 「プログラミング言語の今」
PFIセミナー 2013/02/28 「プログラミング言語の今」PFIセミナー 2013/02/28 「プログラミング言語の今」
PFIセミナー 2013/02/28 「プログラミング言語の今」
 
Chapter24
Chapter24Chapter24
Chapter24
 
関数プログラミング入門
関数プログラミング入門関数プログラミング入門
関数プログラミング入門
 
Chapter19
Chapter19Chapter19
Chapter19
 
Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測
 

Similar to Introduction to Bizur consensus algorithm

Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdffengxun
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafkaZach Cox
 
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Matteo Merli
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Idan Atias
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldScyllaDB
 
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin OmerogluStorage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin OmerogluHostedbyConfluent
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioAlluxio, Inc.
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...Michael Stack
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in productionvalstadsve
 
TriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in ProductionTriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in Productiontrihug
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Marcos García
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
 
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATES
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATESPyConIT6 - MAKING SESSIONS AND CACHING ROOMMATES
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATESAlessandro Molina
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...Hisham Mardam-Bey
 

Similar to Introduction to Bizur consensus algorithm (20)

Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Raft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdfRaft Engine Meetup 220702.pdf
Raft Engine Meetup 220702.pdf
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28 Bookie storage - Apache BookKeeper Meetup - 2015-06-28
Bookie storage - Apache BookKeeper Meetup - 2015-06-28
 
Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)Introduction to Container Storage Interface (CSI)
Introduction to Container Storage Interface (CSI)
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High Yield
 
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin OmerogluStorage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu
Storage Capacity Management on Multi-tenant Kafka Cluster with Nurettin Omeroglu
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in production
 
TriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in ProductionTriHUG 3/14: HBase in Production
TriHUG 3/14: HBase in Production
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATES
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATESPyConIT6 - MAKING SESSIONS AND CACHING ROOMMATES
PyConIT6 - MAKING SESSIONS AND CACHING ROOMMATES
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Introduction to Bizur consensus algorithm

  • 1. Introduction to Bizur Akira Hayakawa (Github: akiradeveloper)
  • 2. What’s Bizur? ● A consensus algorithm Elastifile [1] invented to develop their distributed file system ● Aim to solve problems regarding Paxos-like log-based algorithms ● As of 14 Feb. 2017 it’s only preprint available in Arxiv [2] ● Blog posts are also available [3, 4]
  • 3. The drawback of Paxos-like algorithms ● a) Reading an object requires having all related preceding entries available ● b) A slow operation will affect unrelated succeeding operations ● c) Time to detect a failure is longer ● d) Once a failure is detected, recovery can take a long time ● e) It requires handling the complex flow of log-compaction
  • 4. Bizur against Paxos-like algorithms ● Paxos-like algorithms tries to solve a more general problem. Thus produces the drawbacks a) - e) ● Bizur, on the other hand, is optimized for a specific use-case: a strongly-consistent distributed key-value store ● With Paxos-like algorithms, it can be thought of as if each key has its own distributed log consensus algorithm. However, that would be very inefficient
  • 5. Data structures ● Bucket: where k-v pairs are packed ○ hash function :: k -> bucket_index ○ Buckets are all independent but operations on the same bucket are serialized (e.g. protected by mutex) ○ Version = (elect_id, counter). Similar to Raft’s (term, index) pair ○ Each bucket can have different leadership. For example, the leader of bucket 0 can be Node A and the leader of bucket 1 can be Node B ■ elect_id = the term that leader claims ■ voted_elect_id = the term each node knows ○ Leader election and client interaction are the same as Raft ● Node = [Bucket]
  • 6. API ● READ, WRITE are majority-aware operations ● Operations decode, encode_xxx are to read/write a local bucket ● READ is required preceding to updates (set or delete)
  • 7. WRITE ● Update the bucket’s version ● Succeed only if acked by majority ● Leader ○ Claim the bucket’s elect_id is its elect_id ● Replica ○ Nack if the claimed elect_id is older than the voted_elect_id it knows ○ Otherwise replace the local bucket and ack
  • 8. READ ● Return the local bucket if majority admit the bucket is the latest ● Processing ENSURERECOVERY in prior can keep the bucket up-to-date
  • 9. ENSURERECOVERY ● When the local bucket is older than the elect_id (current term) it should repair the local bucket ● Processed in case of leader change ● Use REPLICAREAD and take the most up-to-date replica of the bucket ○ Reading from majority guarantees that at least one of the replicas are up-to-date
  • 10. Extension: Shard ● Intermediate data structure between Node and Bucket ○ Node = N * Shard, Shard = M * Bucket ○ N: static (e.g. 256) ○ M: dynamic (e.g. 100k) ● Introduced as an optimization ○ Theoretically, we can maintain leadership on bucket basis but that’s too expensive ○ Instead, we can lift it up to shard level so the buckets in shard can share that: elect_id and voted_elect_id can be maintain in the shard level and buckets only refer to it
  • 11. Configuration change ● Based on SMART [5] ● Configuration ○ The cluster membership (X -> Y) ○ M (#buckets in a shard) ● High level description ○ 1) Create a new Bizur instance running on membership Y ○ 2) Notifies the old instance to return RECONFIGERROR to all requests ○ 3) On the error, client can access to the new instance ○ 4) New instance migrates buckets from the old instance Bizur Bizur
  • 12. Bucket migration on changing M ● k-v pairs increase and the buckets become huge. It’s time to Increase M ● The easiest way is to increase it in power of two. That way, we can maintain the bucket boundaries ● It’s something like caching where the old instance is backing storeold new node shard bucket
  • 13. Links ● [1] Elastifile ● [2] Bizur: A Key-Value Consensus Algorithm for Scalable File-systems ● [3] Bizur: A New Key-value Consensus Algorithm ● [4] Log-less Consensus ● [5] The SMART way to migrate replicated stateful services