Using consensus algorithm and
distributed store in designing
distributed system
Atin Mukherjee
GlusterFS Hacker
@mukherjee_atin
Topics
● What is consensus in a distributed system?
● What is the CAP theorem in a distributed system?
● Different distributed system design approaches
● Challenges in the design of a distributed system
● What is the RAFT algorithm and how does it work?
● Distributed store
● Combining RAFT & distributed store – in the form of
technologies like consul/etcd/zookeeper etc.
● Q & A
What is consensus in distributed
system
● Consensus – An agreement, but for what and
between whom?
● For what → whether the op/transaction is to be
committed or not
● Between whom → Answer is pretty simple, the
nodes forming the distributed system
● Quorum – floor(n/2) + 1 nodes, i.e. a strict majority of n
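The quorum formula can be checked with a few lines of arithmetic. This is a minimal sketch; the function names `quorum` and `tolerated_failures` are made up for illustration and do not come from any library.

```python
def quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return n // 2 + 1  # integer division, i.e. floor(n/2) + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be down while a quorum is still reachable."""
    return n - quorum(n)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Note that going from 3 to 4 nodes raises the quorum without raising the number of failures tolerated, which is one reason odd cluster sizes are usually preferred.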
CAP theorem
● A distributed system can provide at most two of
the following three guarantees
– Consistency (all nodes see the same data at the
same time)
– Availability (a guarantee that every request
receives a response about whether it succeeded or
failed)
– Partition tolerance (the system continues to
operate despite arbitrary message loss or failure of
part of the system)
Design approaches of distributed
system
● No meta data – all nodes share their data with
one another
● Meta data server – one node holds the data and
the others fetch it from that node
So which one is better???
Probably none of them? Ask yourself for a
minute....
Challenges in design of a distributed
system
● No meta data
– N * N exchange of network messages
– Not scalable when N is in the hundreds or
thousands
– Initialization time can be very high
– Can end up in a situation like “whom to believe,
whom not to”, popularly known as split brain
– How to undo a transaction locally?
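To get a feel for why the all-to-all exchange stops scaling, here is a rough count, assuming one round in which every node sends one message to every other node. The helper name `full_mesh_messages` is hypothetical.

```python
def full_mesh_messages(n: int) -> int:
    """Messages in one round where each of n nodes sends to every other node."""
    return n * (n - 1)  # grows roughly as N * N


for n in (3, 10, 100, 1000):
    print(f"{n:>5} nodes -> {full_mesh_messages(n):>7} messages per round")
```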
Challenges in design of a distributed
system contd...
● MDS (Meta data server)
– SPOF (single point of failure)
Ahh!! So is this the only drawback??
– How about having replicas? And then, what replica count??
– Additional network hop, lower performance
RAFT – A consensus algorithm
● Key features
– Leader-follower based model
– Leader election
– Normal operation
– Safety and consistency after leader changes
– Neutralizing old leaders
– Client interactions
– Configuration changes
RAFT : Server states
● Server states transition
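The transition diagram for this slide is not part of the text. As a stand-in, the sketch below encodes the three Raft server states and the transitions between them as a lookup table. The event names are invented for this example; only the states and the direction of the transitions follow the Raft description.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.FOLLOWER, "election_timeout"): State.CANDIDATE,
    (State.CANDIDATE, "election_timeout"): State.CANDIDATE,  # start a new election
    (State.CANDIDATE, "won_majority"): State.LEADER,
    (State.CANDIDATE, "saw_leader_or_higher_term"): State.FOLLOWER,
    (State.LEADER, "saw_higher_term"): State.FOLLOWER,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Every server starts as a follower, so a typical path is `step(State.FOLLOWER, "election_timeout")` followed by `step(State.CANDIDATE, "won_majority")`.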
RAFT : Terms
● Time is divided into terms; each term has two parts
– Election
– Normal operation
● At most 1 leader per term
● A term may have no leader if the election fails
(split vote)
● Each server maintains current term value
RAFT : Replicated state machine
● A picture says thousand words...
RAFT : Different RPCs
● RequestVote RPCs – Candidate sends to other
nodes to get itself elected as leader
● AppendEntries RPCs – Normal operation
workload
● AppendEntries RPCs with no entries – Heartbeat
messages – Leader sends to all followers
to assert its presence
RAFT : Leader Election
● current_term++
● Follower->Candidate
● Self vote
● Send RequestVote RPCs to all other servers, retry until either:
– Receive votes from a majority of servers (become leader)
– Receive RPC from a valid leader (step down to follower)
– Election timeout elapses – increment term and start a new election
● Election properties
– Safety – allow at most one winner per term
– Liveness – some candidate must eventually win
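The election steps above can be sketched as a single round of vote collection. This is a toy model, not a Raft implementation: RPCs are replaced by plain function calls, timeouts and log checks are left out, and `run_election` and `make_peer` are hypothetical names.

```python
def run_election(current_term, peers):
    """Run one election round for a candidate.

    peers: callables that take the candidate's new term and return True if
    that peer grants its vote (a stand-in for a RequestVote RPC).
    Returns (new_term, won).
    """
    term = current_term + 1  # current_term++
    votes = 1                # self vote
    for request_vote in peers:
        if request_vote(term):
            votes += 1
    cluster_size = len(peers) + 1
    return term, votes >= cluster_size // 2 + 1


def make_peer():
    """A voter that grants at most one vote per term (the safety property)."""
    voted_in = set()

    def request_vote(term):
        if term in voted_in:
            return False
        voted_in.add(term)
        return True

    return request_vote


peers = [make_peer() for _ in range(4)]
first = run_election(1, peers)   # first candidate in term 2
second = run_election(1, peers)  # a later candidate asking the same voters in term 2
```

Because each voter hands out only one vote per term, the second candidate cannot also reach a majority in that term.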
RAFT : Picking the best leader
● Candidates include log info in RequestVote
RPCs: the index & term of their last log entry
● Voting server V denies its vote if its own log is more
complete:
(votingServerLastTerm > candidateLastTerm) ||
((votingServerLastTerm == candidateLastTerm) &&
(votingServerLastIndex > candidateLastIndex))
● But is this enough to have crash consistency?
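The vote-denial comparison is a lexicographic comparison on (last term, last index), which Python tuples provide directly. A small sketch, with `deny_vote` as a made-up helper name:

```python
def deny_vote(voter_last_term, voter_last_index, cand_last_term, cand_last_index):
    """True if the voter's log is more complete than the candidate's.

    Tuple comparison checks the term first and falls back to the index
    only when the terms are equal.
    """
    return (voter_last_term, voter_last_index) > (cand_last_term, cand_last_index)
```

A voter with a higher last term denies the vote even if its log is shorter, and identical logs never lead to a denial.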
RAFT : New commitment rules
● For a leader to decide an entry is committed:
– Must be stored on a majority of servers &
– At least one new entry from leader's term must also
be stored on a majority of servers
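A sketch of how a leader might evaluate the two commitment conditions. The data layout is an assumption made for this example: `log_terms[i]` holds the term of the leader's entry at 1-based index `i + 1`, and `match_index` holds the highest replicated index per server, leader included. `is_committed` is a hypothetical name.

```python
def is_committed(index, log_terms, match_index, current_term):
    """Check both commitment conditions for the entry at 1-based `index`."""
    majority = len(match_index) // 2 + 1

    def on_majority(i):
        return sum(1 for m in match_index if m >= i) >= majority

    # Condition 1: the entry itself is stored on a majority of servers.
    if not on_majority(index):
        return False
    # Condition 2: some entry from the leader's own term, at this index or
    # later, is also stored on a majority of servers.
    return any(
        log_terms[i - 1] == current_term and on_majority(i)
        for i in range(index, len(log_terms) + 1)
    )
```

With a term-4 leader whose log terms are `[1, 2, 4]`, an old term-2 entry sitting on 3 of 5 servers is not yet committed; it becomes committed once the term-4 entry also reaches a majority.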
RAFT : Log inconsistency
● Leader repairs follower logs by
– Deleting extraneous entries
– Filling in missing entries from the leader
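The repair can be pictured as: keep the prefix on which both logs agree, drop the rest of the follower's log, then copy the leader's remaining entries. The sketch below does this in one local step and assumes logs are lists of `(term, command)` tuples; in Raft itself the same result is reached incrementally through AppendEntries consistency checks. `repair_log` is a hypothetical name.

```python
def repair_log(follower_log, leader_log):
    """Return the follower's log after repair against the leader's log."""
    # Length of the longest prefix whose entries carry the same term.
    match = 0
    while (
        match < len(follower_log)
        and match < len(leader_log)
        and follower_log[match][0] == leader_log[match][0]
    ):
        match += 1
    # Delete extraneous entries, then fill in missing ones from the leader.
    return follower_log[:match] + leader_log[match:]
```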
RAFT : Neutralizing old leaders
● Sender sends its term with every RPC
● If the sender's term is older than the receiver's term, the
RPC is rejected and the sender steps down to follower
● If the receiver's term is older, the receiver steps down to
follower, updates its term and processes the RPC
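A minimal sketch of the receiver-side term check, assuming a toy `Server` record with just a term and a role. Both `Server` and `receive_rpc` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    term: int
    role: str = "follower"


def receive_rpc(receiver: Server, sender_term: int) -> bool:
    """Return True if the RPC should be processed, False if it is rejected."""
    if sender_term < receiver.term:
        return False  # stale sender, e.g. an old leader
    if sender_term > receiver.term:
        receiver.term = sender_term  # catch up with the newer term
        receiver.role = "follower"   # and step down
    return True


old_leader = Server(term=3, role="leader")
accepted = receive_rpc(old_leader, sender_term=5)
```

After the call, the old leader has term 5 and is a follower again, which is how a partitioned leader gets neutralized once it hears from the rest of the cluster.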
RAFT : Client protocol
● Send commands to leader
– If leader is unknown, send to any server
– If contacted server is not leader, it will redirect to leader
● Client gets back the response after the full cycle at leader
(command logged, committed and executed)
● Request timeout
– Re-issue the command to another server
– Unique id for each command at client to avoid duplicate
execution
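One way to picture the duplicate-execution guard: the state machine remembers the result for each command id and returns it again on a retry instead of executing twice. This is a sketch under that assumption; `StateMachine` and its counter workload are hypothetical.

```python
class StateMachine:
    """Toy state machine: a counter that ignores re-delivered commands."""

    def __init__(self):
        self.counter = 0
        self.results = {}  # command id -> result of the first execution

    def apply(self, command_id, amount):
        if command_id in self.results:
            # Retry of a command that already ran: return the saved result.
            return self.results[command_id]
        self.counter += amount
        self.results[command_id] = self.counter
        return self.counter


sm = StateMachine()
first = sm.apply("client1-cmd1", 5)
retry = sm.apply("client1-cmd1", 5)  # client timed out and re-issued the command
```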
Joint consensus phase
● 2 phase approach
● Need majority of both old and new
configurations for election and commitment
● Configuration change is just a log entry, applied
immediately on receipt (committed or not)
● Once joint consensus is committed, begin
replicating the log entry for the final configuration
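During the joint phase, agreement needs a separate majority in the old and in the new configuration. A small sketch using sets of server names; `has_majority` and `joint_agreement` are made-up helpers.

```python
def has_majority(acks, config):
    """True if the acknowledging servers include a majority of `config`."""
    return len(acks & config) >= len(config) // 2 + 1


def joint_agreement(acks, old_config, new_config):
    """Joint consensus: a majority of the old AND of the new configuration."""
    return has_majority(acks, old_config) and has_majority(acks, new_config)


old = {"a", "b", "c"}
new = {"c", "d", "e"}
only_old = joint_agreement({"a", "b", "c"}, old, new)  # all of old, 1 of 3 in new
both = joint_agreement({"a", "c", "d"}, old, new)      # 2 of 3 in each
```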
Distributed store
● A common store which can be shared by
different nodes
● In the form of key value pairs for ease of use
● Several such distributed key value store
implementations are available.
etcd
● Name stands for “/etc distributed”
● Open source distributed consistent key value store
● Highly available and reliable
● Sequentially consistent
● Watchable
● Exposed via HTTP
● Runtime reconfigurable (Saling feature)
● Durable (snapshot backup/restore)
● Time to live keys (have a time out)
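To make "watchable" and "time to live keys" concrete, here is a single-process toy store. It is not etcd's API and has no replication; `ToyStore` and its methods are invented for this sketch, and the clock is injectable so that expiry can be shown without sleeping.

```python
import time


class ToyStore:
    """In-memory stand-in for a key value store with TTL keys and watches."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}      # key -> (value, expiry time or None)
        self._watchers = {}  # key -> list of callbacks

    def put(self, key, value, ttl=None):
        expiry = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expiry)
        for callback in self._watchers.get(key, []):
            callback(key, value)  # notify watchers of the change

    def get(self, key):
        value, expiry = self._data.get(key, (None, None))
        if expiry is not None and self._clock() >= expiry:
            del self._data[key]  # key has timed out
            return None
        return value

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)


now = [0.0]
store = ToyStore(clock=lambda: now[0])
events = []
store.watch("leader", lambda k, v: events.append((k, v)))
store.put("leader", "node1", ttl=5)
before = store.get("leader")
now[0] = 6.0  # advance the fake clock past the TTL
after = store.get("leader")
```

A key with a TTL that its owner must keep refreshing is a common building block for leases and leader locks on top of such a store.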
etcd contd...
● Bootstrapping using RAFT
● Proxy mode in node
● Cluster configuration – etcdctl member
add/remove/list
● Similar projects like consul and zookeeper are also
available.
Why etcd
● Vibrant community
● 500+ applications like Kubernetes and Cloud
Foundry using it
● 150+ developers
● Stable releases
References
● https://raftconsensus.github.io/
● https://www.youtube.com/watch?v=YbZ3zDzDnrw
● https://github.com/coreos/etcd#etcd
● https://consul.io/
Q & A
THANK YOU

Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Consensus algo with distributed key value store in distributed system

  • 1. Using consensus algorithm and distributed store in designing distributed system Atin Mukherjee GlusterFS Hacker @mukherjee_atin
  • 2. Topics ● What is consensus in distributed system? ● What is CAP theorem in distributed system ● Different distributed system design approaches ● Challenges in design of a distributed system ● What is RAFT algorithm and how it works ● Distributed store ● Combining RAFT & distributed store – in the form of technologies like consul/etcd/zookeeper etc ● Q & A
  • 3. What is consensus in distributed system ● Consensus – An agreement, but for what and between whom? ● For what → whether the op/transaction is to be committed or not ● Between whom → Answer is pretty simple, the nodes forming the distributed system ● Quorum – (n/2) + 1 with integer division, i.e. a strict majority of the n nodes
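The quorum formula on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names `quorum` and `tolerated_failures` are made up for illustration, and the division is assumed to be integer division.

```python
def quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1  # (n/2) + 1 with integer division


def tolerated_failures(n: int) -> int:
    """How many nodes can be down while a quorum is still reachable."""
    return n - quorum(n)


for n in (3, 4, 5):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual choice.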
  • 4. CAP theorem ● Any two of the following three guarantees – Consistency (all nodes see the same data at the same time) – Availability (a guarantee that every request receives a response about whether it succeeded or failed) – Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  • 5. Design approaches of distributed system ● No meta data – all nodes share their data with each other ● Meta data server – One node holds the meta data and the others fetch it from that node So which one is better??? Probably none of them? Ask yourself for a minute....
  • 6. Challenges in design of a distributed system ● No meta data – N * N exchange of Network messages – Not scalable when N is probably in hundreds or thousands – Initialization time can be very high – Can end up in a situation like “whom to believe, whom not to” - popularly known as split brain – How to undo a transaction locally
  • 7. Challenges in design of a distributed system contd... ● MDS (Meta data server) – SPOF (single point of failure) Ahh!! so is this the only drawback?? – How about having replicas and then replica count?? – Additional network hop, lower performance
  • 8. RAFT – A consensus algorithm ● Key features – Leader-follower based model – Leader election – Normal operation – Safety and consistency after leader changes – Neutralizing old leaders – Client interactions – Configuration changes
  • 9. RAFT : Server states ● Server states transition
  • 10. RAFT : Terms ● Each term is divided into two parts – Election – Normal operation ● At most 1 leader per term ● A failed election (for example a split vote) leaves the term without a leader ● Each server maintains current term value
  • 11. RAFT : Replicated state machine ● A picture says thousand words...
  • 12. RAFT : Different RPCs ● RequestVote RPCs – Candidate sends to other nodes for electing itself as leader ● AppendEntries RPCs – Normal operation workload ● AppendEntries RPCs with no message - Heart beat messages – Leader sends to all followers to make its presence known
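The RPCs on this slide can be pictured as plain message types. The sketch below is illustrative only: the field names are assumptions loosely modelled on common Raft descriptions, not taken from the slides, and a real AppendEntries message carries more fields than shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RequestVote:
    """Sent by a candidate to ask the other nodes for their vote."""
    term: int            # candidate's current term
    candidate_id: int    # hypothetical node identifier
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry


@dataclass
class AppendEntries:
    """Sent by the leader; with no entries it doubles as a heartbeat."""
    term: int
    leader_id: int
    # Each entry is (term, command); an empty list means "heartbeat only".
    entries: List[Tuple[int, str]] = field(default_factory=list)

    def is_heartbeat(self) -> bool:
        return not self.entries


heartbeat = AppendEntries(term=3, leader_id=1)
write = AppendEntries(term=3, leader_id=1, entries=[(3, "set x=1")])
print(heartbeat.is_heartbeat(), write.is_heartbeat())
```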
  • 13. RAFT : Leader Election ● current_term++ ● Follower->Candidate ● Self vote ● Send RequestVote RPCs to all other servers, retry until either: – Receive votes from majority of servers (become leader) – Receive RPC from valid leader (step down to follower) – Election time out elapses – increment term and start a new election ● Election properties – Safety – allow at most one winner per term – Liveness – some candidate must eventually win
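The election steps on this slide can be sketched as a single-process simulation. This is a simplified illustration, not a full Raft implementation: there are no timeouts, no networking and no log comparison, and the `Server` class and `run_election` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    server_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    state: str = "follower"

    def handle_request_vote(self, term: int, candidate_id: int) -> bool:
        """Grant at most one vote per term (the safety property)."""
        if term < self.current_term:
            return False                  # stale candidate
        if term > self.current_term:
            self.current_term = term      # newer term: forget the old vote
            self.voted_for = None
            self.state = "follower"
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Server, peers: List[Server]) -> bool:
    candidate.current_term += 1                 # current_term++
    candidate.state = "candidate"               # Follower -> Candidate
    candidate.voted_for = candidate.server_id   # self vote
    votes = 1
    for peer in peers:                          # RequestVote to all others
        if peer.handle_request_vote(candidate.current_term, candidate.server_id):
            votes += 1
    if votes >= (len(peers) + 1) // 2 + 1:      # majority of the cluster
        candidate.state = "leader"
    return candidate.state == "leader"


s1, s2, s3 = Server(1), Server(2), Server(3)
won = run_election(s1, [s2, s3])
print(won, s1.state, s1.current_term)
```

Because each server records `voted_for` for the term, a second candidate asking for a vote in the same term is refused, which is what keeps the number of winners per term at one or zero.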
  • 14. RAFT : Picking the best leader ● Candidates include log info in RequestVote RPCs: index & term of last log entry ● Voting server V denies its vote if its own log is more complete: (votingServerLastTerm > candidateLastTerm) || ((votingServerLastTerm == candidateLastTerm) && (votingServerLastIndex > candidateLastIndex)) ● But is this enough to have crash consistency?
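The vote-denial condition on this slide translates directly into a predicate. A minimal sketch, with the function and parameter names chosen here for illustration:

```python
def voter_denies_vote(voter_last_term: int, voter_last_index: int,
                      candidate_last_term: int, candidate_last_index: int) -> bool:
    """True when the voter's log is more complete than the candidate's."""
    return (voter_last_term > candidate_last_term
            or (voter_last_term == candidate_last_term
                and voter_last_index > candidate_last_index))


# Voter's last entry is from a newer term: deny, even though its log is shorter.
print(voter_denies_vote(3, 5, 2, 9))
# Same last term, equal length: the candidate is at least as complete, so no denial.
print(voter_denies_vote(2, 5, 2, 5))
```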
  • 15. RAFT : New commitment rules ● For a leader to decide an entry is committed: – It must be stored on a majority of servers, and – At least one new entry from the leader's term must also be stored on a majority of servers
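The two commitment conditions on this slide can be sketched as a check the leader runs over its own view of replication. The data layout is an assumption made for this example: `log_terms[i - 1]` is the term of the leader's entry at 1-based index `i`, and `match_index` lists, for every server including the leader, the highest index known to be stored there.

```python
from typing import List


def on_majority(index: int, match_index: List[int]) -> bool:
    """True when entry `index` is stored on a majority of servers."""
    stored = sum(1 for m in match_index if m >= index)
    return stored >= len(match_index) // 2 + 1


def is_committed(index: int, log_terms: List[int],
                 match_index: List[int], current_term: int) -> bool:
    # Rule 1: the entry itself must be on a majority.
    if not on_majority(index, match_index):
        return False
    # Rule 2: some entry from the leader's own term, at or after `index`,
    # must also be on a majority.
    for i in range(index, len(log_terms) + 1):
        if log_terms[i - 1] == current_term and on_majority(i, match_index):
            return True
    return False


log_terms = [1, 2, 4]  # leader is in term 4; entry 2 comes from old term 2
# Entry 2 is on 3 of 5 servers, but the term-4 entry is only on the leader.
print(is_committed(2, log_terms, [3, 2, 2, 1, 1], current_term=4))
# Once the term-4 entry reaches a majority, entry 2 counts as committed too.
print(is_committed(2, log_terms, [3, 3, 3, 1, 1], current_term=4))
```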
  • 16. RAFT : Log inconsistency ● Leader repairs follower log entries by – Deleting extraneous entries – Filling in missing entries from the leader
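The repair described on this slide can be illustrated by comparing two logs directly. This is a deliberately simplified sketch: a real leader discovers the matching point through repeated AppendEntries exchanges rather than by seeing the follower's whole log, and `repair_follower_log` is a hypothetical helper. Entries are modelled as `(term, command)` tuples.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def repair_follower_log(leader_log: List[Entry],
                        follower_log: List[Entry]) -> List[Entry]:
    """Return the follower's log after the leader has repaired it."""
    match = 0
    # Find the longest prefix on which both logs agree.
    while (match < len(leader_log) and match < len(follower_log)
           and leader_log[match] == follower_log[match]):
        match += 1
    # Delete extraneous entries after the agreed prefix,
    # then fill in the missing entries from the leader.
    return follower_log[:match] + leader_log[match:]


leader = [(1, "a"), (1, "b"), (2, "c")]
diverged = [(1, "a"), (1, "b"), (1, "x"), (1, "y")]  # extraneous entries
lagging = [(1, "a")]                                  # missing entries
print(repair_follower_log(leader, diverged))
print(repair_follower_log(leader, lagging))
```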
  • 17. RAFT : Neutralizing old leaders ● Sender sends its term over RPC ● If the sender's term is older than the receiver's term, the RPC is rejected; if the receiver's term is older, the receiver steps down to follower, updates its term and processes the RPC
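The term comparison that neutralizes a stale leader can be sketched as follows. `Node` and `receive_rpc` are names invented for this example, and the RPC payload is reduced to just the sender's term.

```python
from dataclasses import dataclass


@dataclass
class Node:
    current_term: int
    state: str  # "leader", "candidate" or "follower"


def receive_rpc(receiver: Node, sender_term: int) -> bool:
    """Return True if the RPC is accepted for processing."""
    if sender_term < receiver.current_term:
        return False  # sender is behind: reject the RPC
    if sender_term > receiver.current_term:
        # Receiver is behind: adopt the newer term and step down.
        receiver.current_term = sender_term
        receiver.state = "follower"
    return True


old_leader = Node(current_term=2, state="leader")
accepted = receive_rpc(old_leader, sender_term=3)
print(accepted, old_leader)
```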
  • 18. RAFT : Client protocol ● Send commands to leader – If leader is unknown, send to anyone – If contacted server is not leader, it will redirect to leader ● Client gets back the response only after the full cycle (log, commit, execute) completes at the leader ● Request timeout – Client re-issues the command to another server – Unique id for each command at client to avoid duplicate execution
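The unique command id mentioned on this slide is what makes a retried request safe. A minimal sketch, assuming the server remembers the result of every command id it has executed; `CounterStateMachine` is a hypothetical toy state machine, not part of any Raft library.

```python
import uuid
from typing import Dict


class CounterStateMachine:
    """Toy state machine that applies each command id at most once."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}
        self.results: Dict[str, int] = {}  # command id -> cached result
        self.applied = 0

    def execute(self, command_id: str, key: str, delta: int) -> int:
        if command_id in self.results:
            # Retry of a command that already ran: return the cached result.
            return self.results[command_id]
        self.store[key] = self.store.get(key, 0) + delta
        self.applied += 1
        self.results[command_id] = self.store[key]
        return self.store[key]


sm = CounterStateMachine()
cmd_id = str(uuid.uuid4())               # client picks a unique id per command
first = sm.execute(cmd_id, "x", 5)
retry = sm.execute(cmd_id, "x", 5)       # client timed out and re-issued
print(first, retry, sm.applied)
```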
  • 19. Joint consensus phase ● Two-phase approach ● Need majority of both old and new configurations for election and commitment ● Configuration change is just a log entry, applied immediately on receipt (committed or not) ● Once joint consensus is committed, begin replicating the log entry for the final configuration
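The "majority of both old and new configurations" requirement can be expressed as two independent majority checks. A minimal sketch with made-up node names; `acks` stands for the set of servers that voted for a candidate or stored an entry.

```python
from typing import Set


def has_majority(acks: Set[str], config: Set[str]) -> bool:
    """True when the acknowledging servers form a majority of `config`."""
    return len(acks & config) >= len(config) // 2 + 1


def joint_agreement(acks: Set[str], old_config: Set[str],
                    new_config: Set[str]) -> bool:
    """During joint consensus both configurations must agree separately."""
    return has_majority(acks, old_config) and has_majority(acks, new_config)


old = {"a", "b", "c"}
new = {"c", "d", "e"}
# A full majority of the old configuration alone is not enough.
print(joint_agreement({"a", "b", "c"}, old, new))
# Two of three in each configuration is enough.
print(joint_agreement({"a", "c", "d"}, old, new))
```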
  • 20. Distributed store ● A common store which can be shared by different nodes ● In the form of key-value pairs for ease of use ● Several such distributed key-value store implementations are available.
  • 21. etcd ● Name stands for "/etc distributed" ● Open source distributed consistent key value store ● Highly available and reliable ● Sequentially consistent ● Watchable ● Exposed via HTTP ● Runtime reconfigurable (Scaling feature) ● Durable (snapshot backup/restore) ● Time to live keys (have a time out)
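The idea behind time-to-live keys can be shown with a small in-memory model. This is not etcd's API or implementation, only a single-process toy under the assumption that a key with a TTL simply disappears once its time out has passed; `TTLStore` and its methods are hypothetical names, and the clock is injectable so the behaviour can be demonstrated without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory key-value store whose keys can expire."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # key has timed out
            return None
        return value


now = [0.0]                              # fake clock for the demonstration
store = TTLStore(clock=lambda: now[0])
store.put("lock/owner", "node-1", ttl=5)
before = store.get("lock/owner")
now[0] = 5.0                             # five seconds later
after = store.get("lock/owner")
print(before, after)
```

A typical use of expiring keys is a liveness marker: a node keeps refreshing its key, and if the node dies the key vanishes on its own.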
  • 22. etcd contd... ● Bootstrapping using RAFT ● Proxy mode in node ● Cluster configuration – etcdctl member add/remove/list ● Similar projects like consul, zookeeper are also available.
  • 23. Why etcd ● Vibrant community ● 500+ applications like Kubernetes, Cloud Foundry using it ● 150+ developers ● Stable releases
  • 25. Q & A