How to scale a distributed (file) system
Atin Mukherjee
Gluster Hacker
SSE @ Red Hat
@mukherjee_atin
IRC : atinmu
Agenda
● Consensus in distributed system
● CAP theorem in distributed system
● Different distributed system design approaches
● Design challenges
● RAFT algorithm
● Consistent distributed store
● etcd
● Q & A
Consensus in distributed system
● Consensus – An agreement, but on what and
between whom?
● On what → whether the op/transaction can be
committed or not
● Between whom → the answer is pretty simple: the
nodes forming the distributed system
● Quorum – floor(n/2) + 1 nodes, i.e. a strict majority of the n nodes
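
The quorum formula can be checked with a few lines of Python. This is only a sketch: the function names `quorum` and `tolerated_failures` are made up for illustration, and the division is assumed to be integer (floor) division.

```python
def quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1  # integer division, so 4 nodes need 3 votes, not 2


def tolerated_failures(n: int) -> int:
    """How many nodes can be down while a quorum is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum(n), tolerated_failures(n))
```

Note that going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any extra failure, which is one reason odd cluster sizes are commonly chosen.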
CAP theorem
● At most two of the following three guarantees can hold at once
– Consistency (all nodes see the same data at the
same time)
– Availability (a guarantee that every request
receives a response about whether it succeeded or
failed)
– Partition tolerance (the system continues to operate
despite arbitrary message loss or failure of part of
the system)
Distributed system design approaches
● No meta data server – every node shares its data
with all the other nodes
● Meta data server – one node holds the meta data and
the others fetch it from that node
So which one is better???
Probably none of them? Ask yourself for a
minute....
Challenges in design of a distributed
system
● No meta data server
– N * N exchange of Network messages
– Not scalable when N is in the hundreds or
thousands
– Initialization time can be very high
– Can end up in a situation like “whom to believe,
whom not to” - popularly known as split brain
– How to undo a transaction locally
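
To put a number on the N * N cost, the sketch below counts messages per round for a full mesh. The leader-based count is an assumption added only for comparison (one request to each follower plus one reply); both function names are hypothetical.

```python
def full_mesh_messages(n: int) -> int:
    """Messages per round when every node sends to every other node."""
    return n * (n - 1)


def leader_based_messages(n: int) -> int:
    """Messages per round when one leader contacts each follower and each replies.

    Assumption for comparison only; not taken from the slides.
    """
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_messages(n), leader_based_messages(n))
```

With 100 nodes the full mesh needs 9900 messages per round against 198 for the leader-based exchange, and the gap grows quadratically with N.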
Challenges in design of a distributed
system - 2
● MDS (Meta data server)
– SPOF (single point of failure)
Ahh!! so is this the only drawback??
– How about having replicas and then replica count??
– Additional N/W hop, lower performance
RAFT – A consensus algorithm
● Key functions
– Asymmetric – leader based
– Leader election
– Normal operation
– Safety and consistency after leader changes
– Neutralizing old leaders
– Client interactions
– Configuration changes
RAFT : Terms
● Time is divided into terms; each term has two parts
– Election
– Normal operation
● At most 1 leader per term
● A failed election (split vote) leaves a term with no leader
● Each server maintains its current term value
● Terms are used to identify obsolete information
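
How a term value identifies obsolete information can be sketched as a comparison on every incoming message. The `Server` class and `observe_term` method are hypothetical names, and the sketch leaves out everything else a real Raft server tracks.

```python
from dataclasses import dataclass


@dataclass
class Server:
    current_term: int = 0
    role: str = "follower"

    def observe_term(self, incoming_term: int) -> bool:
        """Return True if a message with this term should be processed."""
        if incoming_term < self.current_term:
            # Sender is behind: its information is obsolete, reject it.
            return False
        if incoming_term > self.current_term:
            # We are behind: adopt the newer term and step down to follower.
            self.current_term = incoming_term
            self.role = "follower"
        return True


if __name__ == "__main__":
    old_leader = Server(current_term=3, role="leader")
    print(old_leader.observe_term(2), old_leader.role)  # stale message is rejected
    print(old_leader.observe_term(5), old_leader.role)  # newer term forces a step-down
```

The second call is the mechanism behind "neutralizing old leaders": a leader that sees a higher term stops acting as leader.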
RAFT : Server states
● Server state transitions between follower, candidate and leader
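
Since the state diagram is not reproduced in this transcript, here is a rough sketch of the transitions as a lookup table. The event names are invented for illustration; the transitions themselves follow the usual description of Raft.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.FOLLOWER, "election_timeout"): State.CANDIDATE,
    (State.CANDIDATE, "election_timeout"): State.CANDIDATE,  # start a new election
    (State.CANDIDATE, "won_majority"): State.LEADER,
    (State.CANDIDATE, "discovered_leader_or_higher_term"): State.FOLLOWER,
    (State.LEADER, "discovered_higher_term"): State.FOLLOWER,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, staying put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    s = State.FOLLOWER
    for event in ("election_timeout", "won_majority", "discovered_higher_term"):
        s = next_state(s, event)
        print(event, "->", s.value)
```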
RAFT : Replicated state machine
● A picture is worth a thousand words...
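
In place of the missing picture, a tiny sketch of the idea: if every replica applies the same log in the same order, every replica ends in the same state. The key-value `set` commands and the `apply_log` helper are assumptions chosen for illustration.

```python
def apply_log(log):
    """Apply (key, value) set-commands in order to an empty key-value store."""
    store = {}
    for key, value in log:
        store[key] = value
    return store


if __name__ == "__main__":
    log = [("x", 1), ("y", 2), ("x", 3)]

    # Three replicas applying the same log reach the same state.
    replicas = [apply_log(log) for _ in range(3)]
    print(all(replica == replicas[0] for replica in replicas))

    # Applying the same commands in a different order gives a different state,
    # which is why the consensus layer has to agree on the order of the log.
    print(apply_log(log) == apply_log(list(reversed(log))))
```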
RAFT : Different RPCs
● RequestVote RPCs – A candidate sends these to the other
nodes to get itself elected as leader
● AppendEntries RPCs – Normal operation
workload
● AppendEntries RPCs with no entries – Heartbeat
messages – The leader sends these to all followers
to assert its presence
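
The two RPCs can be sketched as plain data records. The field names follow the argument lists in the Raft paper (converted to Python naming); the `is_heartbeat` helper and the `(term, command)` entry shape are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RequestVote:
    term: int            # candidate's term
    candidate_id: str    # who is asking for the vote
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry


@dataclass
class AppendEntries:
    term: int            # leader's term
    leader_id: str
    prev_log_index: int  # index of the entry preceding the new ones
    prev_log_term: int   # term of that preceding entry
    leader_commit: int   # leader's commit index
    entries: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

    @property
    def is_heartbeat(self) -> bool:
        """An AppendEntries call that carries no entries acts as a heartbeat."""
        return not self.entries


if __name__ == "__main__":
    heartbeat = AppendEntries(term=2, leader_id="n1", prev_log_index=4,
                              prev_log_term=2, leader_commit=4)
    print(heartbeat.is_heartbeat)
```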
RAFT : Leader Election
● current_term++
● Follower->Candidate
● Self vote
● Send RequestVote RPCs to all other servers, retry until one of:
– Receive votes from a majority of servers
– Receive an RPC from a valid leader
– Election timeout elapses – increment term and start a new election
● Election properties
– Safety – allow at most one winner per term
– Liveness – some candidate must eventually win
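
The election steps above can be sketched from the candidate's point of view. This is a simplification: the `grants` dictionary is a hypothetical stand-in for the RequestVote replies, and the case of hearing from a valid leader mid-election is left out.

```python
from typing import Dict, Tuple


def run_election_round(current_term: int, grants: Dict[str, bool]) -> Tuple[int, str]:
    """One election round for a candidate.

    grants maps each other server to whether it granted its vote.
    Returns (new_term, outcome) where outcome is "leader" or "retry".
    """
    term = current_term + 1                               # current_term++
    votes = 1                                             # self vote
    votes += sum(1 for granted in grants.values() if granted)
    cluster_size = len(grants) + 1                        # the others plus this candidate
    if votes >= cluster_size // 2 + 1:                    # majority reached
        return term, "leader"
    # Split vote or timeout: the next round will increment the term again.
    return term, "retry"


if __name__ == "__main__":
    print(run_election_round(4, {"b": True, "c": False}))
    print(run_election_round(4, {"b": False, "c": False, "d": True, "e": False}))
```

Raft implementations typically randomize the election timeout so that repeated split votes are unlikely, which is what makes the liveness property hold in practice.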
Consistent distributed store
● A common consistent store which can be
shared by different nodes
● In the form of key value pair for ease of use
● Such distributed key value store
implementations are available.
etcd
● Name stands for "/etc distributed"
● Open source distributed consistent key value store
● Based on RAFT
● Highly available and reliable
● Sequentially consistent
● Watchable
● Exposed via HTTP
● Runtime reconfigurable (scaling feature)
● Durable (snapshot backup/restore)
● Time-to-live (TTL) keys that expire after a timeout
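
A rough sketch of what "exposed via HTTP", "watchable" and "time to live keys" look like from a client. It assumes etcd's v2 HTTP keys API (the HTTP interface of that era; newer etcd releases centre on the v3 gRPC API) and a local endpoint at `127.0.0.1:2379`. The helper names and the key `config/volume` are made up. The code only builds the requests, so it runs without an etcd server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local etcd endpoint serving the v2 keys API.
BASE = "http://127.0.0.1:2379/v2/keys"


def put_request(key, value, ttl=None):
    """Build a PUT that sets a key, optionally with a time to live in seconds."""
    fields = {"value": value}
    if ttl is not None:
        fields["ttl"] = ttl
    body = urlencode(fields).encode()  # form-encoded body, e.g. value=...&ttl=...
    return Request(f"{BASE}/{key}", data=body, method="PUT")


def watch_request(key):
    """Build a long-polling GET that returns when the key next changes."""
    return Request(f"{BASE}/{key}?wait=true", method="GET")


if __name__ == "__main__":
    req = put_request("config/volume", "replica-3", ttl=30)
    print(req.get_method(), req.full_url, req.data)
    print(watch_request("config/volume").full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` against a running etcd that has the v2 API enabled.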
Why etcd
● Vibrant community
● 500+ applications, such as Kubernetes and Cloud
Foundry, use it
● 150+ developers
● Stable releases
Conclusion
● Use etcd sub cluster to store configuration data
● No burden on application to maintain
consistency
● And that's all!!
References
● https://raftconsensus.github.io/
● https://www.youtube.com/watch?v=YbZ3zDzDnrw
● https://github.com/coreos/etcd#etcd
Q & A
THANK YOU


Managing scalability of distributed system
