How to scale a distributed (file) system
Atin Mukherjee
Gluster Hacker
SSE @ Red Hat
@mukherjee_atin
IRC : atinmu
Agenda
● Consensus in distributed system
● CAP theorem in distributed system
● Different distributed system design approaches
● Design challenges
● RAFT algorithm
● Consistent distributed store
● etcd
● Q & A
Consensus in distributed system
● Consensus – an agreement, but for what and
between whom?
● For what → whether the op/transaction can be
committed or not
● Between whom → Answer is pretty simple: the
nodes forming the distributed system
● Quorum – (n/2) + 1
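A quick worked check of that formula – a minimal Go sketch, my own illustration and not part of the original deck:

package main

import "fmt"

// quorum returns the minimum number of agreeing nodes, i.e. a strict
// majority: (n/2) + 1 with integer division.
func quorum(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{3, 4, 5, 7} {
		// 3 -> 2, 4 -> 3, 5 -> 3, 7 -> 4
		fmt.Printf("a cluster of %d nodes needs %d votes\n", n, quorum(n))
	}
}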
CAP theorem
● Any two of the following three guarantees:
– Consistency (all nodes see the same data at the
same time)
– Availability (a guarantee that every request
receives a response about whether it succeeded or
failed)
– Partition tolerance (the system continues to operate
despite arbitrary message loss or failure of part of
the system)
Distributed system design approaches
● No metadata server – all nodes share their data
directly with each other
● Metadata server – one node holds the metadata and
the other nodes fetch it from there
So which one is better???
Probably neither of them? Ask yourself for a
minute....
Challenges in design of a distributed
system
● No metadata server
– N * N exchange of network messages (see the
sketch after this list)
– Not scalable when N grows into the hundreds or
thousands
– Initialization time can be very high
– Can end up in a situation of “whom to believe,
whom not to” – popularly known as split brain
– Hard to undo a transaction locally
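To see why the all-to-all exchange stops scaling, here is a minimal Go sketch (my own illustration, not from the deck) counting the messages in one exchange round of a full mesh of N nodes:

package main

import "fmt"

func main() {
	// In a full mesh every node talks to every other node, so one
	// round of exchange costs roughly N * (N - 1) messages.
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("%4d nodes -> %7d messages per round\n", n, n*(n-1))
	}
	// The count grows quadratically: 90, 9900, 999000.
}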
Challenges in design of a distributed
system - 2
● MDS (metadata server)
– SPOF (single point of failure)
Ahh!! So is this the only drawback?
– What about keeping replicas – and then which
replica count?
– Additional network hop, lower performance
RAFT – A consensus algorithm
● Key functions
– Asymmetric – leader based
– Leader election
– Normal operation
– Safety and consistency after leader changes
– Neutralizing old leaders
– Client interactions
– Configuration changes
RAFT : Terms
● Time is divided into terms; each term has two parts
– Election
– Normal operation
● At most 1 leader per term
● Some terms have no leader – a failed election due to a split vote
● Each server maintains its current term value
● Terms help identify obsolete information
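A minimal sketch of the per-server state Raft keeps, with field names following the Raft paper (illustrative, not the deck's code). The current term is how a server spots obsolete information: any message carrying a smaller term is stale.

package raft

// Persistent state every Raft server maintains (per the Raft paper).
type LogEntry struct {
	Term    int    // term in which the entry was created
	Command []byte // opaque command for the replicated state machine
}

type ServerState struct {
	CurrentTerm int        // latest term this server has seen
	VotedFor    int        // candidate voted for in CurrentTerm, -1 if none
	Log         []LogEntry // the replicated log
}

// isStale reports whether an incoming message carries an old term and
// therefore contains obsolete information.
func (s *ServerState) isStale(msgTerm int) bool {
	return msgTerm < s.CurrentTerm
}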
RAFT : Server states
● Server state transitions
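The transition diagram on this slide is an image in the original deck; as a stand-in, here is a small Go sketch of the standard Raft role transitions (my own summary):

package raft

type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Standard Raft role transitions (what the missing diagram shows):
//   Follower  -> Candidate : election timeout, no heartbeat from a leader
//   Candidate -> Leader    : votes received from a majority of servers
//   Candidate -> Follower  : a valid leader or a higher term is discovered
//   Leader    -> Follower  : a server with a higher term is discovered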
RAFT : Replicated state machine
● A picture says a thousand words...
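The picture on this slide shows the replicated state machine idea: every server applies the same committed log entries, in the same order, to an identical state machine, so all copies end up in the same state. A minimal self-contained sketch with an assumed key-value state machine (my illustration, not the deck's code):

package main

import (
	"fmt"
	"strings"
)

// Entry is one committed log entry; Command encodes "key=value" here
// purely for illustration.
type Entry struct{ Command string }

// KVStateMachine is a trivial key-value state machine: applying the
// same committed entries in the same order yields the same state on
// every server.
type KVStateMachine struct{ data map[string]string }

func (kv *KVStateMachine) Apply(e Entry) {
	if parts := strings.SplitN(e.Command, "=", 2); len(parts) == 2 {
		kv.data[parts[0]] = parts[1]
	}
}

func main() {
	kv := &KVStateMachine{data: map[string]string{}}
	for _, e := range []Entry{{"x=1"}, {"x=2"}, {"y=3"}} {
		kv.Apply(e)
	}
	fmt.Println(kv.data) // map[x:2 y:3] on every replica
}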
RAFT : Different RPCs
● RequestVote RPCs – a candidate sends these to the
other nodes to get itself elected as leader
● AppendEntries RPCs – normal operation
workload
● AppendEntries RPCs with no entries – heartbeat
messages the leader sends to all followers to
assert its presence
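A minimal sketch of the two RPC message types, with field names as in the Raft paper and reusing the LogEntry type from the earlier sketch (illustrative, not the deck's code). A heartbeat is simply an AppendEntries call with an empty Entries slice.

package raft

// RequestVote RPC: a candidate asks the other servers for their vote.
type RequestVoteArgs struct {
	Term         int // candidate's current term
	CandidateID  int // candidate requesting the vote
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

type RequestVoteReply struct {
	Term        int  // responder's current term, so the candidate can update itself
	VoteGranted bool // true means the candidate received this vote
}

// AppendEntries RPC: the leader replicates log entries; with an empty
// Entries slice it doubles as the heartbeat.
type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderID     int        // so followers can redirect clients
	PrevLogIndex int        // index of the entry immediately preceding the new ones
	PrevLogTerm  int        // term of that entry
	Entries      []LogEntry // empty for a heartbeat
	LeaderCommit int        // leader's commit index
}

type AppendEntriesReply struct {
	Term    int  // responder's current term
	Success bool // true if the follower's log matched PrevLogIndex/PrevLogTerm
}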
RAFT : Leader Election
● current_term++
● Follower->Candidate
● Self vote
● Send RequestVote RPCs to all other servers; retry until either:
– Receive votes from a majority of servers
– Receive an RPC from a valid leader
– Election timeout elapses – increment term, start a new election
● Election properties
– Safety – allow at most one winner per term
– Liveness – some candidate must eventually win
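Putting the bullets above together, a condensed, illustrative sketch of the candidate's side of an election, reusing the types from the earlier sketches (simplified: no locking, synchronous RPCs, log-freshness checks and timeouts omitted):

package raft

// runElection sketches what a follower does when its election timer
// fires; sendVote abstracts the RequestVote RPC to one peer.
func (s *ServerState) runElection(myID int, sendVote []func(RequestVoteArgs) RequestVoteReply) bool {
	s.CurrentTerm++   // current_term++
	s.VotedFor = myID // Follower -> Candidate, self vote
	votes := 1        // the self vote

	args := RequestVoteArgs{Term: s.CurrentTerm, CandidateID: myID}
	for _, call := range sendVote {
		reply := call(args)
		if reply.Term > s.CurrentTerm {
			// A newer term exists: this candidacy is obsolete, step down.
			s.CurrentTerm = reply.Term
			return false
		}
		if reply.VoteGranted {
			votes++
		}
	}

	total := len(sendVote) + 1 // peers plus this server
	// Safety: at most one winner per term, since every server grants at
	// most one vote per term and winning needs a majority.
	return votes >= total/2+1
}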
Consistent distributed store
● A common consistent store which can be
shared by different nodes
● In the form of key-value pairs for ease of use
● Several such distributed key-value store
implementations are already available
etcd
● The name stands for a distributed /etc
● Open source distributed consistent key value store
● Based on RAFT
● Highly available and reliable
● Sequentially consistent
● Watchable
● Exposed via HTTP
● Runtime reconfigurable (scaling feature)
● Durable (snapshot backup/restore)
● Time-to-live keys (keys expire after a timeout)
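A minimal Go sketch of talking to etcd with the v3 client library; the import path, the endpoint 127.0.0.1:2379, and the key names are assumptions for illustration and vary by etcd release. It exercises the key-value, watchable, and time-to-live bullets above.

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // import path depends on the etcd release
)

func main() {
	// Connect to a local etcd member (endpoint assumed for illustration).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Store a piece of configuration data (hypothetical key).
	if _, err := cli.Put(ctx, "/config/replica-count", "3"); err != nil {
		panic(err)
	}

	// Read it back; any member returns a consistent view.
	resp, err := cli.Get(ctx, "/config/replica-count")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}

	// Watchable: get notified whenever the key changes.
	wch := cli.Watch(context.Background(), "/config/replica-count")
	_ = wch // range over wch in a real program

	// Time to live: attach a key to a 10-second lease so it expires.
	if lease, err := cli.Grant(ctx, 10); err == nil {
		cli.Put(ctx, "/locks/node1", "alive", clientv3.WithLease(lease.ID))
	}
}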
Why etcd
● Vibrant community
● 500+ applications, such as Kubernetes and Cloud
Foundry, use it
● 150+ developers
● Stable releases
Conclusion
● Use an etcd sub-cluster to store configuration data
● No burden on the application to maintain
consistency
● And that's all!!
References
● https://raftconsensus.github.io/
● https://www.youtube.com/watch?v=YbZ3zDzDnrw
● https://github.com/coreos/etcd#etcd
Q & A
THANK YOU
