Is there anybody out there? Scala Days Berlin 2018

Manuel Bernhardt

Disclaimer
[A career in distributed systems] is both exhilarating and
frustrating. When things work, it's like a symphony. When they
don't, it's like an eleventh-birthday party where half of the kids are
on speed.
— Jeff Darcy, HekaFS (formerly CloudFS) project lead
@elmanu - https://manuel.bernhardt.io

manuel.bernhardt.io
• Guiding companies to get started with
reactive systems and to keep them
running
• Lightbend consulting and training
partner, focus on Akka (Cluster,
Streams)
• Scuba-diver, currently stranded in
Austria

Motivational quote
Life is a single player game. You’re born alone. You’re going to die
alone. All of your interpretations are alone. All your memories are
alone. You’re gone in three generations and no one cares. Before
you showed up nobody cared. It’s all single player.
— Naval Ravikant

Key issues for building clusters
• discovery: who's there?
• fault detection: who's in trouble?
• load balancing: who can take up work?

Group membership

Implementing group membership
1. Failure Detection
2. Dissemination
3. Consensus

Failure detection
Wish you were here (1975)

Failure detector key properties
• Completeness: crash-failure of any group
member is detected by all non-faulty
members
• Accuracy: no non-faulty group member is
declared as failed by any other non-faulty
group member (no false positives)
Also relevant in practice:
• speed
• network message load

Impossibility result
It is impossible for a failure detector algorithm to deterministically
achieve both completeness and accuracy over an asynchronous
unreliable network [1]
[1] Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996)

Trade-offs
• Strong - Weak completeness: all / some non-faulty members
detect a crash
• Strong - Weak accuracy: there are no / some false-positives
In practice most applications prefer strong completeness with a
weaker form of accuracy

Failure Detector strategies
• heartbeat: no heartbeat = failure
• ping: no response = failure
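
The heartbeat strategy can be sketched as a fixed-timeout detector. This is only an illustration: `HeartbeatDetector`, `timeoutMillis` and the choice to suspect nodes that were never heard from are assumptions, not taken from any particular library.

```scala
// Hypothetical sketch of a fixed-timeout heartbeat failure detector.
// Names and the timeout policy are illustrative assumptions.
final case class HeartbeatDetector(
    timeoutMillis: Long,
    lastSeen: Map[String, Long] = Map.empty) {

  // Record a heartbeat received from `node` at time `nowMillis`.
  def heartbeat(node: String, nowMillis: Long): HeartbeatDetector =
    copy(lastSeen = lastSeen.updated(node, nowMillis))

  // A node is suspected when no heartbeat arrived within the timeout.
  // Nodes we never heard from are treated as suspected as well.
  def isSuspected(node: String, nowMillis: Long): Boolean =
    lastSeen.get(node).forall(seen => nowMillis - seen > timeoutMillis)
}
```

A boolean verdict with a fixed timeout is the simplest possible design; it does not adapt to changing network conditions.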

Phi Adaptive Accrual Failure Detector [3]
• has a cool name
• adaptive: adjusts to network conditions
• introduces the notion of accrual failure
detection: suspicion value φ rather than
boolean (trusted or suspected)
• made popular by Cassandra
[3] N. Hayashibara et al: The ϕ Accrual Failure Detector (2004)

Example: master and worker processes
• φ(w) > 8: stop sending new work to
the node
• φ(w) > 10: start to rebalance current
tasks of the worker to other nodes
• φ(w) > 12: remove the worker from
the list of nodes
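
A minimal sketch of how a suspicion value φ could be computed and mapped to the thresholds of the example. It assumes heartbeat inter-arrival times are roughly normally distributed and approximates the normal CDF with a logistic function; the names `PhiSketch`, `phi` and `actionFor`, and the approximation constants, are illustrative assumptions.

```scala
object PhiSketch {
  // phi = -log10(1 - F(elapsed)), where F is the CDF of the heartbeat
  // inter-arrival times. F is approximated here by a logistic function
  // (an assumption of this sketch, not a statement about any library).
  def phi(elapsedMillis: Double, meanMillis: Double, stdDevMillis: Double): Double = {
    val y = (elapsedMillis - meanMillis) / stdDevMillis
    val e = math.exp(-y * (1.5976 + 0.070566 * y * y))
    // Both branches compute -log10(e / (1 + e)); the second form avoids
    // overflow when e becomes very large.
    if (elapsedMillis > meanMillis) -math.log10(e / (1.0 + e))
    else -math.log10(1.0 - 1.0 / (1.0 + e))
  }

  sealed trait Action
  case object KeepSending extends Action
  case object StopNewWork extends Action
  case object Rebalance extends Action
  case object RemoveWorker extends Action

  // Thresholds 8, 10 and 12 are the ones from the master/worker example.
  def actionFor(phiValue: Double): Action =
    if (phiValue > 12) RemoveWorker
    else if (phiValue > 10) Rebalance
    else if (phiValue > 8) StopNewWork
    else KeepSending
}
```

The point of an accrual detector is visible in `actionFor`: the application, not the detector, decides which level of suspicion justifies which reaction.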

New Adaptive Accrual Failure Detector [4]
• much simpler to calculate suspicion
level than Phi
• performs slightly better and more
adaptive [5]
[4] B. Satzger et al: A new adaptive accrual failure detector for dependable
distributed systems (2007)
[5] https://manuel.bernhardt.io/2017/07/26/a-new-adaptive-accrual-failure-detector-for-akka/
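
To show why the suspicion level is simpler to calculate, here is a rough sketch of the idea as I understand it: the suspicion is the fraction of recorded heartbeat intervals that are no longer than the (scaled) time elapsed since the last heartbeat. The names and the default `scalingFactor` of 0.9 are assumptions made for illustration.

```scala
object AdaptiveAccrualSketch {
  // Suspicion in [0, 1]: share of past inter-arrival times that are
  // <= elapsed * scalingFactor. No distribution has to be estimated.
  def suspicion(
      intervalsMillis: Seq[Long],
      elapsedMillis: Long,
      scalingFactor: Double = 0.9): Double =
    if (intervalsMillis.isEmpty) 0.0 // no history yet: nothing to suspect
    else
      intervalsMillis.count(_ <= elapsedMillis * scalingFactor).toDouble /
        intervalsMillis.size
}
```

In practice the history would be a bounded sliding window of recent intervals, which this sketch leaves out.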

SWIM Failure Detector
As you swim lazily through the milieu,
The secrets of the world will infect you [6]
• has both a dissemination and a failure detection component
• scalable membership protocol
• members are first suspected and not immediately flagged as
failed
[6] A. Das et al: SWIM: scalable weakly-consistent infection-style process group membership protocol (2002)
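
The "suspect first" behaviour can be sketched as a single probe round: a direct ping, followed by indirect pings through a few helper members if the direct ping goes unanswered. `SwimSketch`, `probe` and the `canReach` function (a stand-in for "a ping is acknowledged in time") are hypothetical names for illustration; timers and the later suspect-to-failed transition are left out.

```scala
object SwimSketch {
  sealed trait Verdict
  case object Alive extends Verdict
  case object Suspect extends Verdict

  // One probe of `target` by `self`.
  // `helpers` are the members asked to ping the target on our behalf.
  // `canReach(from, to)` simulates whether a ping gets acknowledged.
  def probe(
      self: String,
      target: String,
      helpers: Seq[String],
      canReach: (String, String) => Boolean): Verdict = {
    val direct = canReach(self, target)
    // Indirect probe: we must reach the helper, and the helper the target.
    lazy val indirect =
      helpers.exists(h => canReach(self, h) && canReach(h, target))
    if (direct || indirect) Alive else Suspect
  }
}
```

The indirect probe is what keeps a single bad link between two members from being mistaken for a crashed node.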

Lifeguard Failure Detector
• based on SWIM, developed by
HashiCorp [7]
• memberlist implementation [8]
• extensions to the SWIM protocol
• drastically reduces the amount of
false-positives
[7] A. Dadgar et al: Lifeguard: SWIM-ing with Situational Awareness (2017)
[8] https://github.com/hashicorp/memberlist

Dissemination
How to communicate changes in the cluster?
• members joining
• members leaving
• members failing

Dissemination strategies: multicast
• hardware / IP / UDP multicast: not
readily (or willingly) enabled in data
centres
• even if we had multicast support we'd
still have quite a bit of work to do [23]
[23] X. Défago, A. Schiper, P. Urbán: Total Order Broadcast and Multicast
Algorithms: Taxonomy and Survey

Dissemination strategies: gossip protocols
• based on the research done in the P2P
days [10] [11] [24] [25]
[10] van Renesse et al: A gossip-style failure detection service (1998)
[11] S. Ranganathan et al: Gossip-Style Failure Detection and Distributed
Consensus for Scalable Heterogeneous Clusters (2000)
[24] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan: Chord: A
Scalable Peer-to-peer Lookup Service for Internet Applications
[25] P. Rama, A. D. George, M. Radlinski, R. Subramaniyan: GEMS: Gossip-
Enabled Monitoring Service for Heterogeneous Distributed Systems

Gossip styles
• gossip to one node at random [10]
• round-robin, binary round-robin, round-robin
with sequence check [11]
• piggy-back on another protocol (e.g. on
a failure detector [6]): also called
infection-style gossip
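
A simplified sketch of the first two gossip styles, assuming each member gossips a table of heartbeat counters and merges incoming tables by keeping the highest counter per member. `GossipSketch` and its functions are hypothetical, and the round-robin variant is reduced to "the round number picks the peer".

```scala
import scala.util.Random

object GossipSketch {
  // member -> highest heartbeat counter seen so far
  type View = Map[String, Long]

  // Merging keeps the highest heartbeat counter known for each member.
  def merge(local: View, remote: View): View =
    (local.keySet ++ remote.keySet).map { m =>
      m -> math.max(local.getOrElse(m, 0L), remote.getOrElse(m, 0L))
    }.toMap

  // Gossip to one node at random: pick any peer except ourselves.
  def randomPeer(self: String, members: Seq[String], rnd: Random): Option[String] = {
    val peers = members.filterNot(_ == self)
    if (peers.isEmpty) None else Some(peers(rnd.nextInt(peers.size)))
  }

  // Round-robin: the peer is determined by the round number
  // (assumed to be non-negative), so every peer is contacted in turn.
  def roundRobinPeer(self: String, members: Seq[String], round: Int): Option[String] = {
    val peers = members.filterNot(_ == self)
    if (peers.isEmpty) None else Some(peers(round % peers.size))
  }
}
```

Random selection needs no coordination, while round-robin gives a deterministic bound on how long it takes until every peer has been contacted.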

What do you even gossip about

Example of gossip optimizations
• Akka Cluster: gossip with a higher probability to nodes that have
not already seen a gossip
• Akka Cluster: speeds up gossip (3x) when less than half of the
members have seen the latest gossip
• Lifeguard: anti-entropy mechanism based on nodes doing a full
sync with a node at random (helps to speed up convergence after
a network partition)

Consensus
A momentary lapse of reason (1987)

Impossibility result - group membership
Group membership with a single group is impossible when there
are nodes that are suspected of having failed [9]
[9] Chandra et al: On the Impossibility of Group Membership (1996)

It would be unwise to make
membership-related decisions while
there are processes suspected of
having crashed

Reaching consensus: time
• Lamport Clocks [12]: how do you order events in a distributed system
• Vector Clocks [13]: how do you order events in a distributed system and flag
concurrent events
• Version Vectors [14] [15] and Dotted Version Vectors [16]: very similar but
semantics primarily concerned with versioning and conflict detection in
replicas
[12] L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System (1978)
[13] F. Mattern: Virtual Time and Global States of Distributed Systems (1989)
[14] D.S. Parker: Detection of mutual inconsistency in distributed systems (1983)
[15] https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/
[16] N. Preguiça: Dotted Version Vectors: Efficient Causality Tracking for Distributed Key-Value Stores (2012)
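
How a vector clock orders events and flags concurrent ones can be sketched as follows. `VectorClockSketch` is a hypothetical, simplified illustration and not the `VectorClock` class used by Akka.

```scala
object VectorClockSketch {
  // node name -> number of events seen from that node
  type VClock = Map[String, Long]

  sealed trait Relation
  case object Before extends Relation
  case object After extends Relation
  case object Same extends Relation
  case object Concurrent extends Relation

  // A node increments its own entry for every local event.
  def tick(clock: VClock, node: String): VClock =
    clock.updated(node, clock.getOrElse(node, 0L) + 1)

  // On receiving a message, take the pairwise maximum of both clocks.
  def merge(a: VClock, b: VClock): VClock =
    (a.keySet ++ b.keySet).map { n =>
      n -> math.max(a.getOrElse(n, 0L), b.getOrElse(n, 0L))
    }.toMap

  // a is Before b if no entry of a is greater and at least one is smaller.
  // If each clock is ahead on some entry, the events are concurrent.
  def compare(a: VClock, b: VClock): Relation = {
    val nodes = a.keySet ++ b.keySet
    val aBehind = nodes.exists(n => a.getOrElse(n, 0L) < b.getOrElse(n, 0L))
    val bBehind = nodes.exists(n => b.getOrElse(n, 0L) < a.getOrElse(n, 0L))
    (aBehind, bBehind) match {
      case (false, false) => Same
      case (true, false)  => Before
      case (false, true)  => After
      case (true, true)   => Concurrent
    }
  }
}
```

A Lamport clock would collapse each map into a single counter, which still yields an ordering but can no longer tell concurrent events apart.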

private[cluster] final case class Gossip(
  members: immutable.SortedSet[Member],
  overview: GossipOverview = GossipOverview(),
  version: VectorClock = VectorClock(),
  tombstones: Map[UniqueAddress, Gossip.Timestamp] = Map.empty)

Reaching consensus:
replicated state machines
Any sufficiently complicated model class
contains an ad-hoc, informally-specified,
bug-ridden, slow implementation of half a
state machine
— Pete Forde

Reaching consensus:
replicated state machines
This method allows one to implement any
desired form of multiprocess
synchronization in a distributed system
— Leslie Lamport [12]
[12] L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed
System (1978)

Reaching consensus: protocols
• fault-tolerant distributed systems
• how do multiple servers agree on a
value?
• Paxos [19] [20], Raft [21], CASPaxos [22]
[19] L. Lamport: The Part-Time Parliament (1998)
[20] L. Lamport: Paxos made simple (2001)
[21] D. Ongaro and J. Ousterhout: In Search of an Understandable
Consensus Algorithm (Extended Version) (2014)
[22] D. Rystsov: CASPaxos: Replicated State Machines without logs (2018)

Reaching consensus: CRDTs
• Conflict-Free Replicated Data Types [17]
• strong eventual consistency
• two families:
  • CmRDTs (commutativity of
  operations)
  • CvRDTs (convergence of state - merge
  function)
[17] F.B. Schneider: Implementing Fault Tolerant Services Using the State
Machine Approach: A Tutorial (1990)
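
As an illustration of the CvRDT family, here is a minimal sketch of a grow-only counter whose merge function takes the per-node maximum. `GCounter` as written here is a hypothetical example, not the API of any specific library.

```scala
// Hypothetical state-based (CvRDT) grow-only counter.
final case class GCounter(counts: Map[String, Long] = Map.empty) {

  // Each node only ever increments its own entry.
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a grow-only counter cannot be decremented")
    copy(counts = counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // The counter value is the sum over all nodes.
  def value: Long = counts.values.sum

  // Merge is commutative, associative and idempotent, so replicas
  // converge no matter in which order or how often states are exchanged.
  def merge(that: GCounter): GCounter =
    GCounter((counts.keySet ++ that.counts.keySet).map { n =>
      n -> math.max(counts.getOrElse(n, 0L), that.counts.getOrElse(n, 0L))
    }.toMap)
}
```

Because merging never loses an increment and never counts one twice, replicas reach the same value without running a consensus protocol.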

Reaching consensus: conventions
• reaching a decision by not transmitting any information
• example: Akka Cluster leader designation
/**
* INTERNAL API
* Orders the members by their address except that members with status
* Joining, Exiting and Down are ordered last (in that order).
*/
private[cluster] val leaderStatusOrdering: Ordering[Member] = ...
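
The ordering described in the comment above can be sketched with simplified stand-in types. Everything here (`LeaderSketch`, the reduced `Member` and `Status` types, ordering by a plain address string, ignoring reachability) is an assumption for illustration and not Akka's actual implementation.

```scala
object LeaderSketch {
  sealed trait Status
  case object Up extends Status
  case object Leaving extends Status
  case object Joining extends Status
  case object Exiting extends Status
  case object Down extends Status

  final case class Member(address: String, status: Status)

  // Joining, Exiting and Down sort last (in that order); all other
  // statuses share rank 0 and are ordered by address only.
  private def rank(status: Status): Int = status match {
    case Joining => 1
    case Exiting => 2
    case Down    => 3
    case _       => 0
  }

  val leaderStatusOrdering: Ordering[Member] =
    Ordering.by((m: Member) => (rank(m.status), m.address))

  // By convention the leader is simply the first member in that order.
  def leader(members: Set[Member]): Option[Member] =
    if (members.isEmpty) None else Some(members.min(leaderStatusOrdering))
}
```

Every node that sees the same membership set computes the same leader locally, so no election messages are needed.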

In practice: Akka Cluster

Akka Cluster
• failure detector: φ Adaptive Accrual FD
with ping-pong
• dissemination: random biased gossip
• consensus:
  • leader by convention
  • membership decisions driven by
  leader (joining, leaving)

Akka Cluster: membership states

Akka Cluster: happy path

Akka Cluster: sad path (aka reality)

Akka Cluster: how do you even

Failure detectors
• Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996)
• N. Hayashibara et al: The ϕ Accrual Failure Detector (2004)
• B. Satzger et al: A new adaptive accrual failure detector for dependable distributed systems
(2007)
• https://manuel.bernhardt.io/2017/07/26/a-new-adaptive-accrual-failure-detector-for-akka/
• A. Das et al: SWIM: scalable weakly-consistent infection-style process group membership
protocol (2002)
• A. Dadgar et al: Lifeguard: SWIM-ing with Situational Awareness (2017)
• https://github.com/hashicorp/memberlist

Impossibility results
• M.J. Fischer, N.A. Lynch, and M.S. Paterson: Impossibility of
distributed consensus with one faulty process (1985)
• Chandra, Toueg: Unreliable failure detectors for reliable
distributed systems (1996)
• Chandra et al: On the Impossibility of Group Membership (1996)

Dissemination
• van Renesse et al: A gossip-style failure detection service (1998)
• S. Ranganathan et al: Gossip-Style Failure Detection and Distributed Consensus
for Scalable Heterogeneous Clusters (2000)
• P. Rama, A. D. George, M. Radlinski, R. Subramaniyan: GEMS: Gossip-Enabled
Monitoring Service for Heterogeneous Distributed Systems (2002)
• X. Défago, A. Schiper, P. Urbán: Total Order Broadcast and Multicast Algorithms:
Taxonomy and Survey
• I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan: Chord: A Scalable
Peer-to-peer Lookup Service for Internet Applications

Consensus - time
• L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed
System (1978)
• F. Mattern: Virtual Time and Global States of Distributed Systems (1989)
• D.S. Parker: Detection of mutual inconsistency in distributed systems (1983)
• https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/
• N. Preguiça: Dotted Version Vectors: Efficient Causality Tracking for
Distributed Key-Value Stores (2012)

Consensus - protocols
• F.B. Schneider: Implementing Fault Tolerant Services Using the State
Machine Approach: A Tutorial (1990)
• L. Lamport: The Part-Time Parliament (1998)
• L. Lamport: Paxos made simple (2001)
• D. Ongaro and J. Ousterhout: In Search of an Understandable
Consensus Algorithm (Extended Version) (2014)
• D. Rystsov: CASPaxos: Replicated State Machines without logs (2018)

Consensus - misc
• M. Shapiro: Conflict-free Replicated Data Types (2011)

Thank you!
• Questions?
• Website: https://manuel.bernhardt.io
• Twitter: @elmanu
