SlideShare a Scribd company logo
Is there anybody out there?
Voxxed Days Vienna 2018
Manuel Bernhardt
manuel.bernhardt.io
• Helping companies to get started with
reac4ve systems ...
• ... or to get them to perform
• Lightbend consul4ng and training
partner (Akka, Akka Cluster, Akka
Streams)
@elmanu - h+ps://manuel.bernhardt.io
Mo#va#onal quote
Life is a single player game. You’re born alone. You’re going to die
alone. All of your interpreta8ons are alone. All your memories are
alone. You’re gone in three genera8ons and no one cares. Before
you showed up nobody cared. It’s all single player.
— Naval Ravikant
@elmanu - h+ps://manuel.bernhardt.io
Key issues for building clusters
• discovery: who's there?
• fault detec0on: who's in trouble?
• load balancing: who can take up work?
@elmanu - h+ps://manuel.bernhardt.io
Key issues for building clusters
• discovery: who's there?
• fault detec0on: who's in trouble?
• load balancing: who can take up work?
Group membership
@elmanu - h+ps://manuel.bernhardt.io
Implemen'ng group membership
1. Failure Detec.on
2. Dissemina.on
3. Consensus
@elmanu - h+ps://manuel.bernhardt.io
Failure detec,on
@elmanu - h+ps://manuel.bernhardt.io
Failure detector key
proper0es
• Completeness: crash-failure of any
group member is detected by all non-
faulty members
@elmanu - h+ps://manuel.bernhardt.io
Failure detector key
proper0es
• Completeness: crash-failure of any
group member is detected by all non-
faulty members
• Accuracy: no non-faulty group member
is declared as failed by any other non-
faulty group member (no false posi9ves)
@elmanu - h+ps://manuel.bernhardt.io
Failure detector key
proper0es
• Completeness: crash-failure of any group
member is detected by all non-faulty
members
• Accuracy: no non-faulty group member is
declared as failed by any other non-faulty
group member (no false posi9ves)
Also relevant in prac/ce:
• speed
• network message load
@elmanu - h+ps://manuel.bernhardt.io
Impossibility result
It is impossible for a failure detector algorithm to determinis3cally
achieve both completeness and accuracy over an asynchronous
unreliable network1
1
Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996)
@elmanu - h+ps://manuel.bernhardt.io
Trade-offs
• Strong - Weak completeness: all / some non-faulty members
detect a crash
• Strong - Weak accuracy: there are no / some false-posi7ves
@elmanu - h+ps://manuel.bernhardt.io
Trade-offs
• Strong - Weak completeness: all / some non-faulty members
detect a crash
• Strong - Weak accuracy: there are no / some false-posi7ves
In prac(ce most applica(ons prefer strong completeness with a
weaker form of accuracy
@elmanu - h+ps://manuel.bernhardt.io
Failure Detector strategies
• heartbeat: no heartbeat = failure
• ping: no response = failure
@elmanu - h+ps://manuel.bernhardt.io
Phi Adap)ve Accrual
Failure Detector 3
• has a cool name
• adap.ve: adjusts to network condi.ons
• introduces the no.on of accrual failure
detec.on: suspicion value φ rather than
boolean (trusted or suspected)
3
N. Hayashibara et al: The ϕ Accrual Failure Detector (2004)
@elmanu - h+ps://manuel.bernhardt.io
Phi Adap)ve Accrual
Failure Detector 3
Example: master and worker processes
• φ(w) > 8 stop sending new work to
the node
• φ(w) > 10 start to rebalance current
tasks of the worker to other nodes
• φ(w) > 12 remove the worker from
the list of nodes
3
N. Hayashibara et al: The ϕ Accrual Failure Detector (2004)
@elmanu - h+ps://manuel.bernhardt.io
New Adap)ve accrual
Failure Detector 4
• much simpler to calculate suspicion
level than Phi
• performs slightly be7er and more
adap)ve 5
5
h$ps://manuel.bernhardt.io/2017/07/26/a-new-adap=ve-accrual-
failure-detector-for-akka/
4
B. Satzger et al: A new adap3ve accrual failure detector for dependable
distributed systems (2007)
@elmanu - h+ps://manuel.bernhardt.io
SWIM Failure Detector
As you swim lazily through the milieu,
The secrets of the world will infect you 6
• has both a dissemina.on and a failure detec.on component
• scalable membership protocol
• members are first suspected and not immediately flagged as
failed
6
A. Das et al: SWIM: scalable weakly-consistent infec:on-style process group membership protocol (2002)
@elmanu - h+ps://manuel.bernhardt.io
Lifeguard Failure Detector
• based on SWIM, developed by
Hashicorp 7
• memberlist implementa;on 8
• extensions to the SWIM protocol
• dras;cally reduces the amount of false-
posi;ves
8
h$ps://github.com/hashicorp/memberlist
7
A. Dadgar et al: Lifeguard: SWIM-ing with Situa:onal Awareness (2017)
@elmanu - h+ps://manuel.bernhardt.io
Dissemina(on
Dissemina(on
How to communicate changes in the cluster?
• members joining
• members leaving
• members failing
@elmanu - h+ps://manuel.bernhardt.io
Dissemina(on strategies
• hardware / IP / UDP mul1cast: not readily (or willingly) enabled
in data centres
• gossip protocols
@elmanu - h+ps://manuel.bernhardt.io
Gossip styles
• gossip to one node at random 10
10
van Renesse et al: A gossip-style failure detec9on service (1998)
@elmanu - h+ps://manuel.bernhardt.io
Gossip styles
• gossip to one node at random 10
• round-robin, binary round-robin, round-
robin with sequence check 11
11
S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed
Consensus for Scalable Heterogeneous Clusters (2000)
10
van Renesse et al: A gossip-style failure detec9on service (1998)
@elmanu - h+ps://manuel.bernhardt.io
Gossip styles
• gossip to one node at random 10
• round-robin, binary round-robin,
round-robin with sequence check 11
11
S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed
Consensus for Scalable Heterogeneous Clusters (2000)
10
van Renesse et al: A gossip-style failure detec9on service (1998)
@elmanu - h+ps://manuel.bernhardt.io
Gossip styles
• gossip to one node at random 10
• round-robin, binary round-robin, round-
robin with sequence check 11
• piggy-back on another protocol (e.g. on
a failure detector 6
): also called
infec&on-style gossip
6
A. Das et al: SWIM: scalable weakly-consistent infec:on-style process
group membership protocol (2002)
11
S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed
Consensus for Scalable Heterogeneous Clusters (2000)
10
van Renesse et al: A gossip-style failure detec9on service (1998)
@elmanu - h+ps://manuel.bernhardt.io
What do you even gossip about
@elmanu - h+ps://manuel.bernhardt.io
What do you even gossip about
@elmanu - h+ps://manuel.bernhardt.io
What do you even gossip about
@elmanu - h+ps://manuel.bernhardt.io
Example of gossip op.miza.ons
• Akka Cluster: gossip with a higher probability to nodes that have
not already seen a gossip
• Akka Cluster: speeds up gossip (3x) when less than half of the
members have seen the latest gossip
• Lifeguard: an<-entropy mechanism based on nodes doing a full
sync with a node at random (helps to speed up convergence a?er
a network par<<on)
@elmanu - h+ps://manuel.bernhardt.io
Consensus
Impossibility result - group membership
[...] the primary-par//on group membership problem cannot be
solved in asynchronous systems with crash failures, even if one
allows the removal or killing of non-faulty processes that are
erroneously suspected to have crashed 9
9
Chandra et al: On the Impossibility of Group Membership (1996)
@elmanu - h+ps://manuel.bernhardt.io
It would be unwise to make
membership-related decisions while
there are processes suspected of
having crashed
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus: .me
• Lamport Clocks 12
: how do you order events in a distributed
system
12
L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System (1978)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus: .me
• Lamport Clocks: how do you order events in a distributed
system
• Vector Clocks 13
: how do you order events in a distributed
system and flag concurrent event
13
F. Ma(ern: Virtual Time and Global States of Distributed Systems (1989)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus: .me
• Lamport Clocks: how do you order events in a distributed system
• Vector Clocks: how do you order events in a distributed system and flag
concurrent events
• Version Vectors 14 15
: how do you order replicas in a distributed system and
detect conflicts
• Do3ed Version Vectors 16
: same as version vectors, but prevent false conflicts
16
N. Preguiça: Do1ed Version Vectors: Efficient Causality Tracking for Distributed Key-Value Stores (2012)
15
h%ps://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/
14
D.S. Parker: Detec/on of mutual inconsistency in distributed systems (1983)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus: .me
private[cluster] final case class Gossip(
members: immutable.SortedSet[Member],
overview: GossipOverview = GossipOverview(),
version: VectorClock = VectorClock(),
tombstones: Map[UniqueAddress, Gossip.Timestamp] = Map.empty)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus:
CRDTs
• Conflict-Free Replicated Data types 17
• strong eventual consistency
• two families:
• CmRDTs (commuta+vity of
opera+ons)
• CvRDTs (convergence of state - merge
funcBon)
17
F.B. Schneider: Implemen4ng Fault Tolerant Services Using the State
Machine Approach: A Tutorial (1990)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus:
replicated state machines
Any suffisciently complicated model class
contains an ad-hoc, informally-specified,
bug-ridden, slow implementa;on of half a
state machine
— Pete Forde
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus:
replicated state machines
This method allows one to implement any
desired form of mul4process
synchroniza4on in a distributed system
— Leslie Lamport 12
12
L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed
System (1978)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus:
protocols
• fault-tolerant distributed systems
• how do mul1ple servers agree on a
value?
• Paxos 19 20
, Ra@ 21
, CASPaxos 22
22
D. Rystsov: CASPaxos: Replicated State Machines without logs (2018)
21
D. Ongaro and J. Ousterhout: In Search of an Understandable
Consensus Algorithm (Extended Version) (2014)
20
L. Lamport: Paxos made simple (2001)
19
L. Lamport: The Part-Time Parliament (1998)
@elmanu - h+ps://manuel.bernhardt.io
Reaching consensus: conven/ons
• reaching a decision by not transmi2ng any informa4on
• example: Akka Cluster leader designa4on
/**
* INTERNAL API
* Orders the members by their address except that members with status
* Joining, Exiting and Down are ordered last (in that order).
*/
private[cluster] val leaderStatusOrdering: Ordering[Member] = ...
@elmanu - h+ps://manuel.bernhardt.io
In prac(ce: Akka Cluster
@elmanu - h+ps://manuel.bernhardt.io
Akka Cluster
• failure detector: φ Adap)ve Accrual FD
with ping-pong
• dissemina0on: random biased gossip
• consensus:
• leader by conven)on
• membership decisions driven by
leader (joining, leaving)
@elmanu - h+ps://manuel.bernhardt.io
Akka Cluster: membership states
@elmanu - h+ps://manuel.bernhardt.io
Akka Cluster: happy path
@elmanu - h+ps://manuel.bernhardt.io
Akka Cluster: sad path (aka reality)
@elmanu - h+ps://manuel.bernhardt.io
Akka Cluster: how do you even
@elmanu - h+ps://manuel.bernhardt.io
h#ps://manuel.bernhardt.io
Failure detectors
• Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996)
• N. Hayashibara et al: The ϕ Accrual Failure Detector (2004)
• B. Satzger et al: A new adapNve accrual failure detector for dependable distributed systems
(2007)
• hQps://manuel.bernhardt.io/2017/07/26/a-new-adapNve-accrual-failure-detector-for-akka/
• A. Das et al: SWIM: scalable weakly-consistent infecNon-style process group membership
protocol (2002)
• A. Dadgar et al: Lifeguard: SWIM-ing with SituaNonal Awareness (2017)
• hQps://github.com/hashicorp/memberlist
@elmanu - h+ps://manuel.bernhardt.io
Impossibility results
• M.J. Fischer, N.A. Lynch, and M.S. Paterson: Impossibility of
distributed consensus with one faulty process (1985)
• Chandra, Toueg: Unreliable failure detectors for reliable
distributed systems (1996)
• Chandra et al: On the Impossibility of Group Membership (1996)
@elmanu - h+ps://manuel.bernhardt.io
Dissemina(on
• van Renesse et al: A gossip-style failure detec8on service (1998)
• S. Ranganathan et al: Gossip-Style Failure Detec8on and
Distributed Consensus for Scalable Heterogeneous Clusters
(2000)
@elmanu - h+ps://manuel.bernhardt.io
Consensus - )me
• L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed
System (1978)
• F. MaJern: Virtual Time and Global States of Distributed Systems (1989)
• D.S. Parker: DetecNon of mutual inconsistency in distributed systems (1983)
• hJps://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-
clocks/
• N. Preguiça: DoJed Version Vectors: Efficient Causality Tracking for
Distributed Key-Value Stores (2012)
@elmanu - h+ps://manuel.bernhardt.io
Consensus - protocols
• F.B. Schneider: Implemen3ng Fault Tolerant Services Using the State
Machine Approach: A Tutorial (1990)
• L. Lamport: The Part-Time Parliament (1998)
• L. Lamport: Paxos made simple (2001)
• D. Ongaro and J. Ousterhout: In Search of an Understandable
Consensus Algorithm (Extended Version) (2014)
• D. Rystsov: CASPaxos: Replicated State Machines without logs (2018)
@elmanu - h+ps://manuel.bernhardt.io
Consensus - misc
• M. Shapiro: Conflict-free Replicated Data Types (2011)
@elmanu - h+ps://manuel.bernhardt.io
Thank you!
• Ques&ons?
• Website: h1ps://manuel.bernhardt.io
• Twi1er: @elmanu
@elmanu - h+ps://manuel.bernhardt.io

More Related Content

Similar to Is there anybody out there?

Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
Tao Xie
 
Predictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligencePredictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine Intelligence
Numenta
 
Ha cluster with openSUSE Leap
Ha cluster with openSUSE LeapHa cluster with openSUSE Leap
Ha cluster with openSUSE Leap
medwinz
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
Annika Eriksson
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
ArangoDB Database
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
Mykola Novik
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
Theo Schlossnagle
 
DataHub
DataHubDataHub
Anon p2p slides
Anon p2p slidesAnon p2p slides
Anon p2p slides
chintaan
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
Bert Jan Schrijver
 
Performance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTracePerformance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTrace
Graeme Jenkinson
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2
Scott Sutherland
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
Hannes Hapke
 
Say No to the Dependency Hell
Say No to the Dependency HellSay No to the Dependency Hell
Say No to the Dependency Hell
Ivan Pashchenko
 
NAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful toolNAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful tool
APNIC
 
My flings with data analysis
My flings with data analysisMy flings with data analysis
My flings with data analysis
Venkatesh Prasad Ranganath
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoConferencias FIST
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
 

Similar to Is there anybody out there? (20)

Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Predictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligencePredictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine Intelligence
 
Ha cluster with openSUSE Leap
Ha cluster with openSUSE LeapHa cluster with openSUSE Leap
Ha cluster with openSUSE Leap
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
 
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICESKEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
DataHub
DataHubDataHub
DataHub
 
Anon p2p slides
Anon p2p slidesAnon p2p slides
Anon p2p slides
 
Debugging distributed systems
Debugging distributed systemsDebugging distributed systems
Debugging distributed systems
 
Performance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTracePerformance analysis and troubleshooting using DTrace
Performance analysis and troubleshooting using DTrace
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
Say No to the Dependency Hell
Say No to the Dependency HellSay No to the Dependency Hell
Say No to the Dependency Hell
 
NAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful toolNAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful tool
 
My flings with data analysis
My flings with data analysisMy flings with data analysis
My flings with data analysis
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario Malicioso
 
test
testtest
test
 
HeartBeat
HeartBeatHeartBeat
HeartBeat
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
 

More from Manuel Bernhardt

8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 20178 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
Manuel Bernhardt
 
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware ofScala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
Manuel Bernhardt
 
8 Akka anti-patterns you'd better be aware of
8 Akka anti-patterns you'd better be aware of8 Akka anti-patterns you'd better be aware of
8 Akka anti-patterns you'd better be aware of
Manuel Bernhardt
 
Beyond the buzzword: a reactive web-appliction in practice
Beyond the buzzword: a reactive web-appliction in practiceBeyond the buzzword: a reactive web-appliction in practice
Beyond the buzzword: a reactive web-appliction in practice
Manuel Bernhardt
 
Beyond the Buzzword - a reactive application in practice
Beyond the Buzzword - a reactive application in practiceBeyond the Buzzword - a reactive application in practice
Beyond the Buzzword - a reactive application in practice
Manuel Bernhardt
 
Six years of Scala and counting
Six years of Scala and countingSix years of Scala and counting
Six years of Scala and counting
Manuel Bernhardt
 
3 things you must know to think reactive - Geecon Kraków 2015
3 things you must know to think reactive - Geecon Kraków 20153 things you must know to think reactive - Geecon Kraków 2015
3 things you must know to think reactive - Geecon Kraków 2015
Manuel Bernhardt
 
Reactive Web-Applications @ LambdaDays
Reactive Web-Applications @ LambdaDaysReactive Web-Applications @ LambdaDays
Reactive Web-Applications @ LambdaDays
Manuel Bernhardt
 
Writing a technical book
Writing a technical bookWriting a technical book
Writing a technical book
Manuel Bernhardt
 
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVMVoxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Manuel Bernhardt
 
Back to the futures, actors and pipes: using Akka for large-scale data migration
Back to the futures, actors and pipes: using Akka for large-scale data migrationBack to the futures, actors and pipes: using Akka for large-scale data migration
Back to the futures, actors and pipes: using Akka for large-scale data migration
Manuel Bernhardt
 
Project Phoenix - From PHP to the Play Framework in 3 months
Project Phoenix - From PHP to the Play Framework in 3 monthsProject Phoenix - From PHP to the Play Framework in 3 months
Project Phoenix - From PHP to the Play Framework in 3 months
Manuel Bernhardt
 
Scala - Java2Days Sofia
Scala - Java2Days SofiaScala - Java2Days Sofia
Scala - Java2Days Sofia
Manuel Bernhardt
 
Tips and tricks for setting up a Play 2 project
Tips and tricks for setting up a Play 2 projectTips and tricks for setting up a Play 2 project
Tips and tricks for setting up a Play 2 project
Manuel Bernhardt
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
Manuel Bernhardt
 

More from Manuel Bernhardt (16)

8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 20178 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
8 akka anti-patterns you'd better be aware of - Reactive Summit Austin 2017
 
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware ofScala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
Scala Days Copenhagen - 8 Akka anti-patterns you'd better be aware of
 
8 Akka anti-patterns you'd better be aware of
8 Akka anti-patterns you'd better be aware of8 Akka anti-patterns you'd better be aware of
8 Akka anti-patterns you'd better be aware of
 
Beyond the buzzword: a reactive web-appliction in practice
Beyond the buzzword: a reactive web-appliction in practiceBeyond the buzzword: a reactive web-appliction in practice
Beyond the buzzword: a reactive web-appliction in practice
 
Beyond the Buzzword - a reactive application in practice
Beyond the Buzzword - a reactive application in practiceBeyond the Buzzword - a reactive application in practice
Beyond the Buzzword - a reactive application in practice
 
Six years of Scala and counting
Six years of Scala and countingSix years of Scala and counting
Six years of Scala and counting
 
3 things you must know to think reactive - Geecon Kraków 2015
3 things you must know to think reactive - Geecon Kraków 20153 things you must know to think reactive - Geecon Kraków 2015
3 things you must know to think reactive - Geecon Kraków 2015
 
Reactive Web-Applications @ LambdaDays
Reactive Web-Applications @ LambdaDaysReactive Web-Applications @ LambdaDays
Reactive Web-Applications @ LambdaDays
 
Writing a technical book
Writing a technical bookWriting a technical book
Writing a technical book
 
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVMVoxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
Voxxed Days Vienna - The Why and How of Reactive Web-Applications on the JVM
 
Back to the futures, actors and pipes: using Akka for large-scale data migration
Back to the futures, actors and pipes: using Akka for large-scale data migrationBack to the futures, actors and pipes: using Akka for large-scale data migration
Back to the futures, actors and pipes: using Akka for large-scale data migration
 
Project Phoenix - From PHP to the Play Framework in 3 months
Project Phoenix - From PHP to the Play Framework in 3 monthsProject Phoenix - From PHP to the Play Framework in 3 months
Project Phoenix - From PHP to the Play Framework in 3 months
 
Scala - Java2Days Sofia
Scala - Java2Days SofiaScala - Java2Days Sofia
Scala - Java2Days Sofia
 
Tips and tricks for setting up a Play 2 project
Tips and tricks for setting up a Play 2 projectTips and tricks for setting up a Play 2 project
Tips and tricks for setting up a Play 2 project
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Scala pitfalls
Scala pitfallsScala pitfalls
Scala pitfalls
 

Recently uploaded

Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 

Recently uploaded (20)

Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 

Is there anybody out there?

  • 1. Is there anybody out there? Voxxed Days Vienna 2018 Manuel Bernhardt
  • 2.
  • 3. manuel.bernhardt.io • Helping companies to get started with reac4ve systems ... • ... or to get them to perform • Lightbend consul4ng and training partner (Akka, Akka Cluster, Akka Streams) @elmanu - h+ps://manuel.bernhardt.io
  • 4. Mo#va#onal quote Life is a single player game. You’re born alone. You’re going to die alone. All of your interpreta8ons are alone. All your memories are alone. You’re gone in three genera8ons and no one cares. Before you showed up nobody cared. It’s all single player. — Naval Ravikant @elmanu - h+ps://manuel.bernhardt.io
  • 5.
  • 6.
  • 7.
  • 8. Key issues for building clusters • discovery: who's there? • fault detec0on: who's in trouble? • load balancing: who can take up work? @elmanu - h+ps://manuel.bernhardt.io
  • 9. Key issues for building clusters • discovery: who's there? • fault detec0on: who's in trouble? • load balancing: who can take up work? Group membership @elmanu - h+ps://manuel.bernhardt.io
  • 10. Implemen'ng group membership 1. Failure Detec.on 2. Dissemina.on 3. Consensus @elmanu - h+ps://manuel.bernhardt.io
  • 11. Failure detec,on @elmanu - h+ps://manuel.bernhardt.io
  • 12. Failure detector key proper0es • Completeness: crash-failure of any group member is detected by all non- faulty members @elmanu - h+ps://manuel.bernhardt.io
  • 13. Failure detector key proper0es • Completeness: crash-failure of any group member is detected by all non- faulty members • Accuracy: no non-faulty group member is declared as failed by any other non- faulty group member (no false posi9ves) @elmanu - h+ps://manuel.bernhardt.io
  • 14. Failure detector key proper0es • Completeness: crash-failure of any group member is detected by all non-faulty members • Accuracy: no non-faulty group member is declared as failed by any other non-faulty group member (no false posi9ves) Also relevant in prac/ce: • speed • network message load @elmanu - h+ps://manuel.bernhardt.io
  • 15. Impossibility result It is impossible for a failure detector algorithm to determinis3cally achieve both completeness and accuracy over an asynchronous unreliable network1 1 Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996) @elmanu - h+ps://manuel.bernhardt.io
  • 16. Trade-offs • Strong - Weak completeness: all / some non-faulty members detect a crash • Strong - Weak accuracy: there are no / some false-posi7ves @elmanu - h+ps://manuel.bernhardt.io
  • 17. Trade-offs • Strong - Weak completeness: all / some non-faulty members detect a crash • Strong - Weak accuracy: there are no / some false-posi7ves In prac(ce most applica(ons prefer strong completeness with a weaker form of accuracy @elmanu - h+ps://manuel.bernhardt.io
  • 18. Failure Detector strategies • heartbeat: no heartbeat = failure • ping: no response = failure @elmanu - h+ps://manuel.bernhardt.io
  • 19. Phi Adap)ve Accrual Failure Detector 3 • has a cool name • adap.ve: adjusts to network condi.ons • introduces the no.on of accrual failure detec.on: suspicion value φ rather than boolean (trusted or suspected) 3 N. Hayashibara et al: The ϕ Accrual Failure Detector (2004) @elmanu - h+ps://manuel.bernhardt.io
  • 20. Phi Adap)ve Accrual Failure Detector 3 Example: master and worker processes • φ(w) > 8 stop sending new work to the node • φ(w) > 10 start to rebalance current tasks of the worker to other nodes • φ(w) > 12 remove the worker from the list of nodes 3 N. Hayashibara et al: The ϕ Accrual Failure Detector (2004) @elmanu - h+ps://manuel.bernhardt.io
  • 21. New Adap)ve accrual Failure Detector 4 • much simpler to calculate suspicion level than Phi • performs slightly be7er and more adap)ve 5 5 h$ps://manuel.bernhardt.io/2017/07/26/a-new-adap=ve-accrual- failure-detector-for-akka/ 4 B. Satzger et al: A new adap3ve accrual failure detector for dependable distributed systems (2007) @elmanu - h+ps://manuel.bernhardt.io
  • 22. SWIM Failure Detector As you swim lazily through the milieu, The secrets of the world will infect you 6 • has both a dissemina.on and a failure detec.on component • scalable membership protocol • members are first suspected and not immediately flagged as failed 6 A. Das et al: SWIM: scalable weakly-consistent infec:on-style process group membership protocol (2002) @elmanu - h+ps://manuel.bernhardt.io
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. Lifeguard Failure Detector • based on SWIM, developed by Hashicorp 7 • memberlist implementa;on 8 • extensions to the SWIM protocol • dras;cally reduces the amount of false- posi;ves 8 h$ps://github.com/hashicorp/memberlist 7 A. Dadgar et al: Lifeguard: SWIM-ing with Situa:onal Awareness (2017) @elmanu - h+ps://manuel.bernhardt.io
  • 31. Dissemina(on How to communicate changes in the cluster? • members joining • members leaving • members failing @elmanu - h+ps://manuel.bernhardt.io
  • 32. Dissemina(on strategies • hardware / IP / UDP mul1cast: not readily (or willingly) enabled in data centres • gossip protocols @elmanu - h+ps://manuel.bernhardt.io
  • 33. Gossip styles • gossip to one node at random 10 10 van Renesse et al: A gossip-style failure detec9on service (1998) @elmanu - h+ps://manuel.bernhardt.io
  • 34. Gossip styles • gossip to one node at random 10 • round-robin, binary round-robin, round- robin with sequence check 11 11 S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed Consensus for Scalable Heterogeneous Clusters (2000) 10 van Renesse et al: A gossip-style failure detec9on service (1998) @elmanu - h+ps://manuel.bernhardt.io
  • 35. Gossip styles • gossip to one node at random 10 • round-robin, binary round-robin, round-robin with sequence check 11 11 S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed Consensus for Scalable Heterogeneous Clusters (2000) 10 van Renesse et al: A gossip-style failure detec9on service (1998) @elmanu - h+ps://manuel.bernhardt.io
  • 36. Gossip styles • gossip to one node at random 10 • round-robin, binary round-robin, round- robin with sequence check 11 • piggy-back on another protocol (e.g. on a failure detector 6 ): also called infec&on-style gossip 6 A. Das et al: SWIM: scalable weakly-consistent infec:on-style process group membership protocol (2002) 11 S. Ranganathan et al: Gossip-Style Failure Detec:on and Distributed Consensus for Scalable Heterogeneous Clusters (2000) 10 van Renesse et al: A gossip-style failure detec9on service (1998) @elmanu - h+ps://manuel.bernhardt.io
  • 37. What do you even gossip about @elmanu - h+ps://manuel.bernhardt.io
  • 38. What do you even gossip about @elmanu - h+ps://manuel.bernhardt.io
  • 39. What do you even gossip about @elmanu - h+ps://manuel.bernhardt.io
  • 40. Example of gossip op.miza.ons • Akka Cluster: gossip with a higher probability to nodes that have not already seen a gossip • Akka Cluster: speeds up gossip (3x) when less than half of the members have seen the latest gossip • Lifeguard: an<-entropy mechanism based on nodes doing a full sync with a node at random (helps to speed up convergence a?er a network par<<on) @elmanu - h+ps://manuel.bernhardt.io
  • 42. Impossibility result - group membership [...] the primary-par//on group membership problem cannot be solved in asynchronous systems with crash failures, even if one allows the removal or killing of non-faulty processes that are erroneously suspected to have crashed 9 9 Chandra et al: On the Impossibility of Group Membership (1996) @elmanu - h+ps://manuel.bernhardt.io
  • 43. It would be unwise to make membership-related decisions while there are processes suspected of having crashed @elmanu - h+ps://manuel.bernhardt.io
  • 44. Reaching consensus: .me • Lamport Clocks 12 : how do you order events in a distributed system 12 L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System (1978) @elmanu - h+ps://manuel.bernhardt.io
  • 45. Reaching consensus: .me • Lamport Clocks: how do you order events in a distributed system • Vector Clocks 13 : how do you order events in a distributed system and flag concurrent event 13 F. Ma(ern: Virtual Time and Global States of Distributed Systems (1989) @elmanu - h+ps://manuel.bernhardt.io
  • 46. Reaching consensus: .me • Lamport Clocks: how do you order events in a distributed system • Vector Clocks: how do you order events in a distributed system and flag concurrent events • Version Vectors 14 15 : how do you order replicas in a distributed system and detect conflicts • Do3ed Version Vectors 16 : same as version vectors, but prevent false conflicts 16 N. Preguiça: Do1ed Version Vectors: Efficient Causality Tracking for Distributed Key-Value Stores (2012) 15 h%ps://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/ 14 D.S. Parker: Detec/on of mutual inconsistency in distributed systems (1983) @elmanu - h+ps://manuel.bernhardt.io
  • 47. Reaching consensus: .me private[cluster] final case class Gossip( members: immutable.SortedSet[Member], overview: GossipOverview = GossipOverview(), version: VectorClock = VectorClock(), tombstones: Map[UniqueAddress, Gossip.Timestamp] = Map.empty) @elmanu - h+ps://manuel.bernhardt.io
  • 48. Reaching consensus: CRDTs • Conflict-Free Replicated Data types 17 • strong eventual consistency • two families: • CmRDTs (commuta+vity of opera+ons) • CvRDTs (convergence of state - merge funcBon) 17 F.B. Schneider: Implemen4ng Fault Tolerant Services Using the State Machine Approach: A Tutorial (1990) @elmanu - h+ps://manuel.bernhardt.io
  • 49. Reaching consensus: replicated state machines Any suffisciently complicated model class contains an ad-hoc, informally-specified, bug-ridden, slow implementa;on of half a state machine — Pete Forde @elmanu - h+ps://manuel.bernhardt.io
  • 50. Reaching consensus: replicated state machines This method allows one to implement any desired form of mul4process synchroniza4on in a distributed system — Leslie Lamport 12 12 L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System (1978) @elmanu - h+ps://manuel.bernhardt.io
  • 51. Reaching consensus: protocols • fault-tolerant distributed systems • how do mul1ple servers agree on a value? • Paxos 19 20 , Ra@ 21 , CASPaxos 22 22 D. Rystsov: CASPaxos: Replicated State Machines without logs (2018) 21 D. Ongaro and J. Ousterhout: In Search of an Understandable Consensus Algorithm (Extended Version) (2014) 20 L. Lamport: Paxos made simple (2001) 19 L. Lamport: The Part-Time Parliament (1998) @elmanu - h+ps://manuel.bernhardt.io
  • 52. Reaching consensus: conven/ons • reaching a decision by not transmi2ng any informa4on • example: Akka Cluster leader designa4on /** * INTERNAL API * Orders the members by their address except that members with status * Joining, Exiting and Down are ordered last (in that order). */ private[cluster] val leaderStatusOrdering: Ordering[Member] = ... @elmanu - h+ps://manuel.bernhardt.io
  • 53. In prac(ce: Akka Cluster @elmanu - h+ps://manuel.bernhardt.io
  • 54. Akka Cluster • failure detector: φ Adap)ve Accrual FD with ping-pong • dissemina0on: random biased gossip • consensus: • leader by conven)on • membership decisions driven by leader (joining, leaving) @elmanu - h+ps://manuel.bernhardt.io
  • 55. Akka Cluster: membership states @elmanu - h+ps://manuel.bernhardt.io
  • 56. Akka Cluster: happy path @elmanu - h+ps://manuel.bernhardt.io
  • 57. Akka Cluster: sad path (aka reality) @elmanu - h+ps://manuel.bernhardt.io
  • 58. Akka Cluster: how do you even @elmanu - h+ps://manuel.bernhardt.io
  • 59.
  • 61. Failure detectors • Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996) • N. Hayashibara et al: The ϕ Accrual Failure Detector (2004) • B. Satzger et al: A new adapNve accrual failure detector for dependable distributed systems (2007) • hQps://manuel.bernhardt.io/2017/07/26/a-new-adapNve-accrual-failure-detector-for-akka/ • A. Das et al: SWIM: scalable weakly-consistent infecNon-style process group membership protocol (2002) • A. Dadgar et al: Lifeguard: SWIM-ing with SituaNonal Awareness (2017) • hQps://github.com/hashicorp/memberlist @elmanu - h+ps://manuel.bernhardt.io
  • 62. Impossibility results • M.J. Fischer, N.A. Lynch, and M.S. Paterson: Impossibility of distributed consensus with one faulty process (1985) • Chandra, Toueg: Unreliable failure detectors for reliable distributed systems (1996) • Chandra et al: On the Impossibility of Group Membership (1996) @elmanu - h+ps://manuel.bernhardt.io
  • 63. Dissemina(on • van Renesse et al: A gossip-style failure detec8on service (1998) • S. Ranganathan et al: Gossip-Style Failure Detec8on and Distributed Consensus for Scalable Heterogeneous Clusters (2000) @elmanu - h+ps://manuel.bernhardt.io
  • 64. Consensus - )me • L. Lamport: Time, Clocks, and the Ordering of Events in a Distributed System (1978) • F. MaJern: Virtual Time and Global States of Distributed Systems (1989) • D.S. Parker: DetecNon of mutual inconsistency in distributed systems (1983) • hJps://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector- clocks/ • N. Preguiça: DoJed Version Vectors: Efficient Causality Tracking for Distributed Key-Value Stores (2012) @elmanu - h+ps://manuel.bernhardt.io
  • 65. Consensus - protocols • F.B. Schneider: Implemen3ng Fault Tolerant Services Using the State Machine Approach: A Tutorial (1990) • L. Lamport: The Part-Time Parliament (1998) • L. Lamport: Paxos made simple (2001) • D. Ongaro and J. Ousterhout: In Search of an Understandable Consensus Algorithm (Extended Version) (2014) • D. Rystsov: CASPaxos: Replicated State Machines without logs (2018) @elmanu - h+ps://manuel.bernhardt.io
  • 66. Consensus - misc • M. Shapiro: Conflict-free Replicated Data Types (2011) @elmanu - h+ps://manuel.bernhardt.io
  • 67. Thank you! • Ques&ons? • Website: h1ps://manuel.bernhardt.io • Twi1er: @elmanu @elmanu - h+ps://manuel.bernhardt.io