Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

A new challenge is emerging with the advent of new classes of applications and technologies such as smart environments, sensor networks, mobile systems, peer-to-peer systems, and cloud computing. In these settings, the underlying distributed system cannot be fully managed; it needs some degree of self-management that depends on the specific application domain. However, it is possible to delineate some common consequences of such self-management: first, no entity can always ensure the validity of the system assumptions during the entire computation and, second, no one knows accurately who joins and who leaves the system at any time, which introduces a kind of unpredictability in the system composition (this phenomenon of arrival and departure of processes in a system is also known as churn).

As a consequence, distributed computing abstractions have to deal not only with asynchrony and failures, but also with this dynamic dimension, where a process that does not crash can leave the system at any time, implying that membership can fully change several times during the same computation. Hence, the abstractions for reliable distributed computing have to be reconsidered to take into account this new “adversary” setting. This self-defined and continuously evolving distributed system, which we will call a dynamic distributed system, makes abstractions more difficult to understand and master than in distributed systems where the set of processes is fixed and known by all participants. The notion of churn thus becomes a system parameter whose aim is to make tractable those systems whose composition evolves over time.

The presentation analyzes the issues in building a regular register in an environment that considers crash and Byzantine failures.

This presentation was delivered during the Retirement Seminar for Professor Santosh Shrivastava, which took place in Newcastle (UK) in September 2011.

Published in: Technology
  • What is the weakest system model in which we are still able to provide meaningful specifications of a distributed computing abstraction and solutions?
  • Merge with the previous one
  • Say what validity is
  • Reliable Distributed Computing: The Price of Mastering Churn in Distributed Systems

    1. The Price of Mastering Churn in Distributed Systems. Roberto Baldoni, Università di Roma “La Sapienza”. Retirement Seminar for Professor Santosh Shrivastava, 8th of September 2011, Newcastle, UK. Roberto Baldoni, “The price of mastering churn in a distributed system”
    2. Santosh reminds me … a set of acronyms
         • MIDAS (2001)
         • EUCOSM (2003)
         • LUCID (2004)
         • MAGNET (2005)
         • VIRTUE (2007)
         • SEGOVIA (2009)
         • Annotations on the slide:
              • Large and promising IP rejected: too many Chinese!
              • FET IP, very strong consortium. Reason for rejection: «very nice projects, however it wants to provide a real software platform for pooling together on-demand resources in a multi-tenant environment resistant to byzantine attack…. in FET program we do not fund engineering work»
              • Just below the bar!
    3. Outline
         • Dynamic Distributed Systems
         • System Model with Churn
         • Regular Registers
         • Other interesting Abstractions
         • Conclusion
    4. Advent of Complex Distributed Applications
         • Peer-to-peer
         • Sensor networks
         • Mobile networks
         • Cloud computing federations
         • Internet supercomputing
         • Smart environments
    5. Managed vs. Unmanaged distributed applications (i)
         • Managed Distributed Application
         • Existence of a manager that can control the entities comprising or running the application
         • The manager guarantees a suitable environment for a duration of time sufficient for a distributed system to behave correctly with respect to its system model assumptions, e.g.,
              • Providing needed/sufficient/appropriate entities to enable correct behavior of the application (global application view)
              • Providing operational guarantees of QoS and the necessary degree of synchrony in the underlying distributed platform
    6. Managed Distributed Applications: Consequences
         • Main characteristics: a predefined setting, i.e.,
              • The application knows, directly or indirectly, the set of processes that will participate in the computation
              • The application knows if it can exploit synchrony assumptions
         • The system can be carefully and "centrally" configured through an appropriate tuning phase in order to get the best performance
         • The application cycle is: design, deployment optimization, configuration, final deployment, operation
         • Managed Distributed Applications run on top of a Distributed System that is piecewise static with respect to time
         • [Figure: timeline in which the number of entities changes stepwise over time: N, N-1, N-2, N+3.]
    7. Managed vs. Unmanaged distributed applications (ii)
         • Unmanaged Distributed Applications
              • No assumption of a manager or access to equivalent management facilities
              • Each process autonomously decides to locally run a component of a distributed application when (a) joining and (b) leaving the system
                   • The system and/or its components do not start with a known and pre-defined setting
              • “Nice” manageable system model assumptions either cannot be guaranteed or do not last for long
    8. Unmanaged distributed applications: Consequences
         • Autonomic/autonomous behavior of entities
         • Self-defined, self-instantiating (& self*?) and perpetually evolving distributed system
              • It is impossible to know the set of processes participating in the computation because it changes dynamically and can potentially grow without bound
              • E.g., the system could cease to exist when no process is active, and at other times the system may be made of thousands of active processes
         • . . . Dynamic Distributed System
    9. Spectrum of Possible System Models
         • [Figure: the world as a spectrum from orderly to chaotic. Static, managed distributed systems sit at the orderly end; dynamic, unmanaged distributed systems sit at the chaotic end. Examples shown: air traffic control, mobile ad-hoc systems, cloud computing, peer-to-peer.]
    10. Uncertainty in Dynamic Distributed Systems
         • Static Distributed Systems:
              • Lack of temporal knowledge
              • Failures
              • Unknown communication delays
         • Dynamic Distributed Systems:
              • Same issues as in static distributed systems, plus
              • Non-monotonic and unknown size of the system
              • Potentially changing properties of the “universe”
              • Unclear notions of efficiency, effectiveness, scalability
         • Side box:
              • Solid theoretical foundations
              • Precise problem specifications
              • Rigorously correct solutions
    11. System Model with Churn
    12. System Model with Churn
         • The distributed system is dynamic
              • In each run, infinitely many processes can arrive at and depart from the system, but at any point in time the number of processes is finite (Infinite Arrival Model)
         • Processes participate in a distributed computation running on top of the distributed system
              • Processes of the distributed system decide at will to join and leave the distributed computation (i.e. the computation is affected by continuous churn)
              • No process is guaranteed to participate forever in the distributed computation
         • Each process has a unique identifier
         • Processes can crash, and a crash can be seen as a leave of the process
    13. Abstractions
         • Shared Memory
              • Registers
              • Sets
         • One-shot problem
              • Interval valid queries
         • Agreement Problem
              • Leader Election
    14. [Figure: a distributed computation running on top of a distributed system subject to churn. Labels: Connectivity Protocol, Communication Protocols, Abstraction.]
         • For simplicity we assume N processes are in the distributed computation at any given time
    15. Object Abstraction: The Regular Register
         • A register is a shared variable accessed by processes through read and write operations
    16. Regular Register Architecture at node i
         • [Figure: layers shown at node i: Connectivity Layer, Point-to-Point Link, Broadcast, Regular Register. Operations: read(), write(v), join(). Other labels: REG, System, Computation.]
         • Point-to-point link: if p_i invokes the send(m) operation to p_j at time t, then p_j will receive m by time t+δ if it has not left the system by that time
         • Broadcast: if p_i invokes the broadcast(m) operation at time t and does not leave the system by time t+δ, then all the processes that are in the system at time t and do not leave the system by time t+δ will deliver m by time t+δ
         • Regular register:
              • (liveness) If a process invokes a read or a write operation and does not leave the system, it eventually returns from that operation
              • (safety) A read operation returns the last value written or a value written by a concurrent write
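The safety property on this slide can be made concrete with a small checker. The sketch below is not from the talk: it assumes a single writer, represents each write as a time interval, and the names `Write` and `allowed_read_values` are hypothetical. It lists the values a regular read is allowed to return.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    start: float   # invocation time of the write
    end: float     # response time of the write
    value: object

def allowed_read_values(writes, read_start, read_end, initial=None):
    """Values a regular read over [read_start, read_end] may return."""
    # Last write that completed before the read was invoked.
    completed = [w for w in writes if w.end < read_start]
    last = max(completed, key=lambda w: w.end).value if completed else initial
    # Writes whose interval overlaps the read interval (concurrent writes).
    concurrent = {w.value for w in writes
                  if w.start <= read_end and w.end >= read_start}
    return {last} | concurrent

history = [Write(0, 1, "a"), Write(2, 3, "b"), Write(5, 7, "c")]
print(allowed_read_values(history, 6, 8))
```

A read that overlaps the write of "c" may return either "b" or "c"; a read with no concurrent write has exactly one legal value.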
    17. Regular Register: write()
         • The writer process p_w wants to write the value v
         • p_w sends a broadcast message (WRITE, v, sn)
         • … in the meantime, processes join and leave the computation
         • OBS. Only processes that belong to the computation when p_w starts the write, and that remain in the computation for the whole duration of the write, will maintain the updated copy of the register
         • Active processes keep the state of the computation
         • [Figure: within the distributed system, a subset of processes, including p_w, participates in the register computation.]
    18. Processes in the distributed computation vs Active Processes
         • [Figure: number of processes over time t. The size of the computation stays at N under churn (joining processes = leaving processes); the curve A(t) of active processes lies below N, with a correctness bound marked.]
    19. Processes in the distributed computation vs Active Processes
         • [Figure: number of processes over time t. The size of the computation stays at N under churn (joining processes = leaving processes); the curve A(t) of active processes lies below N, with a correctness bound marked.]
         • Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.
    20. Processes in the distributed computation vs Active Processes
         • [Figure: number of processes over time t. The size of the computation stays at N under churn (joining processes = leaving processes); the curve A(t) of active processes lies below N, with a correctness bound marked. Liveness and safety issues are marked relative to the bound.]
         • Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.
    21. An Algorithm in Synchronous System
         • Assumption
              • There is a bound δ such that any message sent (broadcast) at time τ ≥ t is received (delivered) by time τ + δ by the processes that are in the system during the interval [τ, τ + δ].
              • A process remains in the system for at least 3δ
         • Algorithm
              • Read local
              • Write global
              • Join global
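The "read local, write global, join global" structure can be illustrated with a toy, single-threaded model. This is a sketch under strong simplifying assumptions, not the algorithm from the talk: message delays and the δ waiting phases are collapsed into instantaneous delivery, and the names `Process`, `Computation` and `bootstrap` are hypothetical.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.value, self.sn = None, -1   # local register copy; None stands for "no value yet"
        self.active = False              # becomes True once join() has completed

class Computation:
    """Toy model: every broadcast reaches all current members instantly."""
    def __init__(self):
        self.members = {}

    def bootstrap(self, pid, value):
        # Hypothetical helper: creates the first active process with an initial value.
        p = Process(pid)
        p.value, p.sn, p.active = value, 0, True
        self.members[pid] = p
        return p

    def join(self, pid):
        p = Process(pid)
        self.members[pid] = p
        # Join is global: the inquiry goes to everyone, only active processes reply.
        replies = [(q.sn, q.value) for q in self.members.values() if q.active]
        if not replies:
            raise RuntimeError("no active process left: the register value is lost")
        sn, value = max(replies, key=lambda r: r[0])
        if sn > p.sn:
            p.value, p.sn = value, sn
        p.active = True
        return p

    def leave(self, pid):
        del self.members[pid]

    def write(self, writer, value):
        # Write is global: (value, sn) is broadcast to all current members.
        sn = writer.sn + 1
        for q in self.members.values():
            if sn > q.sn:
                q.value, q.sn = value, sn

    def read(self, p):
        # Read is local: no message is exchanged.
        return p.value
```

In this model the register value survives as long as at least one active process is present whenever a new process joins, which is the intuition behind bounding the churn rate.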
    22. Synchronous System. Safety: case register_i ≠ ⊥
         • [Figure: time diagram with processes p_i, p_j, p_h, p_k. p_i invokes join() while write(1) is in progress and WRITE(1, 1) is broadcast; the local copies change from 0 to 1. Legend: Join, Write, Reply.]
         • p_i has received a WRITE(<val, sn>) message during the first waiting phase and has updated register_i accordingly
         • The write operation lasts δ time
         • The join operation lasts at least δ time
         • A write message takes at most δ time to be delivered
         • Then the join and the write are concurrent, and the join terminates with the last value written
    23. Synchronous System. Safety: case register_i = ⊥
         • [Figure: time diagram with processes p_i, p_j, p_h, p_k. p_i invokes join(), broadcasts INQUIRY(i) and receives REPLY(h, 0, 0); the local copies hold 0. Legend: Join, Write, Reply.]
         • If no write is concurrent with the join operation and c < 1/(3δ), then there always exists an active process that replies with the last written value
    24. Synchronous System. Safety: case register_i = ⊥
         • [Figure: time diagram with processes p_i, p_j, p_h, p_k. p_i invokes join() concurrently with write(1); messages shown: INQUIRY(i), REPLY(h, 0, 0), WRITE(1, 1).]
         • p_i can receive both WRITE(<val, sn>) messages and REPLY(<j, val, sn>) messages. According to the values received by time τ + 2δ, p_i will update register_i to the value written by a concurrent write, or to the value written before the concurrent writes
         • If p_i receives the write before the reply, p_i does not overwrite the value, and then any following read will return the last value written
    25. Synchronous System
         • Termination. If a process invokes the join() operation and does not leave the system for at least 3δ time units, or invokes the read() operation, or invokes the write() operation and does not leave the system for at least δ time units, it terminates the invoked operation.
         • Safety. Let [τ, τ + δ] be any interval of the computation. If the churn (c × n) in [τ, τ + δ] is less than n/(3δ) (i.e., c < 1/(3δ)), a read() operation returns the last value written before the read invocation, or a value written by a write operation concurrent with it.
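One way to read the bound c < 1/(3δ) is as a worst-case count. Assume, as a simplification not spelled out on the slide, that c·n processes are replaced per time unit and that every departing process is one that was active. Then, of n active processes, at least n(1 − 3δc) are still present after the 3δ a join can take, and this is positive exactly when c < 1/(3δ). The function names below are hypothetical.

```python
def min_active_survivors(n, c, delta):
    """Worst-case number of initially active processes still present after 3*delta.

    Assumes c*n processes leave per time unit and all of them were active.
    """
    return n * (1 - 3 * delta * c)

def max_churn(delta):
    """Churn rates strictly below this value keep at least one survivor."""
    return 1 / (3 * delta)

# With delta = 2 the threshold is 1/6; a churn of 1/16 leaves 62.5 of 100 in the worst case.
print(max_churn(2), min_active_survivors(100, 0.0625, 2))
```

At the threshold itself the worst-case count drops to zero, which matches the intuition that a joining process may then find nobody holding the last written value.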
    26. Horizontal Quorums for Register Persistence
         • [Figure: successive snapshots of the set of processes over a window of 3δ, in which joining processes replace leaving ones. Legend: active process, non-active process, joining.]
    27. Horizontal Quorums for Register Persistence
         • [Figure: successive snapshots of the set of processes over two consecutive windows of 3δ, in which joining processes replace leaving ones. Legend: active process, non-active process, joining.]
         • The register persistence is preserved iff the churn is below a given bound that depends on the protocol implementation
    28. Eventually Synchronous System
         • Assumption
              • There exists a time t after which there is a bound δ such that any message sent (broadcast) at time τ ≥ t is received (delivered) by time τ + δ by the processes that are in the system during the interval [τ, τ + δ].
              • There exists a time t after which c < 1/(3δ).
              • A process remains in the system for at least 3δ
         • Algorithm
              • Read global
              • Write global
              • Join global
    29. Vertical Quorums for Register Validity in Asynchronous Periods
         • Validity of the read:
              • During periods of asynchrony, to be sure to read the last written value you need to read/write registers from a majority of the processes in the system (you no longer have the guarantee that messages are delivered within a known bound)
         • Termination. Assume that |A(t)| > n/2 (i.e., a majority of processes is active at any time). If a process invokes join(), read() or write() and does not leave the system, it terminates its operation.
         • Safety. Assume that |A(t)| > n/2. A read operation returns the last value written before the read invocation, or a value written by a write operation concurrent with it.
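The reason a majority is enough is that any two majorities of the same n processes share at least one member, so a read quorum always meets the quorum that acknowledged the last write. The sketch below checks this by brute force for small n and shows a read that keeps the reply with the highest sequence number. It is an illustration only; the function names and the `(sn, value)` reply format are assumptions, not the protocol from the talk.

```python
from itertools import combinations

def majority(n):
    """Smallest number of processes that is strictly more than n/2."""
    return n // 2 + 1

def all_majorities_intersect(n):
    """Brute-force check that every pair of minimal majorities overlaps."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a, b in combinations(quorums, 2))

def quorum_read(replies, n):
    """replies maps a process id to the (sn, value) pair it returned."""
    if len(replies) < majority(n):
        raise ValueError("not enough replies for a majority")
    sn, value = max(replies.values(), key=lambda r: r[0])
    return value

print(all_majorities_intersect(5))
print(quorum_read({0: (1, "a"), 1: (2, "b"), 2: (1, "a")}, 5))
```

Sequence numbers are what let the reader tell the newest value apart from stale copies held by the other members of its quorum.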
    30. Asynchronous System
         • There is no bound on message transfer delays
         • Theorem
              • It is not possible to implement a regular register in a fully asynchronous dynamic system.
         • The result is similar to the one of [Attiya – Bar-Noy – Dolev JACM95] when considering a static system with any number of process failures
    31. Regular Register with Byzantine Failures
    32. Regular Register with Byzantine Failures
         • Composed of an arbitrarily large set of clients c_1 ... c_m
         • Dynamic: servers may join and leave (infinite arrival model)
              • Join_System() operation: connects new processes to the system
              • Leave_System() operation: passive leave
         • [Figure: layers shown: Connection Layer (e.g. Overlay Management Protocol), (Authenticated) Communication Layer (Best-effort Semantics), Distributed Computation (i.e. Regular Register).]
    33. Computation Model
         • Clients are correct
         • No information about register state
         • Clients trigger read() and write() operations
         • [Figure: clients issuing Write(v) and Read().]
    34. Computation Model
         • Initially n servers are part of the register computation
         • Up to f Byzantine failures (f < n/3)
         • Servers locally maintain a copy of the register value
         • Alternating periods of churn and stability
              • No stable processes
              • In churn periods, cn servers of the server set are refreshed in each time unit (c ∈ [0, 1]).
         • [Figure: clients issuing Write(v) and Read() to servers holding value v or x, with a new server invoking Join_Server().]
    35. Requirements
         • Write Persistency: servers maintain the last value written by a write operation despite server departures
         • Byzantine Resiliency: there are always at least f+1 servers maintaining the same value
         • Read Validity: any read() operation returns the last value written by a completed write() or a value concurrently written
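The Byzantine resiliency requirement suggests a standard reading rule, sketched below as an assumption since the slides do not give the read algorithm: with at most f Byzantine servers, a pair reported by at least f+1 distinct servers is vouched for by at least one correct server. The function name and the `(sn, value)` reply format are hypothetical.

```python
from collections import Counter

def byzantine_read(replies, f):
    """Pick a value vouched for by at least f + 1 distinct servers.

    replies maps a server id to the (sn, value) pair that server reported.
    """
    counts = Counter(replies.values())
    vouched = [pair for pair, k in counts.items() if k >= f + 1]
    if not vouched:
        return None  # no pair has enough support yet
    # Among sufficiently supported pairs, keep the one with the highest sequence number.
    sn, value = max(vouched, key=lambda pair: pair[0])
    return value

replies = {"s1": (2, "v"), "s2": (2, "v"), "s3": (9, "x"), "s4": (1, "u")}
print(byzantine_read(replies, f=1))
```

Here the pair `(9, "x")` is ignored even though it carries the highest sequence number, because a single server cannot be trusted on its own.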
    36. Issues in read() operations
         • [Figure: timeline with instants t_1, t_2, …, t_i, …, t_k showing, at each instant, which servers hold the value v and which hold x.]
    37. Validity Bound
         • Consider a generic protocol P = {A_JS, A_R, A_W} implementing a regular register such that
              • every operation eventually terminates and
              • there exists a period of churn longer than the longest operation issued on the register
         • Theorem: Let A_JS, A_R and A_W be the algorithms implementing respectively the join_Server(), read() and write() operations. Let Δt_j, Δt_r and Δt_w be the maximum time intervals needed by these algorithms to terminate the operation. If c ≥ min{(n−3f)/(n Δt_r), (n−3f)/(n (Δt_j + Δt_w))} then it is not possible to ensure both write persistency and read validity
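The threshold in the theorem is easy to evaluate for concrete parameters. The sketch below reads the bound as min{(n−3f)/(n·Δt_r), (n−3f)/(n·(Δt_j+Δt_w))}; the comparison symbol and the Δ glyphs are garbled in the extracted slide text, so treat this reading, and the function and parameter names, as assumptions.

```python
def churn_threshold(n, f, dt_r, dt_j, dt_w):
    """Churn rate at or above which persistency and validity cannot both hold.

    n: number of servers, f: maximum number of Byzantine servers,
    dt_r, dt_j, dt_w: maximum durations of read, join and write operations.
    """
    slack = n - 3 * f
    return min(slack / (n * dt_r), slack / (n * (dt_j + dt_w)))

# Example: 10 servers, 1 Byzantine, read takes 2 time units, join 3, write 1.
print(churn_threshold(10, 1, dt_r=2, dt_j=3, dt_w=1))
```

Longer operations and more Byzantine servers both lower the threshold, so a slower protocol tolerates less churn.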
    38. Validity Bound in a synchronous system
         • TimelyBroadcastDelivery (TBDel): there exists a known and finite bound δ such that every message broadcast at some time t is delivered by time t + δ.
         • TimelyChannelDelivery (TCDel): there exists a known and finite bound δ′ < δ such that every message sent at some time t is delivered by time t + δ′.
    39. Pictorial Related Work and summary of results for Regular Register
         • [Figure: three axes.]
              • System model: asynchronous, eventually synchronous, synchronous
              • Churn model: static, quiescent, continuous
              • Failure model: crash, Byzantine
         • Works placed in the figure: Aguilera et al. PODC 2010; Baldoni et al. ICDCS 2009; Baldoni et al. PODC 2011
    40. Pictorial Related Work and summary of results for Regular Register
         • [Table: columns are No Churn, Quiescent Churn, Continuous Churn. Entries per row, in the order they appear on the slide:]
              • Synchronous, crash: BFT papers; Baldoni et al. ICDCS 2009
              • Synchronous, Byzantine: Baldoni et al. PODC 2011 (ba)
              • Eventually synchronous, crash: Baldoni et al. ICDCS 2009
              • Eventually synchronous, Byzantine: open problem
              • Asynchronous, crash: Aguilera et al. 2009; impossible
              • Asynchronous, Byzantine: open problem
    41. Other Abstractions we faced
         • Set object (Europar 2010, EWDC2011)
              • More complex semantics than that of registers
              • The set contains all its history
         • Main result: it is not possible to implement a set object in an eventually synchronous distributed system prone to continuous churn if:
              • Processes have only finite memory space for local computation
              • Accesses to the set are continuous
              • There are no stable processes participating in the set computation
              • k-bounded set in an eventually synchronous distributed system
    42. Other Abstractions we faced
         • Leader Election (EDCC2010)
              • There is a bounded set of (good) processes that get into the computation and remain forever (no one knows who they are)
              • Churn is continuous
              • Communication is synchronous with finite losses and unknown maximum transfer delay
         • Risk: electing an infinite sequence of processes that leave the system (bad processes)
         • Main result: «under these assumptions we can implement leader election»
    43. Ω done in 2 Steps
         • The HB* Oracle
              • Provides a list of processes deemed to be up (alive list). The list aims to:
                   • Put good processes at the top of the list
                   • Stabilize the position of a good process in the list
         • Ω protocol
              • Takes the list provided by the HB* protocol and outputs the leader
         • [Figure: HB* (send/receive, unicast, multicast) produces the alive list; Ω (multicast/receive) consumes it and outputs the leader.]
    44. Conclusion
         • Dynamic Distributed Systems are everywhere
              • Most of today's systems are unmanaged to some extent
              • Some of the functionality has to be autonomic and cannot rely on a manager
         • Dynamic Distributed Systems are unquestionably more complex than static ones; this leads to more complex solutions to solve the same problem
         • Scalability and dynamicity are not synonymous
         • Understanding how to implement abstractions in a way that is efficient and well suited to a dynamic distributed system is still an open and fascinating problem
    45. One slide to remember
    46. One slide to remember
         • [Figure: number of processes over time t. The size of the computation stays at N under churn (joining processes = leaving processes); the curve A(t) of active processes lies below N, with a correctness bound marked. Liveness and safety issues are marked relative to the bound.]
         • Movement of the bound is impacted by the system model. The weaker the system model is, the more «static» the system becomes. This brings several impossibility results in the presence of churn.
