
Distributed Computing with Data Distribution Service DDS

Designing distributed systems is hard, and one of the main aims of the Data Distribution Service (DDS) is to make this task less daunting. Yet, to exploit DDS's full potential it is key to understand the coordination model and architectural style it promotes, along with the key properties that it guarantees. Only after having understood these concepts will you realize the full power of DDS. This presentation, after summarizing the main challenges that architects face when designing distributed systems, will (1) introduce a series of canonical coordination models, (2) explain DDS's coordination model and its powerful properties, (3) identify the key patterns that underlie the coordination model, and (4) show how this coordination model can be used to build some interesting distributed applications and some key distributed algorithms.

Distributed Computing with Data Distribution Service DDS

  1. Distributed Computing with DDS. Angelo Corsaro, PhD; CTO, ADLINK Tech. Inc.; Co-Chair, OMG DDS-SIG; angelo.corsaro@adlinktech.com
  2. Distributed Systems
  3. Distributed System Definition: "A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components." (Wikipedia)
  4. Distributed System Definition: the same Wikipedia definition, annotated: well... this may be true at the transport level, but the components may coordinate using different models, as we'll see later.
  5-6. Distributed System Definition (adapted from Wikipedia): "A distributed system is a model in which components located on networked computers communicate and coordinate their actions to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components."
  7-8. Distributed System Definition: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport, 28 May 1987)
  9. Distributed Computing / Coordination Models
  10. Symmetric vs. Asymmetric: computation models are symmetric if the processes involved in the distributed computation don't assume special roles; in other words, they are all peers. Computation models are asymmetric if some processes assume a special role, e.g. server or client.
  11. Anonymous vs. Named: some distributed computational/coordination models support anonymous communication, in the sense that the communicating parties are unaware of each other. Others require explicit knowledge of the parties with which communication has to happen.
  12. Message Passing: a symmetric computation model in which distributed processes communicate and cooperate by asynchronously sending messages to each other. Examples: Sockets, Agents, MPI. [Diagram: processes exchanging messages]
  13. Client/Server: an asymmetric computation model in which distributed processes communicate and cooperate by requesting services (often synchronously) from special processes called servers. Examples: Java RMI, CORBA, OPC-UA. [Diagram: clients exchanging request/reply with servers]
  14. Message Queues: a symmetric and anonymous computation model in which distributed processes communicate and coordinate by asynchronously putting and getting messages on named queues. Examples: AMQP, JMS Queues, AWS SQS. [Diagram: processes putting and getting messages on queues]
  15. Tuple Spaces: the tuple space is a symmetric and anonymous computation model in which distributed processes communicate and coordinate by asynchronously reading and writing tuples, i.e. data, into a tuple space. Examples: DBMS, DDS, Linda. [Diagram: processes performing write and read|take on a shared space]
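To ground the tuple-space model before turning to DDS, here is a minimal, single-process sketch of the three operations named above. The TupleSpace class is purely illustrative, not the Linda or DDS API; a real tuple space is distributed and typically offers blocking variants of read and take.

     import scala.collection.mutable

     // Illustrative in-memory tuple space; real implementations are distributed.
     class TupleSpace[T] {
       private val tuples = mutable.Buffer.empty[T]

       // write: publish a tuple into the space
       def write(t: T): Unit = synchronized { tuples += t }

       // read: non-destructive; the matching tuple stays in the space
       def read(p: T => Boolean): Option[T] = synchronized { tuples.find(p) }

       // take: destructive; the matching tuple is removed from the space
       def take(p: T => Boolean): Option[T] = synchronized {
         val i = tuples.indexWhere(p)
         if (i >= 0) Some(tuples.remove(i)) else None
       }
     }

     object TupleSpaceDemo extends App {
       val space = new TupleSpace[(String, Int)]
       space.write(("temp", 21))
       println(space.read(_._1 == "temp")) // Some((temp,21)) - still in the space
       println(space.take(_._1 == "temp")) // Some((temp,21)) - removed
       println(space.read(_._1 == "temp")) // None
     }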
  16. DDS
  17. The DDS Model: DDS provides a tuple-space-inspired symmetric computation model in which distributed processes communicate and coordinate by asynchronously reading and writing data into an eventually consistent data space. [Diagram: processes performing write and read|take on the data space]
  18. Virtualised Data Space: applications can autonomously and asynchronously read and write data, enjoying spatial and temporal decoupling. [Diagram: DDS Global Data Space with DataWriters and DataReaders connected through Topics A-D, each with QoS]
  19-23. Virtualised Data Space [animation: Data Writers publish into the virtualised data space; a Data Reader observes it]
  24. Consistency Model: DDS's data space is eventually consistent with respect to writes. That means that readers of some kind of data will eventually see a write, but they may not observe it at the "same time". [Diagram: DDS Global Data Space with DataWriters, DataReaders, and Topics]
  25. Eventual Properties: given a property P(t), we say that this property is eventually true iff there exists a time t* such that P(t) holds for all t ≥ t*.
  26. Understanding Eventual Consistency: consistency with respect to a datum means that anything/anybody looking at the datum will see exactly the same value. Eventually consistent means that consistency will "eventually" be asserted, but before t* (which is unknown in asynchronous and partially synchronous systems), anything/anybody looking at the datum may see different values.
  27. Topic: a Topic defines a domain-wide class of information by a <name, type, qos> triple. DDS Topics allow expressing both the functional and non-functional properties of a system's information model. [Diagram: a Topic as the association of a Type, a Name, and QoS within the Global Data Space]
  28. Topic Type: topic types can be expressed using different syntaxes, including IDL and ProtoBuf. IDL example: struct TemperatureSensor { @key long sid; float temp; float hum; };
  29. Instances: each unique key value identifies a unique stream of data; DDS demultiplexes these "streams" and provides per-instance lifecycle information. A Writer can write multiple instances. [Diagram: one topic with instances sid="12345", sid="54321", sid="15243" of the TemperatureSensor type above]
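As a sketch of the demultiplexing described above: each distinct value of the @key field sid identifies its own instance, and writing an existing key updates that instance rather than creating a new one. The Writer class here is hypothetical, not a DDS API.

     // TemperatureSensor mirrors the IDL type of slide 28; sid is the key.
     case class TemperatureSensor(sid: Long, temp: Float, hum: Float)

     // Illustrative writer-side instance bookkeeping (real writers also
     // track instance lifecycle states such as alive/disposed).
     class Writer {
       private var instances = Map.empty[Long, TemperatureSensor]
       def write(s: TemperatureSensor): Unit = instances += (s.sid -> s)
       def instanceCount: Int = instances.size
     }

     object InstanceDemo extends App {
       val w = new Writer
       w.write(TemperatureSensor(12345L, 21.5f, 40.0f)) // new instance sid=12345
       w.write(TemperatureSensor(54321L, 19.0f, 55.0f)) // new instance sid=54321
       w.write(TemperatureSensor(12345L, 22.0f, 41.0f)) // updates instance 12345
       println(w.instanceCount) // 2: two unique keys, hence two instances
     }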
  30. Reader & Writer Caches
  31. Data Cache: each Writer and Reader has an associated data cache. [Diagram: DDS Global Data Space with DataWriters and DataReaders]
  32. Writer Cache: the writer's cache stores (a subset of) the data written. [Diagram: DDS Global Data Space]
  33. Reader Cache: the reader's cache contains a projection of the global data space that reflects the reader's "interest". [Diagram: DDS Global Data Space]
  34. Data Cache: a Reader/Writer cache can store the last n ∈ ℕ∞ samples for each relevant instance, where ℕ∞ = ℕ ∪ {∞}. The cache properties are configured via QoS. [Diagram: a cache holding samples per instance]
  35. Reading Samples: the action of reading samples from a Reader cache is non-destructive; samples are not removed from the cache. [Diagram: read leaves the DataReader cache unchanged]
  36. Taking Samples: the action of taking samples from a Reader cache is destructive; samples are removed from the cache. [Diagram: take drains the DataReader cache]
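A sketch contrasting the two operations. ReaderCache is a simplified stand-in for a DataReader cache; real readers also mark samples as read rather than merely leaving them in place.

     // Illustrative reader cache: read is non-destructive, take is destructive.
     class ReaderCache[T] {
       private var samples = Vector.empty[T]
       def deliver(s: T): Unit = samples :+= s // the middleware inserts a sample
       def read(): Seq[T] = samples            // samples stay in the cache
       def take(): Seq[T] = { val s = samples; samples = Vector.empty; s }
     }

     object ReadTakeDemo extends App {
       val cache = new ReaderCache[Float]
       cache.deliver(21.5f); cache.deliver(22.0f)
       println(cache.read()) // Vector(21.5, 22.0)
       println(cache.read()) // Vector(21.5, 22.0) - read left them in place
       println(cache.take()) // Vector(21.5, 22.0) - take drains the cache
       println(cache.read()) // Vector()
     }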
  37. Sample Selectors: samples can be selected using composable content and status predicates. [Diagram: DataReader cache]
  38. Data Filters: filters control what gets into a DataReader cache. Filters are expressed as SQL where clauses or as Java/C/JavaScript predicates. [Diagram: the filter sits between the network and the DataReader cache]
  39. Data Queries: queries control what gets out of a DataReader cache. Queries are expressed as SQL where clauses or as Java/C/JavaScript predicates. [Diagram: the query sits between the DataReader cache and the application]
  40. State Selectors: state-based selection controls what gets out of a DataReader cache based on sample (read or not), instance (alive or not), and view (known or not) states. [Diagram: the state selector sits between the DataReader cache and the application]
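To illustrate where these selectors act, the sketch below plays the roles of a filter (keeps samples out of the cache) and a query (selects what leaves it) with plain predicates; in DDS proper both would typically be SQL-like where clauses such as "temp > 30". The sample type reuses the TemperatureSensor topic type of slide 28.

     case class TemperatureSensor(sid: Long, temp: Float, hum: Float)

     object SelectorDemo extends App {
       // Filter: applied before the cache; rejected samples never enter it.
       val filter: TemperatureSensor => Boolean = _.temp > 30

       // Query: applied on the way out; the cache itself is left untouched.
       val query: TemperatureSensor => Boolean = _.hum < 50

       val arriving = Seq(TemperatureSensor(1, 35f, 40f),
                          TemperatureSensor(2, 25f, 40f),  // stopped by the filter
                          TemperatureSensor(3, 33f, 80f))  // cached, not selected

       val cache = arriving.filter(filter)
       println(cache.filter(query)) // List(TemperatureSensor(1,35.0,40.0))
     }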
  41. QoS Enabled: QoS policies allow expressing and controlling the temporal and availability constraints on data. [Diagram: DDS Global Data Space with Topics and QoS]
  42. RxO QoS Policies: QoS policies controlling end-to-end properties follow a Requested vs. Offered (RxO) model: a DataWriter offers QoS through its Publisher and a DataReader requests QoS through its Subscriber, for the Topic they share. The RxO policies include DURABILITY, OWNERSHIP, DEADLINE, LATENCY BUDGET, LIVELINESS, RELIABILITY, DESTINATION ORDER, and PARTITION. [Diagram: DomainParticipants joining a Domain, with offered and requested QoS matched on a Topic]
  43. Topics as Channels
  44. Channel Properties: we can think of a DataWriter and its matching DataReaders as connected by a logical typed communication channel. The properties of this channel are controlled by means of QoS policies. At the two extremes, this logical communication channel can be: a Best-Effort/Reliable last n-values channel, or a Best-Effort/Reliable FIFO channel. [Diagram: a DW connected through a Topic to several DRs]
  45. Last n-Values Channel: the last n-values channel is useful when modelling distributed state. When n = 1, the last-value channel provides a way of modelling eventually consistent distributed state. This abstraction is very useful if what matters is the current value of a given topic instance. The QoS policies that give a last n-values channel are: RELIABILITY = RELIABLE, HISTORY = KEEP_LAST(n), DURABILITY = TRANSIENT | PERSISTENT [in most cases].
  46. FIFO Channel: the FIFO channel is useful when we care about every single sample that was produced for a given topic, as opposed to the "last value". This abstraction is very useful when writing distributed algorithms over DDS. Depending on QoS policies, DDS provides a Best-Effort/Reliable FIFO channel or an FT-Reliable FIFO channel (using an OpenSplice-specific extension). The QoS policies that give a FIFO channel are: RELIABILITY = RELIABLE, HISTORY = KEEP_ALL.
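The two channel flavours of the last two slides can be written down as plain QoS value objects. This is a hedged sketch: the policy names follow the DDS spec, but the case classes are illustrative, not a vendor API.

     // Illustrative QoS model; names mirror DDS policies, types are ad hoc.
     sealed trait Reliability
     case object BestEffort extends Reliability
     case object Reliable extends Reliability
     sealed trait History
     case class KeepLast(n: Int) extends History
     case object KeepAll extends History
     sealed trait Durability
     case object Volatile extends Durability
     case object Transient extends Durability
     case object Persistent extends Durability
     case class ChannelQos(r: Reliability, h: History, d: Durability)

     object ChannelDemo extends App {
       // Last-value channel: models eventually consistent distributed state.
       val lastValue = ChannelQos(Reliable, KeepLast(1), Transient)
       // FIFO channel: every sample matters, e.g. for distributed algorithms.
       val fifo = ChannelQos(Reliable, KeepAll, Volatile)
       println(lastValue); println(fifo)
     }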
  47. Membership: we can think of a DDS Topic as defining a group whose members are the matching DataReaders and DataWriters. DDS's dynamic discovery manages this group membership; however, it provides only a low-level interface to group management and eventual consistency of views. In addition, the group view provided by DDS exposes matched readers on the writer side and matched writers on the reader side. This is not sufficient for certain distributed algorithms. [Diagram: DataWriter and DataReader group views around a Topic]
  48. Fault Detection: DDS provides a built-in mechanism for detecting DataWriter faults through the LivelinessChangedStatus. A writer is considered to have lost its liveliness if it has failed to assert it within its lease period. [Diagram: DataReader group view]
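The lease-based rule can be sketched as follows; the LivelinessMonitor class mimics what the middleware does internally and is not the DDS LivelinessChangedStatus API itself.

     // Illustrative lease-based liveliness check: a writer that has not
     // asserted liveliness within its lease period is presumed dead.
     class LivelinessMonitor(leaseMillis: Long) {
       private var lastAsserted = Map.empty[Long, Long] // writer-id -> last time

       def assertLiveliness(writerId: Long): Unit =
         lastAsserted += (writerId -> System.currentTimeMillis)

       def isAlive(writerId: Long): Boolean =
         lastAsserted.get(writerId)
           .exists(t => System.currentTimeMillis - t <= leaseMillis)
     }

     object FaultDetectionDemo extends App {
       val monitor = new LivelinessMonitor(leaseMillis = 100)
       monitor.assertLiveliness(42)
       println(monitor.isAlive(42)) // true: within the lease period
       Thread.sleep(150)
       println(monitor.isAlive(42)) // false: lease expired, writer presumed failed
     }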
  49. System Model
  50. System Model: partially synchronous, i.e. after a Global Stabilisation Time (GST) communication latencies are bounded, yet the bound is unknown. Non-Byzantine fail/recovery: processes can fail and restart but do not perform malicious actions.
  51. Programming Environment: the algorithms shown next are implemented on OpenSplice using the Moliere Scala API. All algorithms are available as part of dada, the DDS-based Advanced Distributed Algorithms Toolkit, an open source project at github.com/kydos/dada.
  52. Distributed Algorithms
  53. Group Management
  54. Group Management Abstraction: a group management abstraction should provide the ability to join/leave a group, report the current view, and detect failures of group members. Ideally, group management should also provide the ability to elect leaders. A group member should represent a process.

     abstract class Group {
       // Join/Leave API
       def join(mid: Int)
       def leave(mid: Int)
       // Group View API
       def size: Int
       def view: List[Int]
       def waitForViewSize(n: Int)
       def waitForViewSize(n: Int, timeout: Int)
       // Leader Election API
       def leader: Option[Int]
       def proposeLeader(mid: Int, lid: Int)
     }
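A hypothetical usage sketch against this abstraction, assuming a concrete implementation (such as dada's) reachable through the Group(gid) factory used in the leader-election example on slide 60:

     object GroupDemo {
       def main(args: Array[String]): Unit = {
         val group = Group(1)     // gid = 1, as in the leader-election example
         group.join(7)            // announce ourselves as member 7
         group.waitForViewSize(3) // block until three members are visible
         println(s"view = ${group.view}, leader = ${group.leader}")
         group.leave(7)           // withdraw from the group
       }
     }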
  55. Eventually Consistent Group Views: the group management algorithm that follows provides eventually consistent views as well as eventual leaders. Whilst eventual consistency seems to weaken the abstraction, there are plenty of situations in which this is actually more than enough. It is also worth noticing that these algorithms are very efficient thanks to the eventual consistency assumption.
  56. Topic Types: to implement the Group abstraction with support for leader election it is sufficient to rely on the following topic types:

     enum TMemberStatus { JOINED, LEFT, FAILED, SUSPECTED };

     struct TMemberInfo {
       long mid; // member-id
       TMemberStatus status;
     };
     #pragma keylist TMemberInfo mid

     struct TEventualLeaderVote {
       long long epoch;
       long mid;
       long lid; // voted leader-id
     };
     #pragma keylist TEventualLeaderVote mid
  57. Topics: the TMemberInfo topic is used to advertise presence and manage member state transitions (group management); the TEventualLeaderVote topic is used to cast votes for leader election. This leads us to:

     Topic(name = MemberInfo, type = TMemberInfo, QoS = {Reliability.Reliable, History.KeepLast(1), Durability.TransientLocal})
     Topic(name = EventualLeaderVote, type = TEventualLeaderVote, QoS = {Reliability.Reliable, History.KeepLast(1), Durability.TransientLocal})
  58. Observation: notice that we are using two last-value channels for implementing both the (eventual) group management and the (eventual) leader election. This lets DDS provide our latest known state automatically, thanks to the TransientLocal durability, and removes the need to periodically assert our liveliness: DDS will do that for our DataWriter.
  59. (Eventual) Leader Election: at the beginning of each epoch the leader is None; each new epoch, a leader election algorithm is run. [Diagram: members M0, M1, M2 joining and crashing across epochs 0-3, with the leader going None => M1, None => M1, None => M0, None => M0]
  60. (Eventual) Leader Election Algorithm: an eventual leader election algorithm can be implemented by simply casting a vote each time there is a group epoch change. A group epoch change takes place each time there is a change in the group view. The leader is eventually elected only if a majority of the processes currently in the view agree; otherwise the group leader is set to None.

     object EventualLeaderElection {
       def main(args: Array[String]) {
         if (args.length < 2) {
           println("USAGE: GroupMember <gid> <mid>")
           sys.exit(1)
         }
         val gid = args(0).toInt
         val mid = args(1).toInt
         val group = Group(gid)
         group.join(mid)
         group listen {
           case EpochChange(e) => {
             val lid = group.view.min
             group.proposeLeader(mid, lid)
           }
           case NewLeader(l) => println(">> NewLeader = " + l)
         }
       }
     }
  61. Segregating Groups: to isolate the traffic generated by different groups, we use the group id gid to name the partition in which all the group-related traffic will take place. [Diagram: a DDS Domain with partitions "1", "2", "3"; the partition named "2" is associated with the group with gid=2]
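A small sketch of the naming convention; the Partition wrapper is illustrative, while in real DDS the same string would be carried by the PARTITION QoS policy of the group's Publishers and Subscribers.

     // Illustrative: derive the partition name from the group id.
     case class Partition(name: String)

     object GroupPartition extends App {
       def partitionFor(gid: Int): Partition = Partition(gid.toString)

       // Entities of group 2 are created in partition "2", so their traffic
       // never matches entities living in partitions "1" or "3".
       println(partitionFor(2)) // Partition(2)
     }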
  62. Barriers
  63. Barrier Abstraction: barriers are a useful construct in parallel and distributed computing, used to coordinate the phases of a distributed computation. [Diagram: processes meeting at a barrier]
  64. Barrier Abstraction: a barrier abstraction should provide a way to assert the desired size along with a way to wait for it. It is also useful to be able to list who is waiting on a given barrier.

     import scala.concurrent.duration.Duration

     abstract class Barrier {
       def name: String
       def size: Int
       def waitingList: List[Int]
       def await(): Unit                  // renamed from wait(), which would
       def await(timeout: Duration): Unit // clash with AnyRef's final wait()
     }
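A hypothetical usage sketch against this abstraction; the Barrier("Foo", 3) factory is an assumption (the slides only show the abstract class), and the semantics follow the traces on the next slides.

     import scala.concurrent.duration._

     object BarrierDemo {
       def main(args: Array[String]): Unit = {
         val barrier = Barrier("Foo", 3)  // assumed factory: name and size
         println(s"waiting at ${barrier.name}, present: ${barrier.waitingList}")
         barrier.await(30.seconds)        // block until 3 processes have arrived
         println("all three processes reached the barrier; next phase may start")
       }
     }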
  65. Topic Types: to implement the Barrier abstraction it is sufficient to rely on the following topic types:

     struct Barrier {
       string name;
       long long epoch;
       short count;
     };
     #pragma keylist Barrier name epoch

     struct BarredProcess {
       string name;
       long long epoch;
       long pid;
     };
     #pragma keylist BarredProcess name epoch pid
  66. [Diagram: processes P1, P2, P3 arriving at a barrier]
  67. Barrier trace: Barrier = [("Foo", 1, 3)]; BarredProcess grows as processes arrive: [("Foo", 1, 2)] -> [("Foo", 1, 2), ("Foo", 1, 1)] -> [("Foo", 1, 2), ("Foo", 1, 1), ("Foo", 1, 3)]; once all three have arrived, BarredProcess = [].
  68. Barrier trace (next epoch): Barrier = [("Foo", 1, 3)]; BarredProcess: [("Foo", 1, 1)] -> [("Foo", 1, 1), ("Foo", 1, 2)] -> [("Foo", 1, 1), ("Foo", 1, 2), ("Foo", 1, 3)] -> []; then Barrier = [("Foo", 2, 3)] ...
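The release rule implied by these traces can be stated compactly: a process is released once the BarredProcess entries for the current (name, epoch) reach the count advertised in the Barrier topic, after which the entries are taken and the epoch advances. A sketch, with types loosely mirroring the IDL of slide 65 (IDL short/long map to Int here for brevity):

     case class BarrierState(name: String, epoch: Long, count: Int)
     case class BarredProcess(name: String, epoch: Long, pid: Int)

     object BarrierRule extends App {
       // Released once enough processes have registered for this (name, epoch).
       def released(b: BarrierState, waiting: Seq[BarredProcess]): Boolean =
         waiting.count(p => p.name == b.name && p.epoch == b.epoch) >= b.count

       val b = BarrierState("Foo", 1, 3)
       val waiting = Seq(BarredProcess("Foo", 1, 2), BarredProcess("Foo", 1, 1),
                         BarredProcess("Foo", 1, 3))
       println(released(b, waiting)) // true: all three arrived; entries can be
                                     // taken and the epoch advanced to 2
     }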
  69. Wrap-up
  70. Concluding Remarks: DDS provides a computation/coordination model inspired by tuple spaces. This is a symmetric and anonymous model of computation in which processes coordinate by reading and writing data in an eventually consistent data space. While amenable to very high-performance implementations, this abstraction is quite powerful and greatly eases the development of distributed systems.
