High Performance Distributed Computing with DDS and Scala





Angelo Corsaro
Chief Technology Officer
PrismTech Corp.
4 rue Angiboust, Marcoussis, France
angelo@icorsaro.net

ABSTRACT

The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains such as web analytics, social media, automated trading and smart grids have to deal with. The challenges faced by these applications, commonly called Big Data applications, are manifold, as the staggering growth in volumes is complicating the collection, storage, analysis and distribution of data. This paper focuses on the collection and distribution of data and introduces the Data Distribution Service for Real-Time Systems (DDS), an Object Management Group (OMG) standard for high performance data dissemination used today in several Big Data applications. The paper then shows how the combination of DDS with functional programming languages, or at least languages incorporating functional features, such as Scala, makes a very natural and effective combination for dealing with Big Data applications.

[Figure 1: Stages characteristic of Big Data applications: Collect, Store, Analyse, Distribute.]

1. INTRODUCTION

The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains such as web analytics, social media, automated trading and smart grids have to deal with. The challenges faced by these applications, commonly called Big Data applications, are manifold, as the staggering growth in volumes is complicating the collection, storage, analysis and distribution of data.

In this paper we focus our attention on the challenges tied to the collection and distribution of the large volumes of data characteristic of Big Data applications – the first and last stages of the pipeline shown in Figure 1 – and partly on their storage. Big Data applications have to be capable of collecting as well as distributing massive amounts of data, much of which needs to be processed in a timely manner. As a result, these applications need to optimally exploit networking as well as computing resources.

In this paper we introduce DDS, an OMG standard for high performance data dissemination used today in several Big Data applications for data collection and dissemination, as well as for high performance data durability. We also show how the combination of DDS with functional programming languages, or at least programming languages incorporating functional features like Scala [7], makes a very natural and effective combination for dealing with the challenges of Big Data.

DDS [8] is an OMG Publish/Subscribe (P/S) standard that enables scalable, real-time, dependable and high performance data exchanges between publishers and subscribers. DDS addresses the needs of mission- and business-critical applications, such as financial trading, air traffic control and management, defense, aerospace, smart grids, and complex supervisory and telemetry systems. The key challenges addressed by DDS are to provide a P/S technology in which data exchanged between publishers and subscribers are:

  • Real-time, meaning that the right information is delivered at the right place at the right time – all the time. Failing to deliver key information within the required deadlines can lead to life-, mission- or business-threatening situations. For instance, in financial trading 1 ms can make the difference between losing or gaining $1M. Likewise, in supervisory applications for power grids, failing to meet deadlines under an overload situation could lead to a severe blackout, such as the one experienced by the northeastern US and Canada in 2003 [1].

  • Dependable, ensuring availability, reliability, safety and integrity in spite of hardware and software failures. For instance, the lives of thousands of air travelers depend on the reliable functioning of an air traffic control and management system. These systems must ensure 99.999% availability and ensure that critical data is delivered reliably, regardless of experienced failures.

  • High-performance, which necessitates the ability to distribute very high volumes of data with very low latencies.

DEBS 2012, July 16–20, 2012, Berlin, Germany. Copyright 2012 ACM 978-1-4503-1315-5.
As an example, financial auto-trading applications must handle millions of messages per second, each delivered reliably with minimal latency, e.g., on the order of tens of microseconds.

These characteristics make DDS a perfect fit for Big Data applications.

2. THE DATA DISTRIBUTION SERVICE

P/S is a paradigm for one-to-many communication that provides anonymous, decoupled, and asynchronous communication between producers of data – the publishers – and consumers of data – the subscribers. This paradigm is at the foundation of many technologies used today to develop and integrate distributed applications (such as social applications, e-services, financial trading, etc.), while ensuring that the composed parts of the applications remain loosely coupled and independently evolvable.

Different implementations of the P/S abstraction have emerged to support the needs of different application domains. DDS [8] is an OMG P/S standard that enables scalable, real-time, dependable and high performance data exchanges between publishers and subscribers, addressing the needs of the mission- and business-critical applications discussed in the introduction.

2.1 DDS Standard Organization

The key members of the OMG DDS standards family are shown in Figure 2 and consist of the DDS v1.2 API [8], the Data Distribution Service Interoperability Wire Protocol (DDSI) [9], and the DDS Extensible and Dynamic Topic Types (DDS-XTYPES) [10] specification. The DDS API standard defines the key DDS abstractions and their semantics; it also ensures source code portability across different vendor implementations. The DDSI standard ensures on-the-wire interoperability across DDS implementations from different vendors. The DDS-XTYPES standard extends the basic DDS type system with polymorphic structural types supporting extensibility and evolvability.

[Figure 2: The DDS Standard – language APIs (C/C++, C#, Java, Scala) on top of the Data Centric Publish/Subscribe (DCPS) layer (Minimum Profile plus Content, Ownership, Durability and Subscription profiles, and X-Types), running over the DDS Interoperability Wire Protocol (DDSI-RTPS) on UDP/IP.]

The DDS standard was formally adopted by the OMG in 2004. It quickly became the established P/S technology for distributing high volumes of data dependably and with predictably low latencies in applications such as radar processors, flying and land drones, combat management systems, air traffic control and management, high performance telemetry, supervisory control and data acquisition systems, and automated stocks and options trading. Along with wide commercial adoption, the DDS standard has been mandated as the technology for real-time data distribution by organizations worldwide, including the US Navy, the Department of Defense (DoD) Information-Technology Standards Registry (DISR), the UK Ministry of Defence (MoD), the Military Vehicle Association (MILVA), and EUROCAE – the European organization that regulates standards in Air Traffic Control and Management.

2.2 Key DDS Concepts and Entities

Below we summarize the key architectural concepts and entities in DDS.

2.2.1 Global data space

The key abstraction at the foundation of DDS is a fully distributed Global Data Space (GDS) (see Figure 3). The DDS specification requires a fully distributed implementation of the GDS to avoid single points of failure or single points of contention. Publishers and Subscribers can join or leave the GDS at any point in time, as they are dynamically discovered. The dynamic discovery of Publishers and Subscribers is performed by the GDS and does not rely on any kind of centralized registry, such as those found in other P/S technologies like the Java Message Service (JMS) [6]. The GDS also discovers application-defined data types and propagates them as part of the discovery process.

Since DDS provides a GDS equipped with dynamic discovery, there is no need for applications to configure anything explicitly when a system is deployed. Applications will be automatically discovered and data will begin to flow.
Moreover, since the GDS is fully distributed, the crash of one node will not induce unknown consequences on the system availability, i.e., in DDS there is no single point of failure, and the system as a whole will continue to run even if applications crash/restart or connect/disconnect.

[Figure 3: The DDS Global Data Space.]

2.2.2 Topic

The information that populates the GDS is defined by means of DDS Topics. A topic, identified by a unique name, a data type, and a collection of Quality of Service (QoS) policies, conceptually defines a class of data streams. The unique name provides a means of uniquely referring to a given topic, the data type defines the type of the stream of data, and the QoS captures all the non-functional aspects of the information, such as its temporal properties or its availability.

Topic types.
DDS Topics can be specified using several different syntaxes, such as the Interface Definition Language (IDL), the eXtensible Markup Language (XML), the Unified Modeling Language (UML), and annotated Java. For instance, Listing 1 shows a type declaration for a hypothetical temperature sensor topic type. Some of the attributes of a topic type can be marked as representing the key of the type.

Listing 1: Topic type declaration for a hypothetical temperature sensor

    struct TempSensorType {
      @Key long id;
      float temp;
      float hum;
    };

The key allows DDS to define the selector that identifies a specific data stream, commonly called a topic instance. For instance, the topic type declaration in Listing 1 defines the id attribute as being the key of the type. Each unique id value therefore identifies a specific data stream (topic instance) for which DDS will manage the entire life-cycle. This allows applications to (1) identify the specific source of data, such as the specific physical sensor whose id=5, and (2) learn about relevant life-cycle events, such as the creation and disposal of streams. In addition, DDS provides applications with information concerning the existence of producers for each data stream. Figure 4 provides a visual representation of the relationship existing between a topic, its instances, and the associated data streams.

[Figure 4: Topic Instances and Data Streams.]

Topic QoS.
The Topic QoS provides a mechanism to express relevant non-functional properties of a topic. Section 4 presents a detailed description of the DDS QoS model, but at this point we simply mention that the ability to define QoS for topics makes it possible to capture the key non-functional invariants of the system and make them explicit and visible.

Content filters.
DDS supports defining content filters over a specific topic. These content filters are defined by instantiating a ContentFilteredTopic for an existing topic and providing a filter expression. The filter expression follows the same syntax as the WHERE clause of a SQL statement and can operate on any of the topic type attributes. For instance, a filter expression for a temperature sensor topic could be "id = 101 AND (temp > 35 OR hum > 65)".

2.2.3 DataWriters and DataReaders

Since a topic defines the subjects produced and consumed, DDS provides two abstractions for writing and reading these topics: DataWriters and DataReaders, respectively. Both DataReaders and DataWriters are strongly typed and are defined for a specific topic and topic type.

2.2.4 Publishers, Subscribers and Partitions

DDS also defines Publishers and Subscribers, which group together sets of DataWriters and DataReaders, perform coordinated actions over them, and manage the communication session. DDS also supports the concept of a Partition, which can be used to organize data flows within a GDS so as to keep unrelated sets of topics separate.
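To make the content-filter mechanism concrete, the sketch below shows how a ContentFilteredTopic might be created in Scala. This is a hedged illustration: the ContentFilteredTopic factory and its argument list are assumptions modelled on the DDS concept, not the verbatim Escalier API.

```scala
// Hypothetical sketch (Escalier-style API; the ContentFilteredTopic
// factory shown here is an assumption). A filtered view of the
// temperature topic that only matches sensor 101 when hot or humid.
val topic = Topic[TempSensor]("TTempSensor")

val hotOrHumid =
  ContentFilteredTopic[TempSensor](topic,
    "id = 101 AND (temp > 35 OR hum > 65)")

// A reader created over the filtered topic receives only the
// samples whose content matches the filter expression.
val reader = DataReader[TempSensor](hotOrHumid)
```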
2.3 Example of applying DDS

Now that we have presented the key architectural concepts and entities in DDS, we will show the anatomy of a simple DDS application in Scala using the Escalier DDS API [4]. Listing 2 shows the steps required to write a sample for the temperature sensor topic described earlier. In this specific example, since no domain is explicitly joined, the runtime will assume that the application is interested in joining the default domain.

Listing 2: A simple DDS writer.

    object TempSensorWriter {
      def main(args: Array[String]) {
        // Define a topic
        val topic = Topic[TempSensor]("TTempSensor")

        // Create a Writer
        val writer = DataWriter[TempSensor](topic)

        // Create a sample
        val ts = new TempSensor(0, 25, 0.6F)

        // Write the sample
        writer ! ts
      }
    }

Note that no QoS has been specified for any of the DDS entities defined in this code fragment, so their behavior will be governed by the default QoS.

Listing 3: A simple DDS reader.

    object SensorLogger {
      def main(args: Array[String]) {
        // Define a topic
        val topic = Topic[TempSensor]("TTempSensor")

        // Create a Reader
        val reader = DataReader[TempSensor](topic)

        // Register a reaction for printing data
        reader.reactions += {
          case DataAvailable(dr) => {
            (reader read) foreach (println)
          }
        }
      }
    }

Listing 3 shows the steps required to read the data samples published on a given topic. The code fragment shows the application registering a reaction with the DDS data reader by providing a partially specified function. The partially defined function uses pattern matching to select the events in which it is interested. Specifically, the code in Listing 3 registers a reaction that will print received data on the console. It is worth noticing how the elegance and simplicity of this application come from the combination of the data-centric DDS paradigm and Scala functional features such as partial and higher-order functions.

3. THE DDS TYPE SYSTEM

Strong typing plays a key role in developing software systems that are efficient, easier to extend, less expensive to maintain, and in which runtime errors are limited to those computational aspects that either cannot be decided at compile-time or cannot be detected at compile-time by the type system. Since DDS targets mission- and business-critical systems where safety, extensibility, and maintainability are critically important, DDS adopted a strongly, and somewhat statically, typed system to define the type properties of a topic. This section provides an overview of the DDS type system.

3.1 Structural Polymorphic Type System

DDS provides a polymorphic structural type system [3, 2, 10], which means that the type system not only supports polymorphism, but also bases its sub-typing on the structure of a type, as opposed to its name. For example, consider the types declared in Listing 4.

Listing 4: Nominal vs. Structural Subtypes

    struct Coord2D {
      long x;
      long y;
    };

    struct Coord3D : Coord2D {
      long z;
    };

    struct Coord {
      long x;
      long y;
      long z;
    };

In a nominal polymorphic type system, Coord3D would be a subtype of Coord2D, which would be expressed by writing Coord3D <: Coord2D. In a nominal type system, however, there would be no relationship between Coord2D/Coord3D and the type Coord.

Conversely, in a polymorphic structural type system like the one used by DDS, the type Coord is a subtype of the type Coord2D – thus Coord <: Coord2D – and it is structurally the same type as Coord3D.

The main advantage of a polymorphic structural type system over a nominal one is that the former considers the structure of a type, as opposed to its name, to determine sub-typing relationships. As a result, polymorphic structural type systems are more flexible and especially well-suited for evolvable and large-scale distributed systems. In particular, distributed systems often need to evolve incrementally, providing new functionality to some subsystems without requiring a complete upgrade and redeployment of the whole system.

In the example above, the type Coord was a monotonic extension of the type Coord2D, since it added new attributes "at the end." The DDS type system can handle both attribute reordering and attribute removal with equal alacrity.
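As an aside, Scala itself offers structural (refinement) types next to its nominal ones, which makes the distinction easy to illustrate in plain Scala. The snippet below is an analogy to the DDS type system, not part of it: two nominally unrelated case classes both conform to a structural type purely because of their members.

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  // A structural type: any value exposing x and y accessors of type Long.
  type Coord2D = { def x: Long; def y: Long }

  // Two nominally unrelated classes that both conform structurally.
  case class Coord3D(x: Long, y: Long, z: Long)
  case class Coord(x: Long, y: Long, z: Long)

  // Accepts any argument that structurally conforms to Coord2D.
  def norm2(c: Coord2D): Long = c.x * c.x + c.y * c.y

  def main(args: Array[String]): Unit = {
    println(norm2(Coord3D(3, 4, 5)))  // prints 25
    println(norm2(Coord(1, 2, 3)))    // prints 5
  }
}
```

Note that Scala implements such calls via reflection (hence the `reflectiveCalls` import), whereas DDS resolves structural compatibility through its type descriptions.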
3.2 Type annotations

The DDS type system supports an annotation system very similar to that available in Java [5]. It also defines a set of built-in annotations that can be used to control the extensibility of a type, as well as the properties of the attributes of a given type. Some important built-in annotations are described below:

  • The @ID annotation can be used to assign a globally unique ID to the data members of a type. This ID is used to deal efficiently with attribute reordering.

  • The @Key annotation can be used to identify the type members that constitute the key of the topic type.

  • The @Optional annotation can be used to express that an attribute is optional and might not be set by the sender. DDS provides specific accessors for optional attributes that can be used to safely check whether the attribute is set or not. In addition, to save bandwidth, DDS will not send optional attributes for which a value has not been provided.

  • The @Shared annotation can be used to specify that the attribute must be referenced through a pointer. This annotation helps avoid situations in which large data structures (such as images or large arrays) would be allocated contiguously with the topic type, which may be undesirable in resource-constrained embedded systems.

  • The @Extensibility annotation can be used to control the level of extensibility allowed for a given topic type. The possible values for this annotation are (1) Final, to express that the type is sealed and cannot be evolved – as a result the type cannot be substituted by any other type that is structurally its subtype; (2) Extensible, to express that only monotonic extension should be considered for the type; and (3) Mutable, to express that the most generic structural subtyping rules should be applied when considering subtypes.

In summary, the DDS type system provides all the advantages of a strongly-typed system, together with the flexibility of structural type systems. This combination supports the key requirements typical of a large class of distributed systems, since it preserves types end-to-end while providing type-safe extensibility, incremental system evolution, and high performance.

4. THE DDS QOS MODEL

DDS relies on QoS policies to provide applications with explicit control over a wide set of non-functional properties, such as data availability, data delivery, data timeliness and resource usage – Figure 5 shows the full list of available QoS policies. Each QoS policy may be applicable to one or more DDS entities, such as topics, data readers, and data writers. Policies controlling end-to-end properties are considered part of the subscription matching.

[Figure 5: DDS QoS Policies, split into requested-vs-offered (RxO) and local policies: DURABILITY, LIVELINESS, DESTINATION ORDER, TIME-BASED FILTER, HISTORY, OWNERSHIP, PARTITION, RESOURCE LIMITS, LIFESPAN, OWNERSHIP STRENGTH, PRESENTATION, RELIABILITY, DEADLINE, LATENCY BUDGET, TRANSPORT PRIORITY, USER DATA, TOPIC DATA, GROUP DATA, DW/DR LIFECYCLE, ENTITY FACTORY.]

DDS uses a requested vs. offered QoS matching approach, in which a data reader matches a data writer if and only if the QoS it is requesting for the given topic does not exceed (i.e., is no more stringent than) the QoS with which the data is produced by the data writer. DDS subscriptions are matched against the topic type and name, as well as against the QoS being offered/requested by data writers and readers. This matching mechanism ensures that (1) types are preserved end-to-end, due to the topic type matching, and (2) end-to-end QoS invariants are also preserved.

The remainder of this section describes the most important QoS policies in DDS.

4.1 Data Availability

DDS provides the following QoS policies that control the availability of data to domain participants:

  • The DURABILITY QoS policy controls the lifetime of the data written to the global data space in a DDS domain. Supported durability levels are (1) VOLATILE, which specifies that once data is published it is not maintained by DDS for delivery to late joining applications; (2) TRANSIENT_LOCAL, which specifies that publishers store data locally so that late joining subscribers get the last published item if the publisher is still alive; (3) TRANSIENT, which ensures that the GDS maintains the information outside the local scope of any publisher for use by late joining subscribers; and (4) PERSISTENT, which ensures that the GDS stores the information persistently, so as to make it available to late joiners even after the shutdown and restart of the whole system. Durability is achieved by relying on a durability service, whose properties are configured by means of the DURABILITY_SERVICE QoS of non-volatile topics.

  • The LIFESPAN QoS policy controls the interval of time during which a data sample is valid. The default value is infinite; alternative values specify the time-span for which the data can be considered valid.

  • The HISTORY QoS policy controls the number of data samples (i.e., subsequent writes of the same topic) that must be stored for readers or writers. Possible values are the last sample, the last n samples, or all samples.

The data availability QoS policies decouple applications in time and space. They also enable these applications to cooperate in highly dynamic environments characterized by the continuous joining and leaving of publishers and subscribers. Such properties are particularly relevant for distributed systems since they increase the decoupling of the component parts.
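To make the data availability policies concrete, the fragment below sketches how a durable topic with a bounded history might be declared in Scala. The policy names mirror the DDS specification, but the QoS-composition syntax shown here (TopicQos, Durability, Lifespan, History) is an assumed, Escalier-style rendering, not the verbatim API.

```scala
// Hypothetical sketch: a topic whose samples survive the writer
// (TRANSIENT durability), remain valid for 10 seconds (LIFESPAN),
// and of which only the last 5 per instance are kept (HISTORY).
// The QoS constructors below are assumptions for illustration.
val qos = TopicQos(
  Durability.Transient,
  Lifespan(10.seconds),
  History.KeepLast(depth = 5))

val topic = Topic[TempSensor]("TTempSensor", qos)
```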
4.2 Data Delivery

DDS provides the following QoS policies that control how data is delivered and how publishers can claim exclusive rights on data updates:

  • The PRESENTATION QoS policy gives control over how changes to the information model are presented to subscribers. This QoS gives control over the ordering as well as the coherency of data updates. The scope at which it is applied is defined by the access scope, which can be one of INSTANCE, TOPIC, or GROUP level.

  • The RELIABILITY QoS policy controls the level of reliability associated with data diffusion. Possible choices are RELIABLE and BEST_EFFORT distribution.

  • The PARTITION QoS policy gives control over the association between DDS partitions (represented by a string name) and a specific instance of a publisher/subscriber. This association provides DDS implementations with an abstraction that allows them to segregate traffic generated by different partitions, thereby improving overall system scalability and performance.

  • The DESTINATION_ORDER QoS policy controls the order of changes made by publishers to some instance of a given topic. DDS allows different changes to be ordered according to source or destination timestamps.

  • The OWNERSHIP QoS policy controls whether multiple data writers are allowed to concurrently update a given topic instance. When set to EXCLUSIVE, this policy ensures that only one of the active data writers – namely, the one with the highest OWNERSHIP_STRENGTH – will change the value of a topic instance. The other writers, those with lower strength, are still able to write, yet their updates will have no visible impact on the distributed system. In case of failure of the highest strength data writer, DDS automatically switches to the next among the remaining data writers.

The data delivery QoS policies control the reliability and availability of data, thereby allowing the delivery of the right data to the right place at the right time. More elaborate ways of selecting the right data are offered by the DDS content-awareness profile, which allows applications to select information of interest based upon its content. These QoS policies are particularly useful since they can be used to finely tune how – and to whom – data is delivered, thus limiting not only the amount of resources used, but also minimizing the level of interference by independent data streams.

4.3 Data timeliness

DDS provides the following QoS policies to control the timeliness properties of distributed data:

  • The DEADLINE QoS policy allows applications to define the maximum inter-arrival time for data. DDS can be configured to automatically notify applications when deadlines are missed.

  • The LATENCY_BUDGET QoS policy provides a means for applications to inform DDS how long the middleware may take to make data available to subscribers. When set to zero, DDS sends the data right away; otherwise it uses the specified interval to exploit temporal locality and batch data into bigger messages, so as to optimize bandwidth, CPU and battery usage.

  • The TRANSPORT_PRIORITY QoS policy allows applications to control the priority associated with the data flowing on the network. This priority is used by DDS to prioritize more important data relative to less important data.

The DEADLINE, LATENCY_BUDGET and TRANSPORT_PRIORITY QoS policies provide the controls necessary to build priority pre-emptive distributed real-time systems. In these systems the TRANSPORT_PRIORITY is derived from a static priority scheduling analysis, such as Rate Monotonic Analysis; the DEADLINE QoS policy represents the natural deadline of information and is used by DDS to notify violations; finally, the LATENCY_BUDGET is used to optimize resource utilization in the system.

These data timeliness QoS policies provide control over the temporal properties of data. Such properties are particularly relevant since they can be used to define and control the temporal aspects of various subsystem data exchanges, while ensuring that bandwidth is exploited optimally.

4.4 Resources

DDS defines the following QoS policies to control the network and computing resources that are essential to meet data dissemination requirements:

  • The TIME_BASED_FILTER QoS policy allows applications to specify the minimum inter-arrival time between data samples, thereby expressing their capability to consume information at a maximum rate. Samples that are produced at a faster pace are not delivered. This policy helps a DDS implementation optimize network bandwidth, memory, and processing power for subscribers that are connected over limited bandwidth networks or that have limited computing capabilities.

  • The RESOURCE_LIMITS QoS policy allows applications to control the maximum storage available to hold topic instances and the related number of historical samples. DDS's QoS policies support the various elements and operating scenarios that constitute net-centric mission-critical information management; by controlling them it is possible to scale DDS from low-end embedded systems connected by narrow and noisy radio links to high-end servers connected to high-speed fiber-optic networks.

These resource QoS policies provide control over the local and end-to-end resources, such as memory and network bandwidth. Such properties are particularly useful in environments characterized by largely heterogeneous subsystems, devices, and network connections that often require down-sampling, as well as an overall limit on the amount of resources used.
A common use of this QoS policy is to extend topics with additional information, or meta-information, such as IDL type-codes or XML schemas.

• The GROUP_DATA QoS policy allows applications to associate a sequence of octets with publishers and subscribers; this bootstrapping information is distributed by means of built-in topics. A typical use of this information is to allow additional application control over subscription matching.

These DDS configuration QoS policies provide a useful mechanism for bootstrapping and configuring applications in a distributed system.

5. DDS CACHES AND CHANNELS
Each DDS data writer and data reader has an associated data cache. The data writer cache stores a subset of the samples written. The data reader cache stores a subset of the samples produced by its matching data writers. The exact behaviour of the cache, with respect to the samples it will store, is controlled through DDS QoS policies.

One way of abstracting data reader and data writer caches and their semantics under different QoS policies, along with data distribution, is to reason in terms of channels. A DDS data writer and its matching data readers can be thought of as connected by a logical typed communication channel (see Figure 6). DDS QoS policies define the properties controlling the set of samples, produced by the data writer, that will eventually be received by the matching data readers.

At the two extremes, DDS provides support for the Last n-Values Channel and for the FIFO Channel. In order to reason about the samples that a reader might receive, we will consider the RELIABILITY QoS as set to RELIABLE. If that were not the case, the only guarantee that could be provided to a matching data reader is that it will receive a subset of the data produced by the matching data writer; this subset will in general be different for each matching reader.

In the remainder of this section we will investigate the QoS policies that provide control over the communication channel and will describe the semantics associated with these channels. As a final remark, it is worth pointing out that these channels are logical and fully distributed.

Figure 6: The logical channel between a DDS data writer and its matching readers.

5.1 Last n-Values Channel
To obtain a Last n-Values Channel, the HISTORY QoS policy has to be set to KEEP_LAST with a depth of n. In this case we are guaranteed that the data reader will receive the last n samples produced by the data writer. The DURABILITY QoS can be used in this case to decouple the production of data by the data writer from the consumption of data by the data reader. The Last n-Values Channel is useful to represent distributed state. For instance, in several Big Data analytics applications, when performing the analyse phase (see Figure 1), what matters are the current values, or perhaps the last n values, of the relevant indexes, as opposed to the full set of changes since the last analyse phase. Last n-Values Channels should always be used when modelling distributed state, as they minimize the use of computational, storage and network resources.

5.2 FIFO Channel
To obtain a FIFO Channel, the HISTORY QoS policy has to be set to KEEP_ALL. In this case we are guaranteed that the data reader will receive all the samples produced by the matching data writers. The DURABILITY QoS can be used to decouple the production of data, by the data writer, from the consumption of data, by the data reader; beware, however, that for TRANSIENT and PERSISTENT DURABILITY the set of samples stored by the durability service is controlled through a specific policy, so this QoS does not necessarily imply that all samples are persisted. The FIFO Channel is useful to represent queues as well as distributed events. This kind of channel is also very useful when implementing distributed algorithms over DDS, such as mutual exclusion or distributed queues.

5.3 DDS Read and Take
To complete the picture, along with the two key channel semantics provided by DDS, we need to explain how received data can be retrieved by the application. With each DDS data reader is associated a local cache where the data received from the matching data writers is stored. The application can read the content of the cache using both state and content selectors, in which case the samples remain in the cache and are simply marked as read, i.e. the state of the samples is changed. Otherwise, the application can take the samples from the cache, in which case the samples are removed from the cache.

In general, the read operation is used in combination with Last n-Values Channels, while the take operation is used with FIFO Channels. It is a common beginner's programming error to use read with FIFO Channels, with the result that resource usage will keep growing, unless the user has specified some resource limits.

5.4 Selectors
As mentioned before, DDS provides both content and state selectors to specify the properties of the samples that should be read.

5.4.1 Content Selectors
Content Filters.
One of the mechanisms provided by DDS to select the content that will be received by a data reader are Content Filtered Topics.
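To make the effect of a content filter concrete, the following self-contained Scala sketch models a filter as a plain predicate over samples. This is an illustration only, not the DDS API: the TempSensor case class and the thresholds stand in for the topic type and query parameters used in the examples of this section.

```scala
// Toy model of a content filter: a predicate deciding whether a sample
// produced by a writer is relevant for a given reader.
// TempSensor is a plain case class standing in for the DDS topic type.
case class TempSensor(temp: Float, hum: Float)

// Equivalent of the query expression "temp > %0 || hum > %1"
// instantiated with the parameters 25 and 0.75.
val relevant: TempSensor => Boolean = s => s.temp > 25f || s.hum > 0.75f

// Samples produced by matching writers...
val produced = List(TempSensor(20f, 0.50f),  // filtered out
                    TempSensor(30f, 0.50f),  // passes on temperature
                    TempSensor(20f, 0.90f))  // passes on humidity

// ...of which only the relevant ones reach the reader's cache.
val delivered = produced filter relevant
```

The point of the sketch is that with a content filtered topic the predicate is applied before samples enter the reader cache, so irrelevant samples never consume reader-side resources.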
These kinds of topics are created by associating a query expression with an existing topic (see Listing 5). The query expression specifies the filter that will be evaluated against the samples produced by the matching writers to determine whether a sample is relevant for the reader or not. In the example shown in Listing 5, only the samples whose temperature is greater than 25 or whose humidity is higher than 75% will be seen by the data reader.

Listing 5: Content Filtered Topic.
  val topic = Topic[TempSensor]("TTempSensor")
  val query = Query("temp > %0 || hum > %1", List(25, 0.75F))
  val ftopic = ContentFilteredTopic[TempSensor]("CFTempSensor", topic, query)
  val reader = DataReader[TempSensor](ftopic)

Queries.
DDS also provides a mechanism for executing queries over the content of the data reader cache (see Listing 6). The main difference between queries and content filtered topics is that the former select samples from the cache based on their content, while the latter control the samples that get into the cache based on their content.

Listing 6: Query Selector.
  val data = (reader filterContent(query)) read

List Comprehension.
Another way of manipulating the data available in the DDS cache is through list comprehension. In this case, the higher-order functions defined on the Scala List class can be used to perform arbitrary operations on the items available in the cache. As an example, assuming we wanted to compute the moving average of the temperature over the last n samples, we could use a DDS data reader with history depth set to n and then simply compute the moving average using the List fold-right function, as shown in Listing 7.

Listing 7: List Comprehension in DDS with fold-right.
  val data = reader read
  val mavg = (data :\ 0f)(_.temp + _) / (data size)

Another way to achieve the same goal is to first map the list of TempSensor samples to a list of temperatures and then compute the moving average. This approach is shown in Listing 8.

Listing 8: List Comprehension in DDS with map.
  val data = (reader read) map (_.temp)
  val mavg = (data sum) / (data size)

It is worth noticing how the strongly typed nature of DDS mixes perfectly with the Scala type system, producing elegant, compact and very efficient code.

Streams/Instances.
Often applications need to access the samples belonging to a specific stream, or topic instance. One way to achieve this is to query for a specific key value. A better way is to select a stream by using an instance handle, the DDS-specific way of identifying a stream or topic instance. Listing 9 shows how the data belonging to a specific stream can be programmatically selected.

Listing 9: Instance Selector.
  val handle = reader lookup (key)
  val data = (reader instance(handle)) read

5.4.2 State Selectors
In addition to content-based selectors, DDS also provides state selectors, which allow selecting samples based on their state. Specifically, DDS state selectors distinguish three different states: the READ_STATE, the VIEW_STATE and the INSTANCE_STATE. The READ_STATE distinguishes between new samples and samples that have already been read. The VIEW_STATE distinguishes between samples belonging to a new instance and samples belonging to an instance that was already known. Finally, the INSTANCE_STATE distinguishes between samples belonging to an instance for which there are still active writers and samples belonging to an instance for which there are no more writers. Listing 10 shows an example of a state-based selector.

Listing 10: State Selector.
  val data = (reader filterState(SampleState.NewData)) read

Finally, it is worth mentioning that content and state selectors can be mixed and matched at will to perform complex selection operations, as shown in Listing 11. Additionally, selectors can be used with read as well as with take operations.

Listing 11: Example of selector composition.
  val data =
    (reader
       instance(handle)
       filterContent(query)
       filterState(SampleState.NewData)
    ) read

Listing 11 shows an example in which the data selected is that belonging to a specific stream, for which a query holds, and that has never been read before.

Obviously, list comprehension can also be used for operating on sample content and state. Differently from the DDS built-in selectors, however, list comprehension operates on the samples already read or taken, as opposed to selecting the samples to read or take from the cache. Thus, when using list comprehension, one should always be careful to first select the proper set of samples from the cache.
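The cache and channel semantics discussed in this section can be summarized with a small self-contained Scala sketch. This is a toy model written purely for illustration, not the DDS API: KeepLast(n) mimics a Last n-Values Channel, KeepAll mimics a FIFO Channel, and read/take mirror the two retrieval operations on the reader cache.

```scala
import scala.collection.immutable.Queue

// Toy model of a data reader cache under the two HISTORY settings.
sealed trait History
final case class KeepLast(depth: Int) extends History // Last n-Values Channel
case object KeepAll extends History                   // FIFO Channel

final class ReaderCache[T](history: History) {
  private var samples = Queue.empty[T]

  def store(s: T): Unit = history match {
    case KeepLast(n) => samples = (samples enqueue s) takeRight n // keep last n only
    case KeepAll     => samples = samples enqueue s               // keep everything
  }

  // read: samples stay in the cache (the read/not-read mark is elided here)
  def read: List[T] = samples.toList

  // take: samples are removed from the cache
  def take: List[T] = { val s = samples.toList; samples = Queue.empty; s }
}

val lastTwo = new ReaderCache[Int](KeepLast(2))
(1 to 5) foreach (lastTwo store _)
// lastTwo.read == List(4, 5): only the last two values survive

val fifo = new ReaderCache[Int](KeepAll)
(1 to 5) foreach (fifo store _)
// fifo.take returns List(1, 2, 3, 4, 5) and empties the cache
```

The model also makes the beginner's error described in Section 5.3 visible: calling read (instead of take) on the KeepAll cache leaves every sample in place, so its memory footprint grows without bound.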
6. CONCLUDING REMARKS
This paper has introduced the key concepts at the foundation of the OMG DDS and has shown how its combination with programming languages exposing functional features, such as Scala, makes for a very natural and elegant pairing. The DDS standard is used today in a large number of mission- and business-critical systems, and is becoming more and more relevant to Big Data applications due to its performance, scalability and extensibility.

7. REFERENCES
[1] http://bit.ly/nebout03, 2003.
[2] L. Cardelli. Type systems. ACM Comput. Surv., 28:263–264, 1996.
[3] L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 17(4):471–522, 1985.
[4] A. Corsaro. https://github.com/kydos/escalier, 2011.
[5] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification, 3rd edition. Addison-Wesley Professional, 2005.
[6] Sun Microsystems. The Java Message Service specification v1.1, 2002.
[7] M. Odersky, L. Spoon, and B. Venners. Programming in Scala. Artima Inc, 2nd edition, 2011.
[8] OMG. Data Distribution Service for Real-Time Systems v1.2, 2007.
[9] OMG. Data Distribution Service Interoperability Wire Protocol v2.1, 2008.
[10] OMG. Dynamic and Extensible Topic Types, 2010.