High Performance Distributed Computing with DDS and Scala

Angelo Corsaro
Chief Technology Officer
PrismTech Corp.
4 rue Angiboust
Marcoussis, France
angelo@icorsaro.net

ABSTRACT
The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains such as web analytics, social media, automated trading, and smart grids have to deal with. The challenges faced by these applications, commonly called Big Data applications, are manifold, as the staggering growth in volumes is complicating the collection, storage, analysis and distribution of data. This paper focuses on the collection and distribution of data and introduces the Data Distribution Service for Real-Time Systems (DDS), an Object Management Group (OMG) standard for high-performance data dissemination used today in several Big Data applications. The paper then shows how the combination of DDS with functional programming languages, or at least languages incorporating functional features such as Scala, makes a very natural and effective foundation for dealing with Big Data applications.

1. INTRODUCTION
The past few years have witnessed a tremendous increase in the amount of real-time data that applications in domains such as web analytics, social media, automated trading, and smart grids have to deal with. The challenges faced by these applications, commonly called Big Data applications, are manifold, as the staggering growth in volumes is complicating the collection, storage, analysis and distribution of data.

Figure 1: Stages characteristic of Big Data applications (Collect, Store, Analyse, Distribute).

In this paper we focus our attention on the challenges tied to the collection and distribution of the large volumes of data characteristic of Big Data applications – the first and last stages of the pipeline shown in Figure 1 – and partly on their storage. Big Data applications have to be capable of collecting as well as distributing massive amounts of data, much of which needs to be processed in a timely manner. As a result, these applications need to optimally exploit networking as well as computing resources.

In this paper we introduce DDS, an OMG standard for high-performance data dissemination used today in several Big Data applications for data collection and dissemination, as well as for high-performance data durability. We also show how the combination of DDS with functional programming languages, or at least programming languages incorporating functional features like Scala [7], makes a very natural and effective combination for dealing with the challenges of Big Data.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DEBS 2012, July 16–20, 2012, Berlin, Germany.
Copyright 2012 ACM 978-1-4503-1315-5 ...$10.00.

2. THE DATA DISTRIBUTION SERVICE
P/S is a paradigm for one-to-many communication that provides anonymous, decoupled, and asynchronous communication between producers of data – the publishers – and consumers of data – the subscribers. This paradigm is at the foundation of many technologies used today to develop and integrate distributed applications (such as social applications, e-services, financial trading, etc.), while ensuring that the composed parts of the applications remain loosely coupled and independently evolvable.

Different implementations of the P/S abstraction have emerged to support the needs of different application domains. DDS [8] is an OMG P/S standard that enables scalable, real-time, dependable and high-performance data exchanges between publishers and subscribers. DDS addresses the needs of mission- and business-critical applications such as financial trading, air traffic control and management, defense, aerospace, smart grids, and complex supervisory and telemetry systems. The key challenge addressed by DDS is to provide a P/S technology in which data exchanged between publishers and subscribers are:

• Real-time, meaning that the right information is delivered at the right place at the right time – all the time. Failing to deliver key information within the required deadlines can lead to life-, mission- or business-threatening situations. For instance, in financial trading 1 ms can make the difference between losing or gaining $1M. Likewise, in supervisory applications for power grids, failing to meet deadlines under an overload situation could lead to a severe blackout, such as the one experienced by the northeastern US and Canada in 2003 [1].

• Dependable, thus ensuring availability, reliability, safety and integrity in spite of hardware and software failures. For instance, the lives of thousands of air travelers depend on the reliable functioning of an air traffic control and management system. These systems must ensure 99.999% availability and ensure that critical data is delivered reliably, regardless of experienced failures.

• High-performance, which necessitates the ability to distribute very high volumes of data with very low latencies. As an example, financial auto-trading applications must handle millions of messages per second, each delivered reliably with minimal latency, e.g., on the order of tens of microseconds.

These characteristics make DDS a perfect fit for Big Data applications.

2.1 DDS Standard Organization
The key members of the OMG DDS standards family are shown in Figure 2 and consist of the DDS v1.2 API [8], the Data Distribution Service Interoperability Wire Protocol (DDSI) [9], and the DDS Extensible and Dynamic Topic Types (DDS-XTYPES) [10] specification. The DDS API standard defines the key DDS abstractions and their semantics; this standard also ensures source-code portability across different vendor implementations. The DDSI standard ensures on-the-wire interoperability across DDS implementations from different vendors. The DDS-XTYPES standard extends the basic DDS type system with polymorphic structural types supporting extensibility and evolvability.

Figure 2: The DDS Standard (language bindings for C/C++, C#, Java and Scala over the Data Centric Publish Subscribe (DCPS) layer and its profiles – Minimum, Content Subscription, Ownership, Durability, X-Types – running on the DDS Interoperability Wire Protocol (DDSI-RTPS) over UDP/IP).

The DDS standard was formally adopted by the OMG in 2004. It quickly became the established P/S technology for distributing high volumes of data dependably and with predictable low latencies in applications such as radar processors, flying and land drones, combat management systems, air traffic control and management, high-performance telemetry, supervisory control and data acquisition systems, and automated stocks and options trading. Along with wide commercial adoption, the DDS standard has been mandated as the technology for real-time data distribution by organizations worldwide, including the US Navy, the Department of Defense (DoD) Information-Technology Standards Registry (DISR), the UK Ministry of Defence (MoD), the Military Vehicle Association (MILVA), and EUROCAE – the European organization that regulates standards in Air Traffic Control and Management.

2.2 Key DDS Concepts and Entities
Below we summarize the key architectural concepts and entities in DDS.

2.2.1 Global data space
The key abstraction at the foundation of DDS is a fully distributed Global Data Space (GDS) (see Figure 3). The DDS specification requires a fully distributed implementation of the GDS to avoid single points of failure or single points of contention. Publishers and subscribers can join or leave the GDS at any point in time, as they are dynamically discovered. The dynamic discovery of publishers and subscribers is performed by the GDS and does not rely on any kind of centralized registry, such as those found in other P/S technologies like the Java Message Service (JMS) [6]. The GDS also discovers application-defined data types and propagates them as part of the discovery process.

Figure 3: The DDS Global Data Space.

Since DDS provides a GDS equipped with dynamic discovery, there is no need for applications to configure anything explicitly when a system is deployed. Applications will be automatically discovered and data will begin to flow. Moreover, since the GDS is fully distributed, the crash of one node will not induce unknown consequences on the system availability, i.e., in DDS there is no single point of failure, and the system as a whole will continue to run even if applications crash/restart or connect/disconnect.

2.2.2 Topic
The information that populates the GDS is defined by means of DDS topics. A topic, identified by a unique name, a data type, and a collection of Quality of Service (QoS) policies, conceptually defines a class of data streams. The unique name provides a means of uniquely referring to a given topic, the data type defines the type of the stream of data, and the QoS captures all the non-functional aspects of the information, such as its temporal properties or its availability.

Topic types.
DDS topics can be specified using several different syntaxes, such as the Interface Definition Language (IDL), the eXtensible Markup Language (XML), the Unified Modeling Language (UML), and annotated Java. For instance, Listing 1 shows a type declaration for a hypothetical temperature sensor topic type. Some of the attributes of a topic type can be marked as representing the key of the type.

Listing 1: Topic type declaration for a hypothetical temperature sensor

    struct TempSensorType {
      @Key
      long id;
      float temp;
      float hum;
    };

The key allows DDS to define the selector that identifies a specific data stream, commonly called a topic instance. For instance, the topic type declaration in Listing 1 defines the id attribute as being the key of the type. Each unique id value therefore identifies a specific data stream (topic instance) for which DDS will manage the entire life-cycle. This allows applications to (1) identify the specific source of data, such as the specific physical sensor whose id=5, and (2) learn about relevant life-cycle events, such as the creation and disposal of streams. In addition, DDS provides applications with information concerning the existence of producers for each data stream. Figure 4 provides a visual representation of the relationship existing between a topic, its instances, and the associated data streams.

Figure 4: Topic Instances and Data Streams (a TempSensor topic whose instances are identified by key values such as id=701, id=809 and id=977).

Topic QoS.
The Topic QoS provides a mechanism to express relevant non-functional properties of a topic. Section 4 presents a detailed description of the DDS QoS model; at this point we simply mention the ability of defining QoS for topics to capture the key non-functional invariants of the system and make them explicit and visible.

Content filters.
DDS supports defining content filters over a specific topic. These content filters are defined by instantiating a ContentFilteredTopic for an existing topic and providing a filter expression. The filter expression follows the same syntax as the WHERE clause of a SQL statement and can operate on any of the topic type attributes. For instance, a filter expression for a temperature sensor topic could be "id = 101 AND (temp > 35 OR hum > 65)".

2.2.3 DataWriters and DataReaders
Since a topic defines the subjects produced and consumed, DDS provides two abstractions for writing and reading these topics: DataWriters and DataReaders, respectively. Both DataReaders and DataWriters are strongly typed and are defined for a specific topic and topic type.

2.2.4 Publishers, Subscribers and Partitions
DDS also defines Publishers and Subscribers, which group together sets of DataReaders and DataWriters, perform some coordinated actions over them, and manage the communication session. DDS also supports the concept of a Partition, which can be used to organize data flows within a GDS so as to decompose unrelated sets of topics.

2.3 Example of applying DDS
Now that we have presented the key architectural concepts and entities in DDS, we will show the anatomy of a simple DDS application in Scala using the Escalier DDS API [4]. Listing 2 shows the steps required to write a sample for the temperature sensor topic described earlier. In this specific example, since no domain is explicitly joined, the runtime will assume that the application is interested in joining the default domain.
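The listings that follow operate on a Scala counterpart of the TempSensorType declared in Listing 1. The paper does not show this mapping, so the following is only a hypothetical plain-Scala sketch of such a type; the helper method key and the demo object are our own additions, used here to illustrate how samples sharing a key value belong to the same topic instance:

```scala
// Hypothetical Scala rendering of the TempSensorType of Listing 1.
// In DDS the @Key attribute singles out the data member whose value
// identifies a topic instance; here the key is modelled explicitly.
case class TempSensor(id: Int, temp: Float, hum: Float) {
  // The key of the topic type: the id attribute, as in Listing 1.
  def key: Int = id
}

object TempSensorDemo {
  def main(args: Array[String]): Unit = {
    val samples = List(
      TempSensor(701, 25.0f, 0.60f),
      TempSensor(809, 31.5f, 0.55f),
      TempSensor(701, 26.1f, 0.62f)
    )
    // Samples with the same key value belong to the same topic
    // instance, i.e., the same data stream.
    val instances = samples.groupBy(_.key)
    println(s"distinct instances: ${instances.size}")
  }
}
```

Grouping the three samples above by key yields two instances (id=701 and id=809), mirroring the per-instance life-cycle management that DDS performs for keyed topics.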
Listing 2: A simple DDS writer.

    object TempSensorWriter {
      def main(args: Array[String]) {
        // Define a topic
        val topic =
          Topic[TempSensor]("TTempSensor")

        // Create a Writer
        val writer =
          DataWriter[TempSensor](topic)

        // Create a Sample
        val ts =
          new TempSensor(0, 25, 0.6F)

        // Write the sample
        writer ! ts
      }
    }

Note that no QoS has been specified for any of the DDS entities defined in this code fragment, so the behavior will be governed by the default QoS.

Listing 3: A simple DDS reader.

    object SensorLogger {
      def main(args: Array[String]) {
        // Define a topic
        val topic =
          Topic[TempSensor]("TTempSensor")

        // Create a Reader
        val reader =
          DataReader[TempSensor](topic)

        // Register a reaction for printing data
        reader.reactions += {
          case DataAvailable(dr) => {
            (reader read) foreach (println)
          }
        }
      }
    }

Listing 3 shows the steps required to read the data samples published on a given topic. The code fragment shows the application registering a reaction with the DDS data reader by providing a partially defined function. The partially defined function uses pattern matching to select the event in which it is interested. Specifically, the code in Listing 3 registers a reaction that will print received data on the console. It is worth noticing how the elegance and simplicity of this simple application come from the combination of the data-centric DDS paradigm and the use of Scala functional features such as partial and higher-order functions.

3. THE DDS TYPE SYSTEM
Strong typing plays a key role in developing software systems that are efficient, easier to extend, less expensive to maintain, and in which runtime errors are limited to those computational aspects that either cannot be decided at compile-time or that cannot be detected at compile-time by the type system. Since DDS targets mission- and business-critical systems where safety, extensibility, and maintainability are critically important, DDS adopted a strongly, and somewhat statically, typed system to define the type properties of a topic. This section provides an overview of the DDS type system.

3.1 Structural Polymorphic Type System
DDS provides a polymorphic structural type system [3, 2, 10], which means that the type system not only supports polymorphism, but also bases its sub-typing on the structure of a type, as opposed to its name. For example, consider the types declared in Listing 4.

Listing 4: Nominal vs. Structural Subtypes

    struct Coord2D {
      long x;
      long y;
    };

    struct Coord3D : Coord2D {
      long z;
    };

    struct Coord {
      long x;
      long y;
      long z;
    };

In a nominal polymorphic type system, Coord3D would be a subtype of Coord2D, which would be expressed by writing Coord3D <: Coord2D. In a nominal type system, however, there would be no relationship between Coord2D/Coord3D and the type Coord.

Conversely, in a polymorphic structural type system like the one used by DDS, the type Coord is a subtype of the type Coord2D – thus Coord <: Coord2D – and it is structurally the same type as the type Coord3D.

The main advantage of a polymorphic structural type system over nominal type systems is that the former considers the structure of a type, as opposed to its name, to determine sub-typing relationships. As a result, polymorphic structural type systems are more flexible and especially well-suited for evolvable and large-scale distributed systems. In particular, distributed systems often need to evolve incrementally to provide new functionality to most subsystems and systems, without requiring a complete upgrade and redeployment.

In the example above the type Coord is a monotonic extension of the type Coord2D, since it adds some new attributes "at the end." The DDS type system can handle both attribute reordering and attribute removal with equal alacrity.

3.2 Type annotations
The DDS type system supports an annotation system very similar to that available in Java [5]. It also defines a set of built-in annotations that can be used to control the extensibility of a type, as well as the properties of the attributes of a given type. Some important built-in annotations are described below:

• The @ID annotation can be used to assign a globally unique ID to the data members of a type. This ID is used to deal efficiently with attribute reordering.

• The @Key annotation can be used to identify the type members that constitute the key of the topic type.
• The @Optional annotation can be used to express that an attribute is optional and might not be set by the sender. DDS provides specific accessors for optional attributes that can be used to safely check whether the attribute is set or not. In addition, to save bandwidth, DDS will not send optional attributes for which a value has not been provided.

• The @Shared annotation can be used to specify that the attribute must be referenced through a pointer. This annotation helps avoid situations in which large data structures (such as images or large arrays) would be allocated contiguously with the topic type, which may be undesirable in resource-constrained embedded systems.

• The @Extensibility annotation can be used to control the level of extensibility allowed for a given topic type. The possible values for this annotation are (1) Final, to express the fact that the type is sealed and cannot be evolved – as a result this type cannot be substituted by any other type that is structurally its subtype, (2) Extensible, to express that only monotonic extensibility should be considered for the type, and (3) Mutable, to express that the most generic structural subtyping rules should be applied when considering subtypes of the type.

In summary, the DDS type system provides all the advantages of a strongly-typed system, together with the flexibility of structural type systems. This combination supports the key requirements typical of a large class of distributed systems, since it preserves types end-to-end while providing type-safe extensibility, incremental system evolution, and high performance.

4. THE DDS QOS MODEL
DDS relies on QoS policies to provide applications with explicit control over a wide set of non-functional properties, such as data availability, data delivery, data timeliness and resource usage – Figure 5 shows the full list of available QoS policies. Each QoS policy might be applicable to one or more DDS entities, such as topics, data readers, and data writers. Policies controlling end-to-end properties are considered as part of the subscription matching.

Figure 5: DDS QoS Policies, grouped into request/offered (RxO) and local policies: DURABILITY, LIVELINESS, DESTINATION ORDER, TIME-BASED FILTER, HISTORY, OWNERSHIP, PARTITION, RESOURCE LIMITS, LIFESPAN, OWNERSHIP STRENGTH, PRESENTATION, RELIABILITY, DEADLINE, LATENCY BUDGET, USER DATA, TOPIC DATA, GROUP DATA, DATAWRITER/DATAREADER LIFECYCLE, ENTITY FACTORY, TRANSPORT PRIORITY.

DDS uses a request vs. offered QoS matching approach, in which a data reader matches a data writer if and only if the QoS it is requesting for the given topic does not exceed (i.e., is no more stringent than) the QoS with which the data is produced by the data writer. DDS subscriptions are matched against the topic type and name, as well as against the QoS being offered/requested by data writers and readers. This DDS matching mechanism ensures that (1) types are preserved end-to-end, due to the topic type matching, and (2) end-to-end QoS invariants are also preserved.

The remainder of this section describes the most important QoS policies in DDS.

4.1 Data Availability
DDS provides the following QoS policies that control the availability of data to domain participants:

• The DURABILITY QoS policy controls the lifetime of the data written to the global data space in a DDS domain. Supported durability levels are (1) VOLATILE, which specifies that once data is published it is not maintained by DDS for delivery to late-joining applications, (2) TRANSIENT_LOCAL, which specifies that publishers store data locally so that late-joining subscribers get the last published item if a publisher is still alive, (3) TRANSIENT, which ensures that the GDS maintains the information outside the local scope of any publishers for use by late-joining subscribers, and (4) PERSISTENT, which ensures that the GDS stores the information persistently so as to make it available to late joiners even after the shutdown and restart of the whole system. Durability is achieved by relying on a durability service whose properties are configured by means of the DURABILITY_SERVICE QoS of non-volatile topics.

• The LIFESPAN QoS policy controls the interval of time during which a data sample is valid. The default value is infinite; alternative values are the time-span for which the data can be considered valid.

• The HISTORY QoS policy controls the number of data samples (i.e., subsequent writes of the same topic) that must be stored for readers or writers. Possible values are the last sample, the last n samples, or all samples.

The data availability QoS policies decouple applications in time and space. They also enable these applications to cooperate in highly dynamic environments characterized by the continuous joining and leaving of publishers and subscribers. Such properties are particularly relevant for distributed systems since they increase the decoupling of the component parts.

4.2 Data Delivery
DDS provides the following QoS policies that control how data is delivered and how publishers can claim exclusive rights on data updates:

• The PRESENTATION QoS policy gives control over how changes to the information model are presented to subscribers. This QoS gives control over the ordering as well as the coherency of data updates. The scope at which it is applied is defined by the access scope, which can be one of INSTANCE, TOPIC, or GROUP level.

• The RELIABILITY QoS policy controls the level of reliability associated with data diffusion. Possible choices are RELIABLE and BEST_EFFORT distribution.
• The PARTITION QoS policy gives control over the association between DDS partitions (represented by a string name) and a specific instance of a publisher/subscriber. This association provides DDS implementations with an abstraction that allows the traffic generated by different partitions to be segregated, thereby improving overall system scalability and performance.

• The DESTINATION_ORDER QoS policy controls the order of changes made by publishers to some instance of a given topic. DDS allows the ordering of different changes according to source or destination timestamps.

• The OWNERSHIP QoS policy controls whether multiple data writers are allowed to concurrently update a given topic instance. When set to EXCLUSIVE, this policy ensures that only one among the active data writers – namely the one with the highest OWNERSHIP_STRENGTH – will change the value of a topic instance. The other writers, those with lower strength, are still able to write, yet their updates will have no visible impact on the distributed system. In case of failure of the highest-strength data writer, DDS automatically switches to the next among the remaining data writers.

The data delivery QoS policies control the reliability and availability of data, thereby allowing the delivery of the right data to the right place at the right time. More elaborate ways of selecting the right data are offered by the DDS content-awareness profile, which allows applications to select information of interest based upon its content. These QoS policies are particularly useful since they can be used to finely tune how – and to whom – data is delivered, thus limiting not only the amount of resources used, but also minimizing the level of interference by independent data streams.

4.3 Data Timeliness
DDS provides the following QoS policies to control the timeliness properties of distributed data:

• The DEADLINE QoS policy allows applications to define the maximum inter-arrival time for data. DDS can be configured to automatically notify applications when deadlines are missed.

• The LATENCY_BUDGET QoS policy provides a means for applications to inform DDS of how long the middleware can take to make data available to subscribers. When set to zero, DDS sends the data right away; otherwise, it uses the specified interval to exploit temporal locality and batch data into bigger messages so as to optimize bandwidth, CPU and battery usage.

• The TRANSPORT_PRIORITY QoS policy allows applications to control the priority associated with the data flowing on the network. This priority is used by DDS to prioritize more important data relative to less important data.

The DEADLINE, LATENCY_BUDGET and TRANSPORT_PRIORITY QoS policies provide the controls necessary to build priority pre-emptive distributed real-time systems. In these systems, the TRANSPORT_PRIORITY is derived from a static priority scheduling analysis, such as Rate Monotonic Analysis, the DEADLINE QoS policy represents the natural deadline of the information and is used by DDS to notify violations, and, finally, the LATENCY_BUDGET is used to optimize resource utilization in the system.

These DDS data timeliness QoS policies provide control over the temporal properties of data. Such properties are particularly relevant since they can be used to define and control the temporal aspects of the data exchanges of various subsystems, while ensuring that bandwidth is exploited optimally.

4.4 Resources
DDS defines the following QoS policies to control the network and computing resources that are essential to meet data dissemination requirements:

• The TIME_BASED_FILTER QoS policy allows applications to specify the minimum inter-arrival time between data samples, thereby expressing their capability to consume information at a maximum rate. Samples that are produced at a faster pace are not delivered. This policy helps a DDS implementation optimize network bandwidth, memory, and processing power for subscribers that are connected over limited-bandwidth networks or which have limited computing capabilities.

• The RESOURCE_LIMITS QoS policy allows applications to control the maximum storage available to hold topic instances and the related number of historical samples. DDS's QoS policies support the various elements and operating scenarios that constitute net-centric mission-critical information management. By controlling these QoS policies it is possible to scale DDS from low-end embedded systems connected with narrow and noisy radio links to high-end servers connected to high-speed fiber-optic networks.

These DDS resource QoS policies provide control over the local and end-to-end resources, such as memory and network bandwidth. Such properties are particularly useful in environments characterized by largely heterogeneous subsystems, devices, and network connections that often require down-sampling, as well as an overall controlled limit on the amount of resources used.

4.5 Configuration
The QoS policies described above provide control over the most important aspects of data delivery, availability, timeliness, and resource usage. DDS also supports the definition and distribution of user-specified bootstrapping information via the following QoS policies:

• The USER_DATA QoS policy allows applications to associate a sequence of octets with domain participants, data readers and data writers. This data is then distributed by means of a built-in topic – built-in topics are topics pre-defined by the DDS standard and used for internal purposes. This QoS policy is commonly used to distribute security credentials.

• The TOPIC_DATA QoS policy allows applications to associate a sequence of octets with a topic. This bootstrapping information is distributed by means of a built-in topic. A common use of this QoS policy is to extend topics with additional information, or meta-information, such as IDL type-codes or XML schemas.

• The GROUP_DATA QoS policy allows applications to associate a sequence of octets with publishers and subscribers – this bootstrapping information is distributed by means of built-in topics. A typical use of this information is to allow additional application control over subscription matching.

These DDS configuration QoS policies provide a useful mechanism for bootstrapping and configuring applications in a distributed system.

5. DDS CACHES AND CHANNELS
Each DDS data writer and data reader has an associated data cache. The data writer cache stores a subset of the samples written. The data reader cache stores a subset of the samples produced by its matching data writers. The exact behaviour of the cache, with respect to the samples it will store, is controlled through DDS QoS policies.

One way of abstracting data reader and data writer caches and their semantics under different QoS policies, along with data distribution, is to reason in terms of channels. A DDS data writer and its matching data readers can be

5.1 Last n-Values Channel
To obtain a Last n-Values Channel, the HISTORY QoS policy has to be set to KEEP_LAST with a depth of n. In this case we are guaranteed that the data reader will receive the last n samples produced by the data writer. The DURABILITY QoS can be used in this case to decouple the production of data by the data writer from the consumption of data by the data reader. The Last n-Values Channel is useful to represent distributed state. For instance, in several Big Data analytics applications, when performing the analyse phase (see Figure 1) what matters are the current values, or perhaps the last n values, of the relevant indexes, as opposed to the full set of changes since the last analyse phase. Last n-Values Channels should always be used when modelling distributed state, as they allow the use of computational, storage and network resources to be minimized.

5.2 FIFO Channel
To obtain a FIFO Channel, the HISTORY QoS policy has to be set to KEEP_ALL. In this case we are guaranteed that the data reader will receive all the samples produced by the matching data writers. The DURABILITY QoS can be used to decouple the production of data by the data writer from the consumption of data by the data reader, but beware that for TRANSIENT and PERSISTENT DURABILITY the set of samples stored by the durability service is controlled through a specific policy, thus this QoS does not nec-
thought of as connected by a logical typed communication essarily imply that all samples have to be persisted.
channel (see Figure 6). DDS’ QoS policies define the prop- The FIFO Channel is useful to represent queues as well as
erties controlling the set of samples, produced by the data distributed events. This kind of channel is also very useful
writer, that will be eventually received by the matching data when implementing distributed algorithms such as mutual
readers. exclusion, or distributed queues over DDS.
At the two extreme, DDS provides support for Last n-
Values Channel and for FIFO Channel. In order to reason 5.3 DDS Read and Take
about the samples that a reader might receive we will be To complete the picture, along with describing the two key
considering the RELIABILITY QoS as set to RELIABLE. channel semantics provided by DDS we need to explain how
If that was not the case, the only guarantee that can be received data can be retrieved by the application. To each
provided to a matching data reader is that it will receive a DDS data reader is associated a local cache where the data
subset of the data produced by the matching data writer. received by the matching data writers is stored. The appli-
This subset will in general be different for each matching cation can read the content of the cache using both state as
reader. well as content selectors in which case the samples remain
In the remainder of this section we will investigate the QoS in the cache and are simply marked as read, i.e. the state of
policies that provide control over the communication chan- the samples is changed. Otherwise, the application can take
nel and will describe the semantics associated with these the samples from the cache, in which case the samples are
channels. As a final remark it is worth pointing out that removed by the cache.
these channels are logical and fully distributed. In general, the read operation is used in combination with
Last n-Values Channel, while the take operation is used with
the FIFO Channels. It is a common beginners programming
error to use read with FIFO Channels with the result that
DR the resource usage will keep growing – unless the user had
specified some resource limits.
DW Topic DR 5.4 Selectors
As mentioned before, DDS provides both content as well
as state selectors to specify the properties of the samples
that should be read.
DR
5.4.1 Content Selectors
Figure 6: The logical channel between a DDS data Content Filters.
writer and its matching readers One of the mechanisms provided by DDS to select the
content that will be received by a data reader are Content
Filtered Topics. These kind of topics are created by associ-
8. ating query expression with an existing topic (see Listing 5). It is worth noticing how the strongly typed nature of DDS
The query expression specify the filter that will be evaluated perfectly mixes with the Scala type system producing ele-
against the samples produced by the matching writers to de- gant, compact and very efficient code.
termine whether the sample is relevant for the reader or not.
In the example shown in Listing 5, only the samples whose Streams/Instances.
temperature is greater than 25 or whose humidity is higher Often applications need to access the samples belonging
than 75% that will be seen by the data reader. to a specific stream or topic instance. One way to achieve
this is to query for a specific key value. A better way is to
Listing 5: Content Filtered Topic. select a stream by using an instance handle – the DDS spe-
1 val topic = cific way of identifying a stream or topic instance. Listing 9
Topic[TempSensor]("TTempSensor") shows how the data belonging to a specific stream can be
3 val query = programmatically selected.
Query("temp > %0 || hum > %1", List(25,
0.75F) Listing 9: Query Selector.
5 val ftopic =
ContentFilteredTopic[TempSensor](" val handle = reader lookup (key)
CFTempSensor",topic, query) 2 val data =
7 val reader = (reader instance(handle)) read
DataReader[TempSensor](ftopic)
5.4.2 State Selectors
Queries. In addition to content-based selectors, DDS also provides
DDS also provide a mechanism for executing queries over state selectors which allow to select samples based on their
the content of the data reader cache (see Listing 6). The state. Specifically, DDS state selectors allow to select sam-
main difference between queries and content filtered topics ples with respect to three different states, the READ_STATE
is that the former allows to select samples of the cache based the VIEW_STATE and the INSTANCE_STATE. The READ-
on their content, while the latter controls the samples that _STATE allows to distinguish between new samples and sam-
get into the cache based on their content. ples that have already been read. The VIEW_STATE allows
to distinguish between samples belonging to a new instance
Listing 6: Query Selector. as opposed to samples belonging to an instance that were
1 val data = already known. Finally the INSTANCE_STATE allows to dis-
(reader filterContent(query)) read tinguish between samples belonging to an instance for which
there are still active writers as opposed to a stream for which
there are no more writers. The Listing 10 shows an example
of state-based selector.
List Comprehension.
Another way of manipulating the data available on the Listing 10: Instance Selector.
DDS cache is through list comprehension. In this case, the
higher order functions defined as part of the Scala List class 1 val data =
can be used to perform arbitrary operations on the items (reader
3 filterState(SampleState.NewData)
available on the cache. As an example, assuming that we ) read
wanted to compute the moving average of the temperature
over the last n samples, we could have used a DDS data
reader with history depth set to n and then simply com- Finally, it is worth mentioning that the content and state
puted the moving average using the List fold right function selectors can be mixed and matched at will to perform com-
as shown in Listing 7. plex selection operations as shown in the example of List-
ing 11. Additionally, selectors can be used with read as well
Listing 7: List Comprehensions in DDS with fold- as with take operations.
right.
Listing 11: Example of Selectors composition.
val data =
2 reader read val data =
val mavg = 2 (reader
4 (data : 0) (_ + _) / (data size)
4 instance(handle)
Another way to achieve the same goal is to first map the filterContent(query)
6 filterState(SampleState.NewData)
list of TempSensorType to a list of temperatures and the ) read
compute the moving average. This approach is shown in
Figure 8
Listing 11 is showing an examples in which the data selected
Listing 8: List Comprehensions in DDS with map. is that belonging to a specific stream for which a query holds,
and that have never been read before.
val data= Obviously, list comprehension can be used for operating
2 (reader read) map (_.temp) on samples content and state. Differently from DDS built-in
val mavg =
4 (data sum) / (data size) selectors, list comprehension operates on the samples read
or taken, as opposed to being used for selecting the samples
9. to read or take from the cache. Thus when using list com-
prehension one should always be careful in selecting first the
set of samples from the cache.
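The two channel semantics can also be illustrated independently of any DDS implementation by simulating the effect of the HISTORY policy on a reader cache with plain Scala collections. The sketch below is a simulation only: LastNCache and FifoCache are hypothetical names, not part of the DDS API.

```scala
// Simulation of the two HISTORY settings described above.
// KEEP_LAST with depth n keeps only the last n samples (Last n-Values Channel);
// KEEP_ALL keeps every sample (FIFO Channel).
// LastNCache and FifoCache are illustrative names, not DDS API classes.
final class LastNCache[T](depth: Int) {
  private var samples = Vector.empty[T]
  def write(s: T): Unit =
    samples = (samples :+ s).takeRight(depth) // older samples are dropped
  def read: Vector[T] = samples
}

final class FifoCache[T] {
  private var samples = Vector.empty[T]
  def write(s: T): Unit =
    samples = samples :+ s // nothing is ever dropped
  def read: Vector[T] = samples
}

val lastN = {
  val c = new LastNCache[Int](3)
  (1 to 5).foreach(c.write)
  c
}
val fifo = {
  val c = new FifoCache[Int]
  (1 to 5).foreach(c.write)
  c
}
// lastN.read holds only the last three samples; fifo.read holds all five
```

With RELIABILITY set to RELIABLE, these are exactly the guarantees of Sections 5.1 and 5.2: the Last n-Values Channel trades completeness for bounded resource usage, while the FIFO Channel preserves every sample.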
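The read/take distinction of Section 5.3 can likewise be made concrete with a small cache model. SimpleCache and Entry below are hypothetical names that sketch only the semantics, not the DDS reader API.

```scala
// Each cached sample carries a read-state, mirroring the READ_STATE above.
// SimpleCache and Entry are illustrative names, not DDS API classes.
final case class Entry[T](data: T, alreadyRead: Boolean)

final class SimpleCache[T] {
  private var entries = Vector.empty[Entry[T]]
  def store(s: T): Unit =
    entries = entries :+ Entry(s, alreadyRead = false)

  // read: not-yet-read samples are returned; all samples stay in the
  // cache and are marked as read
  def read(): Vector[T] = {
    val out = entries.filterNot(_.alreadyRead).map(_.data)
    entries = entries.map(_.copy(alreadyRead = true))
    out
  }

  // take: samples are returned and removed from the cache
  def take(): Vector[T] = {
    val out = entries.map(_.data)
    entries = Vector.empty
    out
  }

  def size: Int = entries.size
}

val (first, second, taken, remaining) = {
  val cache = new SimpleCache[Int]
  List(1, 2, 3).foreach(cache.store)
  val r1 = cache.read() // returns 1, 2, 3; the samples stay in the cache
  val r2 = cache.read() // returns nothing: everything is marked as read
  val t  = cache.take() // returns 1, 2, 3 and empties the cache
  (r1, r2, t, cache.size)
}
```

This also makes the beginner's error mentioned above visible: with a FIFO Channel, calling read instead of take leaves every sample in the cache, so the cache only ever grows.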
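Selector composition, as in Listing 11, amounts to conjoining predicates. The sketch below expresses a content selector and a state selector as plain Scala functions over a hypothetical Sample type and composes them with filter; the real DDS selectors differ in that they operate on the reader cache itself rather than on an already-retrieved list.

```scala
// A hypothetical flattened sample: payload fields plus a "new data" flag.
final case class Sample(temp: Float, hum: Float, isNew: Boolean)

// Content selector, analogous to the query "temp > %0 || hum > %1"
val content: Sample => Boolean = s => s.temp > 25f || s.hum > 0.75f
// State selector, analogous to SampleState.NewData
val newData: Sample => Boolean = _.isNew

val cache = List(
  Sample(30f, 0.5f, isNew = true),  // matches content and state
  Sample(20f, 0.8f, isNew = false), // matches content only
  Sample(10f, 0.2f, isNew = true)   // matches state only
)

// Composed selection, analogous to Listing 11: only samples that
// satisfy both the content and the state selector survive.
val selected = cache.filter(s => content(s) && newData(s))
```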
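Finally, the moving-average computation of Listings 7 and 8 can be checked in plain Scala. TempSensor below is a stand-in for the IDL-generated type used in the listings, and the four samples play the role of a reader history with depth 4.

```scala
// Stand-in for the IDL-generated type used in the listings above.
final case class TempSensor(temp: Float, hum: Float)

// The last four samples, as a Last n-Values Channel of depth 4 would hold.
val data = List(
  TempSensor(20f, 0.5f), TempSensor(22f, 0.6f),
  TempSensor(24f, 0.7f), TempSensor(26f, 0.8f)
)

// Listing 7 style: foldRight accumulates the temperatures directly.
val mavgFold = data.foldRight(0.0f)(_.temp + _) / data.size

// Listing 8 style: map to the temperatures first, then average.
val temps   = data.map(_.temp)
val mavgMap = temps.sum / temps.size
// both yield 23.0
```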
6. CONCLUDING REMARKS
This paper has introduced the key concepts at the foundation of the OMG DDS and has shown how its combination with programming languages exposing functional features, such as Scala, yields a very natural and elegant programming model.
The DDS standard is used today in a large number of mission- and business-critical systems and is becoming more and more relevant to Big Data applications due to its performance, scalability and extensibility.