• Persistence (collecting, aggregating and moving data)
for later batch processing.
• Could be integrated into a lambda architecture
• Quite flexible and configurable: based on streaming data
flows with a pub-sub-like communication model.
• What is it for?
– Cygnus is a connector in charge of persisting Orion context data in certain
configured third-party storages, creating a historical view of such data. In other
words, Orion only stores the last value regarding an entity's attribute, and if an
older value is required then you will have to persist it in other storage, value by
value, using Cygnus.
• How does it receive context data from Orion Context Broker?
– Cygnus uses the subscription/notification feature of Orion. A subscription is made
in Orion on behalf of Cygnus, detailing which entities we want to be notified about
when an update occurs on any of those entities' attributes.
• Cygnus is a connector in charge of persisting certain
sources of data in certain configured third-party
storages, creating a historical view of such data.
• Internally, Cygnus is based on Apache Flume, a technology
for building data collection and persistence agents.
– An agent is basically composed of a listener or source in charge of receiving the
data, a channel where the source puts the data once it has been transformed
into a Flume event, and a sink, which takes Flume events from the channel in
order to persist the data within its body into a third-party storage.
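The source, channel and sink roles described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Flume or Cygnus code: the class names, the in-memory queue used as a channel and the list used as "storage" are all assumptions made for illustration.

```python
import json
import queue
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for a Flume event: headers plus an opaque body."""
    headers: dict
    body: bytes


class Source:
    """Receives raw data, wraps it in an Event and puts it on the channel."""

    def __init__(self, channel):
        self.channel = channel

    def receive(self, raw: dict):
        event = Event(headers={"content-type": "application/json"},
                      body=json.dumps(raw).encode("utf-8"))
        self.channel.put(event)


class Sink:
    """Takes events from the channel and persists the data in their body."""

    def __init__(self, channel, storage: list):
        self.channel = channel
        self.storage = storage  # a plain list stands in for the third-party storage

    def drain(self):
        # Consume events until the channel is empty.
        while True:
            try:
                event = self.channel.get_nowait()
            except queue.Empty:
                return
            self.storage.append(json.loads(event.body))


channel = queue.Queue()  # passive store between source and sink
storage = []
source = Source(channel)
sink = Sink(channel, storage)

source.receive({"id": "Room1", "temperature": 26.5})
source.receive({"id": "Room1", "temperature": 27.0})
sink.drain()
print(storage)
```

The channel decouples the two sides: the source only ever writes to it and the sink only ever reads from it, which is why they can run at different speeds.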
• NGSI-like context data in:
– HDFS, the Hadoop distributed file system.
– MySQL, the well-known relational database manager.
– CKAN, an Open Data platform.
– MongoDB, the NoSQL document-oriented database.
– STH Comet, a Short-Term Historic database built on top of MongoDB.
– Kafka, the publish-subscribe messaging broker.
– DynamoDB, a cloud-based NoSQL database by Amazon Web Services.
– PostgreSQL, the well-known relational database manager.
– Carto, the database specialized in geolocated data.
• Twitter data in:
– HDFS, the Hadoop distributed file system.
• A Source consumes Events having a specific format, and those Events are
delivered to the Source by an external source like a web server. For example,
an AvroSource can be used to receive Avro Events from clients or from other
Flume agents in the flow. When a Source receives an Event, it stores it into
one or more Channels. The Channel is a passive store that holds the Event
until that Event is consumed by a Sink. One type of Channel available in
Flume is the FileChannel which uses the local filesystem as its backing store.
A Sink is responsible for removing an Event from the Channel and putting it
into an external repository like HDFS (in the case of an HDFSEventSink) or
forwarding it to the Source at the next hop of the flow. The Source and Sink
within the given agent run asynchronously with the Events staged in the
Channel.
• One instance for each
• This adds more capability to
Connecting Orion Context Broker and
• Cygnus takes advantage of the subscription-notification mechanism
of Orion Context Broker. Specifically, Cygnus needs to be notified each
time certain entity attributes change, and in order to do that, Cygnus
must subscribe to those attribute changes.
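As a rough illustration of such a subscription, the sketch below builds an NGSIv2 subscription request for Orion without sending it. The entity (Room1), the attribute name, the Orion address (localhost:1026) and the Cygnus notification endpoint (localhost:5050/notify) are assumptions chosen for the example; depending on the Cygnus version, a specific notification format may also need to be requested.

```python
import json
import urllib.request


def build_subscription(entity_id, entity_type, attrs, notify_url):
    """Build an NGSIv2 subscription body asking Orion to notify notify_url."""
    return {
        "description": f"Notify Cygnus of changes in {entity_id}",
        "subject": {
            "entities": [{"id": entity_id, "type": entity_type}],
            # Notify when any of these attributes changes.
            "condition": {"attrs": attrs},
        },
        "notification": {
            "http": {"url": notify_url},
            # Attributes included in each notification.
            "attrs": attrs,
        },
    }


def build_request(orion_url, payload):
    """Wrap the payload in a POST request to Orion's subscriptions resource."""
    return urllib.request.Request(
        url=f"{orion_url}/v2/subscriptions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoints: adjust to the actual Orion and Cygnus deployments.
payload = build_subscription("Room1", "Room", ["temperature"],
                             "http://localhost:5050/notify")
request = build_request("http://localhost:1026", payload)

# urllib.request.urlopen(request) would submit it to a running Orion instance.
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```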
HDFS details regarding Cygnus persistence
• By default, for each entity Cygnus stores the data at:
• Within each HDFS file, the data format may be json-row or json-column:
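The difference between the two layouts can be sketched as follows: a row layout writes one JSON line per attribute, while a column layout writes one JSON line per notification with one field per attribute. The field names used here (recvTime, entityId, attrName and so on) and the sample values are assumptions for illustration and may not match the exact fields Cygnus writes.

```python
import json


def to_json_rows(recv_time, entity_id, entity_type, attrs):
    """Row-like layout: one JSON line per attribute of the entity."""
    return [
        json.dumps({
            "recvTime": recv_time,
            "entityId": entity_id,
            "entityType": entity_type,
            "attrName": name,
            "attrType": attr["type"],
            "attrValue": attr["value"],
        })
        for name, attr in attrs.items()
    ]


def to_json_column(recv_time, attrs):
    """Column-like layout: one JSON line per notification, one field per attribute."""
    line = {"recvTime": recv_time}
    for name, attr in attrs.items():
        line[name] = attr["value"]
    return json.dumps(line)


# Hypothetical notified attributes for a single entity.
attrs = {
    "temperature": {"type": "centigrade", "value": "26.5"},
    "pressure": {"type": "mmHg", "value": "90"},
}

rows = to_json_rows("2014-02-27T14:46:21", "Room1", "Room", attrs)
column = to_json_column("2014-02-27T14:46:21", attrs)

for row in rows:
    print(row)
print(column)
```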
• Simple configuration:
– implementing HA for Flume/Cygnus is as easy as running two
instances of the software and putting a load balancer in
between them and the data source (or sources).
• Use File Channels (extra persistence) instead of Memory
Channels, which are the default.
• Advanced configuration:
– Flume with Zookeeper
Data schemas and pre-aggregation
• Although the STH stores the evolution of (raw) data (i.e., attribute
values) in time, its real power comes from the storage of aggregated
• The STH should be able to respond to queries such as:
– Give me the maximum temperature of this room during the last month
(range) aggregated by day (resolution)
– Give me the mean temperature of this room today (range) aggregated by
hour or even minute (resolution)
– Give me the standard deviation of the temperature of this room this last
year (range) aggregated by day (resolution)
– Give me the number of times the air conditioner of this room was switched
on or off last Monday (range) aggregated by hour
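The range/resolution idea behind these queries can be sketched in a few lines. Note that this example computes the aggregates on demand from raw samples, whereas the point of pre-aggregation is to keep them updated as data arrives; the function name, the bucket key formats and the sample data are assumptions for illustration and do not reflect how the STH stores or serves data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Bucket key format for each supported resolution.
FORMATS = {"day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H", "minute": "%Y-%m-%dT%H:%M"}


def aggregate(samples, resolution):
    """Group (timestamp, value) samples by resolution and summarise each bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.strftime(FORMATS[resolution])].append(value)
    return {
        key: {
            "max": max(vals),
            "min": min(vals),
            "mean": mean(vals),
            "std": pstdev(vals),  # population standard deviation
            "count": len(vals),   # number of samples (e.g. on/off switches)
        }
        for key, vals in sorted(buckets.items())
    }


# Hypothetical temperature samples for one room.
samples = [
    (datetime(2016, 5, 2, 10, 5), 20.0),
    (datetime(2016, 5, 2, 10, 45), 22.0),
    (datetime(2016, 5, 2, 11, 15), 25.0),
]

by_hour = aggregate(samples, "hour")
by_day = aggregate(samples, "day")
print(by_hour)
print(by_day)
```

Here the range is whatever interval the samples cover, and the resolution selects how finely that range is split into buckets.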
• The per-agent Quick Start Guide found at readthedocs.org provides a good
documentation summary (cygnus-ngsi, cygnus-twitter).
• Nevertheless, both the Installation and Administration Guide and the User and
Programmer Guide for each agent, also found at readthedocs.org, cover more advanced
• The per-agent Flume Extensions Catalogue completes the available documentation for
Cygnus (cygnus-ngsi, cygnus-twitter).
• Other interesting links are:
• Our Apiary Documentation if you want to know how to use our API methods for
• cygnus-ngsi integration examples.
• cygnus-ngsi introductory course in FIWARE Academy.
Round Robin channel selection
• It is possible to configure more than one channel-sink pair for each
storage, in order to increase the performance
• A custom ChannelSelector is needed
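The selection policy itself is simple and can be sketched outside of Flume. The class below is a plain-Python illustration, not the actual Cygnus ChannelSelector (which would be a Java class plugged into Flume); the channel names are hypothetical.

```python
import itertools


class RoundRobinSelector:
    """Hands out channels in turn so events are spread evenly across them."""

    def __init__(self, channels):
        if not channels:
            raise ValueError("at least one channel is required")
        self._cycle = itertools.cycle(channels)

    def select(self):
        """Return the channel that should receive the next event."""
        return next(self._cycle)


# Three hypothetical channel-sink pairs configured for the same storage.
channels = {"hdfs-channel-1": [], "hdfs-channel-2": [], "hdfs-channel-3": []}
selector = RoundRobinSelector(list(channels))

for i in range(7):
    channels[selector.select()].append(f"event-{i}")

for name, events in channels.items():
    print(name, events)
```

Because each channel has its own sink, spreading events this way lets several sinks write to the same storage in parallel.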
Pattern-based Context Data Grouping
• Default destination (HDFS file, MySQL table or CKAN resource) is obtained as a
• It is possible to group different context data thanks to this regex-based feature
implemented as a Flume interceptor:
cygnusagent.sources.http-source.interceptors = ts de
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
Matching table for pattern-based grouping
• CSV file (‘|’ field separator) containing rules
• For instance:
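As a rough illustration of how a '|'-separated rule table could drive regex-based grouping, the sketch below assumes a hypothetical rule layout of id|fields|regex|destination|dataset, where the listed fields are concatenated and matched against the regex. Both the layout and the two sample rules are assumptions for this example, not a reproduction of an actual Cygnus matching table.

```python
import csv
import io
import re

# Hypothetical rules: id|fields to concatenate|regex|destination|dataset
RULES_TEXT = r"""1|entityId,entityType|Room\.(\d*)Room|numeric_rooms|rooms
2|entityType|Car|all_cars|vehicles
"""


def load_rules(text):
    """Parse the '|'-separated rules, skipping malformed lines."""
    rules = []
    for row in csv.reader(io.StringIO(text), delimiter="|"):
        if len(row) != 5:
            continue
        rule_id, fields, pattern, destination, dataset = row
        rules.append((int(rule_id), fields.split(","), re.compile(pattern),
                      destination, dataset))
    return rules


def resolve(rules, entity, default):
    """Return (destination, dataset) of the first matching rule, else default."""
    for _, fields, pattern, destination, dataset in rules:
        subject = "".join(entity[f] for f in fields)
        if pattern.search(subject):
            return destination, dataset
    return default


rules = load_rules(RULES_TEXT)
default = ("default_destination", "default_dataset")

room = resolve(rules, {"entityId": "Room.12", "entityType": "Room"}, default)
car = resolve(rules, {"entityId": "car1", "entityType": "Car"}, default)
lamp = resolve(rules, {"entityId": "lamp7", "entityType": "Lamp"}, default)
print(room, car, lamp)
```

Rules are evaluated in order and the first match wins, so entities that match no rule keep their default destination.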
• HDFS may be secured with Kerberos for authentication purposes
• Cygnus is able to persist on kerberized HDFS if the configured HDFS user has a
registered Kerberos principal and this configuration is added:
cygnusagent.sinks.hdfs-sink.krb5_auth = true
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_user = krb5_username
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_password = xxxxxxxxxxxx
Follow @FIWARE on Twitter
FIWARE Big Data ecosystem:
Cygnus and STH-Comet
Universidad Politécnica de Madrid (UPM)
Joaquin.email@example.com, @jsalvachua, @FIWARE