Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
Virtual Clusters for
(RDF) Stream Processing
Outline

Some context: morph-streams++

Motivation

Use case: Sensor Cloud data integration

Topologies everywhere

Setting up a virtual cluster

Deploying Storm topologies

Conclusion
Some context...
Motivation

Integrating an unbounded stream of heterogeneous
sensor observations

Solution:
– Storm topologies for real-time processing
– Semantic Sensor Network (SSN) ontology for
modelling observations
– SWEET ontology for environmental phenomena
Use case: Sensor Cloud data integration (1/3)
Sensor Cloud

Viticulture, water
management, weather
monitoring, oyster farming...

RESTful API – JSON

Network → Platform →
Sensor → Phenomenon →
Observation

Lack of semantic
descriptions, e.g.
rain_trace vs Rain.

Multiple HTTP requests to
query various streams.
Source: CSIRO
Use case: Sensor Cloud data integration (2/3)

Sensor Cloud messages to field-named tuples

SWEET annotations for heterogeneous phenomena descriptions
<sample time=”2015­05­28T16:30” value=”15” sensor=”bom_gov_au.94961.air.air_temp”/>
[“2015­05­28T16:32”, “2015­05­28T16:30”, “15”, “bom_gov_au”, “94961”, “air”, “air_temp”,
“­43.3167”, “147.0075”]
network
phenomenon
platform sensorsampling time
system time
latitude longitude
SensorCloudParser
Bolt
SweetAnnotations
Bolt
Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter
Bolt
Topologies everywhere

A Storm topology “is a graph of stream transformations
where each node is a spout or bolt”.
https://storm.apache.org/documentation/Tutorial.html

Example of simple topology
Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/

Allows deploying (local or remote) virtual clusters.

Focus on Big Data technologies: Storm, Kafka,
Zookeeper...

Uses Vagrant for “easy to configure, reproducible, and
portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html

Uses Puppet for provisioning: installation and
configuration of SW packages in the cluster nodes.
Setting up a virtual cluster (2/2)

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html
Deploying Storm topologies

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html

Describe simple topology

Compile & deploy

Describe a topology set

Configure Kafka

Compile & deploy
Conclusion
Conclusion

Wirbelsturm allows easy configuration & deployment of virtual clusters,
with focus on Big Data technologies.

SSN and SWEET ontologies to model and integrate environmental
sensor observations.

Parallelization of bottleneck tasks reduces the average message
processing latency (up to some extent). More about Storm
parallelization: http://bit.ly/1NVyjU2

Delaying RDF conversion does not speed up the processing of Sensor
Cloud messages in the tested environment.

Submitted paper to IJSWIS, special issue on Velocity and Variety
Dimensions of Big Data – Llaves, Corcho et al.
What's coming next

Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
The presented research has has been funded by Ministerio de
Economía y Competitividad (Spain) under the project ”4V:
Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora
de Datos” (TIN2013-46238-C4-2-R), by the EU Marie Curie
IRSES project SemData (612551), and supported by an AWS in
Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!

Virtual Clusters for (RDF) Stream Processing

  • 1.
    Alejandro Llaves Ontology EngineeringGroup Universidad Politécnica de Madrid Madrid, Spain allaves@fi.upm.es Oct 21 2015 Virtual Clusters for (RDF) Stream Processing
  • 2.
    Outline  Some context: morph-streams++  Motivation  Usecase: Sensor Cloud data integration  Topologies everywhere  Setting up a virtual cluster  Deploying Storm topologies  Conclusion
  • 3.
  • 4.
    Motivation  Integrating an unboundedstream of heterogeneous sensor observations  Solution: – Storm topologies for real-time processing – Semantic Sensor Network (SSN) ontology for modelling observations – SWEET ontology for environmental phenomena
  • 5.
    Use case: SensorCloud data integration (1/3) Sensor Cloud  Viticulture, water management, weather monitoring, oyster farming...  RESTful API – JSON  Network → Platform → Sensor → Phenomenon → Observation  Lack of semantic descriptions, e.g. rain_trace vs Rain.  Multiple HTTP requests to query various streams. Source: CSIRO
  • 6.
    Use case: SensorCloud data integration (2/3)  Sensor Cloud messages to field-named tuples  SWEET annotations for heterogeneous phenomena descriptions <sample time=”2015­05­28T16:30” value=”15” sensor=”bom_gov_au.94961.air.air_temp”/> [“2015­05­28T16:32”, “2015­05­28T16:30”, “15”, “bom_gov_au”, “94961”, “air”, “air_temp”, “­43.3167”, “147.0075”] network phenomenon platform sensorsampling time system time latitude longitude SensorCloudParser Bolt SweetAnnotations Bolt
  • 7.
    Use case: SensorCloud data integration (3/3) SSN mapping SSNConverter Bolt
  • 8.
    Topologies everywhere  A Stormtopology “is a graph of stream transformations where each node is a spout or bolt”. https://storm.apache.org/documentation/Tutorial.html  Example of simple topology
  • 11.
    Setting up avirtual cluster (1/2) Wirbelsturm - https://github.com/miguno/wirbelsturm/  Allows deploying (local or remote) virtual clusters.  Focus on Big Data technologies: Storm, Kafka, Zookeeper...  Uses Vagrant for “easy to configure, reproducible, and portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html  Uses Puppet for provisioning: installation and configuration of SW packages in the cluster nodes.
  • 12.
    Setting up avirtual cluster (2/2)  $ ./deploy  Show wirbelsturm.yaml  Check Storm GUI - http://localhost:28080/index.html
  • 13.
    Deploying Storm topologies  $ ./deploy  Showwirbelsturm.yaml  Check Storm GUI - http://localhost:28080/index.html  Describe simple topology  Compile & deploy  Describe a topology set  Configure Kafka  Compile & deploy
  • 15.
    Conclusion Conclusion  Wirbelsturm allows easyconfiguration & deployment of virtual clusters, with focus on Big Data technologies.  SSN and SWEET ontologies to model and integrate environmental sensor observations.  Parallelization of bottleneck tasks reduces the average message processing latency (up to some extent). More about Storm parallelization: http://bit.ly/1NVyjU2  Delaying RDF conversion does not speed up the processing of Sensor Cloud messages in the tested environment.  Submitted paper to IJSWIS, special issue on Velocity and Variety Dimensions of Big Data – Llaves, Corcho et al. What's coming next  Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
  • 16.
    The presented researchhas has been funded by Ministerio de Economía y Competitividad (Spain) under the project ”4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (TIN2013-46238-C4-2-R), by the EU Marie Curie IRSES project SemData (612551), and supported by an AWS in Education Research Grant award. Alejandro Llaves allaves@fi.upm.es Thanks!