SlideShare a Scribd company logo
First Joint International Workshop on
Semantic Sensor Networks and Terra Cognita
October 11, 2015, Bethlehem, PA, USA
Emrooz: A Scalable Database for
SSN Observations
Markus Stocker, Narasinha Shurpali, Kerry Taylor, George
Burba, Mauno Rönkkö, Mikko Kolehmainen
markus.stocker@uef.fi
@markusstocker and @envinf
2
Introduction
Expressive ontologies for sensor (meta-) data (SSN)
Flexible graph data model (RDF)
Triple stores obvious choice
Unfortunately hardly viable at scale
Triple stores indexes for graph pattern queries
Not designed for time series interval queries
3
Aim
Build a database that ...
Consumes SSN observations in RDF
Evaluates SSN observation SPARQL queries
Scales to billions of observations
Has better query performance than triple stores
4
Architecture
5
Cassandra data model
Schema consisting of
Partition key (row key) of type ascii
Clustering key (column name) of type timeuuid
Column value of type blob
The partition key consists of two (dash-concatenated) parts
SHA-256 hex string digest of sensor-property-feature URIs
Date time string of pattern yyyyMMddHHmm
Computed from observation result time
Floor-rounded to year, month, day, hour, or minute
Rounding depends on sensor sampling frequency
Goal is to limit the number of columns per row
Clustering key determined by observation result time
Column value is set of triples for observation (binary)
6
Experiments
LI-7500A Open Path CO2/H2O Gas Analyzer
LI-7700 Open Path CH4 Analyzer
Property of mole fraction
Three features for the monitored gases
8
Experiments
January 7 to May 26, 2015, 6045 GHG archive files
Estimated # of sensor observations is 326 430 000
Estimated # of triples is 4.9 billion (15 triples / observation)
Load and query performance on 10 subsets
SPARQL query with 10 min interval
Compared to Stardog and Blazegraph
Test performance with varying time interval
9
The query
select ?time ?value
where { [
ssn:observedBy licor:LERS-75H-2035 ;
ssn:observedProperty sweet-propFraction:MoleFraction ;
ssn:featureOfInterest sweet-matrCompound:CO2 ;
ssn:observationResultTime [ time:inXSDDateTime ?time ] ;
ssn:observationResult [ ssn:hasValue [
dul:hasRegionDataValue ?value
] ]
]
filter (?time >= "2015-04-15T00:00:00.000+06:00"^^xsd:dateTime
&& ?time < "2015-04-15T00:10:00.000+06:00"^^xsd:dateTime)
}
order by asc(?time)
10
Results: Some figures
Subset Observations Triples Distinct
30 m 54 000 810 000 648 007
1 h 108 000 1 620 000 1 296 007
3 h 324 000 4 860 000 3 888 007
6 h 647 997 9 719 955 7 775 971
12 h 1 295 997 19 439 955 15 551 971
1 d 2 591 994 38 879 910 31 103 935
7 d 18 140 271 272 104 065 217 683 259
1 M 72 526 464 1 087 896 960 870 317 575
3 M 194 188 107 2 912 821 605 *
J-M 328 715 445 4 930 731 675 *
11
Results: Load performance
10
100
1000
10000
100000
1000000
30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M
Time(logscale)[s]
Subsets
Emrooz
Blazegraph
Stardog
12
Results: Query performance
10
100
1000
30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M
Time(logscale)[s]
Subsets
Emrooz
Blazegraph
Stardog
13
Results: Query size performance
0
1
2
3
4
5
6
7
8
9
10
1 s 30 s 1 m 5 m 10 m 20 m 30 m 40 m 50 m 60 m
Time[s]
Query time interval
Emrooz
14
REST
curl http://localhost:8080/sensors/list
curl http://localhost:8080/properties/list
curl http://localhost:8080/features/list
curl -H "Accept: application/json" 
http://localhost:8080/sensors/list
curl -H "Accept: text/csv" -G 
--data-urlencode sensor=http://example.org#thermometer 
--data-urlencode property=http://example.org#temperature 
--data-urlencode feature=http://example.org#air 
--data-urlencode from=2015-04-21T01:00:00.000+03:00 
--data-urlencode to=2015-04-21T02:00:00.000+03:00 
http://localhost:8080/observations/sensor/list
15
R
host <- "http://localhost:8080"
df.sensors <- read.csv(text=getURL(paste0(host, "/sensors/list")),
header=FALSE, col.names=c("sensor"))
df.sensors
sensor
1 http://licor.com#LERS-75H-CH4
2 http://licor.com#LERS-75H-CO2
16
R
host <- "http://localhost:8080"
sensor <- "http://licor.com#LERS-75H-CO2"
property <- "http://sweet.jpl.nasa.gov/2.3/propMass.owl#Density"
feature <- "http://sweet.jpl.nasa.gov/2.3/matrCompound.owl#CarbonDioxide"
from <- "2015-01-07T00:00:00.000+06:00"
to <- "2015-01-07T00:01:00.000+06:00"
url <- paste0(host, "/observations/sensor/list?",
"sensor=", curlEscape(sensor),
"&property=", curlEscape(property),
"&feature=", curlEscape(feature),
"&from=", curlEscape(from),
"&to=", curlEscape(to))
df.observations <- read.csv(text=getURL(url,
httpheader=c(Accept="text/csv")), header=TRUE, sep=",")
ggplot(data=df.observations, aes(time, value))
+ geom_line() + xlab("Time") + ylab("CO2 [mmol m-3]")
17
R
18.00
18.05
18.10
18.15
00 15 30 45 00
Time
CO2[mmolm−3]
18
Related and future work
Other authors have pointed out the problem
“Semantification of measurement data not promising”
RDF databases on NoSQL systems (e.g. Cumulus RDF)
Support for QB observations (done)
REST API (preliminary)
Integration with R/Matlab (preliminary)
Performance comparison with other systems
19
Conclusion
SSN and RDF nice for sensor (meta-) data
Triple stores inadequate for observation data
Alternative approaches required
What are the advantages and disadvantages?
Reasoning on all data by some sensor?
Query for observation values exceeding threshold?

More Related Content

What's hot

Counters for real-time statistics
Counters for real-time statisticsCounters for real-time statistics
Counters for real-time statistics
Edward Capriolo
 
Round Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogsRound Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogs
Mario Juric
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Florian Lautenschlager
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
Rob Skillington
 
VO Course 02: Astronomy & Standards
VO Course 02: Astronomy & StandardsVO Course 02: Astronomy & Standards
VO Course 02: Astronomy & Standards
Joint ALMA Observatory
 
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open DataOf Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
Thomas Gottron
 
Building blocks for aggregate programming of self-organising applications
Building blocks for aggregate programming of self-organising applicationsBuilding blocks for aggregate programming of self-organising applications
Building blocks for aggregate programming of self-organising applications
FoCAS Initiative
 
CCLS Internship Presentation
CCLS Internship PresentationCCLS Internship Presentation
CCLS Internship Presentation
Charles Naut
 
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Globus
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
Rob Emanuele
 
Big Data Analysis with Crate and Python
Big Data Analysis with Crate and PythonBig Data Analysis with Crate and Python
Big Data Analysis with Crate and Python
Matthias Wahl
 
Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015
Phillip Trelford
 
Oil and Gas Imaging, pumpsandpipesmdhc
Oil and Gas Imaging, pumpsandpipesmdhcOil and Gas Imaging, pumpsandpipesmdhc
Oil and Gas Imaging, pumpsandpipesmdhctmhsweb
 
Redis TimeSeries: Danni Moiseyev, Pieter Cailliau
Redis TimeSeries: Danni Moiseyev, Pieter CailliauRedis TimeSeries: Danni Moiseyev, Pieter Cailliau
Redis TimeSeries: Danni Moiseyev, Pieter Cailliau
Redis Labs
 
Cassandra at talkbits
Cassandra at talkbitsCassandra at talkbits
Cassandra at talkbits
Max Alexejev
 
SRAdb Bioconductor Package Overview
SRAdb Bioconductor Package OverviewSRAdb Bioconductor Package Overview
SRAdb Bioconductor Package Overview
Sean Davis
 
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
Srinath Perera
 

What's hot (19)

Counters for real-time statistics
Counters for real-time statisticsCounters for real-time statistics
Counters for real-time statistics
 
Round Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogsRound Table Introduction: Analytics on 100 TB+ catalogs
Round Table Introduction: Analytics on 100 TB+ catalogs
 
Wimax
WimaxWimax
Wimax
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
VO Course 02: Astronomy & Standards
VO Course 02: Astronomy & StandardsVO Course 02: Astronomy & Standards
VO Course 02: Astronomy & Standards
 
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open DataOf Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
 
Building blocks for aggregate programming of self-organising applications
Building blocks for aggregate programming of self-organising applicationsBuilding blocks for aggregate programming of self-organising applications
Building blocks for aggregate programming of self-organising applications
 
CCLS Internship Presentation
CCLS Internship PresentationCCLS Internship Presentation
CCLS Internship Presentation
 
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
Big Data Analysis with Crate and Python
Big Data Analysis with Crate and PythonBig Data Analysis with Crate and Python
Big Data Analysis with Crate and Python
 
Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015Beyond Lists - Functional Kats Conf Dublin 2015
Beyond Lists - Functional Kats Conf Dublin 2015
 
Oil and Gas Imaging, pumpsandpipesmdhc
Oil and Gas Imaging, pumpsandpipesmdhcOil and Gas Imaging, pumpsandpipesmdhc
Oil and Gas Imaging, pumpsandpipesmdhc
 
Redis TimeSeries: Danni Moiseyev, Pieter Cailliau
Redis TimeSeries: Danni Moiseyev, Pieter CailliauRedis TimeSeries: Danni Moiseyev, Pieter Cailliau
Redis TimeSeries: Danni Moiseyev, Pieter Cailliau
 
Cassandra at talkbits
Cassandra at talkbitsCassandra at talkbits
Cassandra at talkbits
 
SRAdb Bioconductor Package Overview
SRAdb Bioconductor Package OverviewSRAdb Bioconductor Package Overview
SRAdb Bioconductor Package Overview
 
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams wit...
 
JavaCro'15 - Big Data in a DIY home - Marko Švaljek
JavaCro'15 - Big Data in a DIY home - Marko ŠvaljekJavaCro'15 - Big Data in a DIY home - Marko Švaljek
JavaCro'15 - Big Data in a DIY home - Marko Švaljek
 

Viewers also liked

Group work
Group workGroup work
07 application security fundamentals - part 2 - security mechanisms - data ...
07   application security fundamentals - part 2 - security mechanisms - data ...07   application security fundamentals - part 2 - security mechanisms - data ...
07 application security fundamentals - part 2 - security mechanisms - data ...
appsec
 
Internet buscadores
Internet  buscadoresInternet  buscadores
Internet buscadores
VANINAGONZALEZ410
 
A study of the demographic differences of instructors in using e-Textbooks in...
A study of the demographic differences of instructors in using e-Textbooks in...A study of the demographic differences of instructors in using e-Textbooks in...
A study of the demographic differences of instructors in using e-Textbooks in...
Sirui Wang
 
Tulos foorumi elisa_ranta-aho_04112015
Tulos foorumi elisa_ranta-aho_04112015Tulos foorumi elisa_ranta-aho_04112015
Tulos foorumi elisa_ranta-aho_04112015
PROimpact_Oy
 
Photography
Photography Photography
Photography
Anna Polud
 
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORADIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
Dinesh Vora
 
GovindR&D ConferencePresentationOct2015pdf
GovindR&D ConferencePresentationOct2015pdfGovindR&D ConferencePresentationOct2015pdf
GovindR&D ConferencePresentationOct2015pdfRakesh Govind
 
Acnl2015 arjen uittenbogaard-echt agile veranderen
Acnl2015 arjen uittenbogaard-echt agile veranderenAcnl2015 arjen uittenbogaard-echt agile veranderen
Acnl2015 arjen uittenbogaard-echt agile veranderen
AgileConsortiumINT
 
1 Million Favors & Counting
1 Million Favors & Counting1 Million Favors & Counting
1 Million Favors & Counting
Carlos N. Menchaca
 
Asso%20artchimade
Asso%20artchimadeAsso%20artchimade
Asso%20artchimade
Stefan Leininger
 
bezar_pitch_catalog_FINAL_nocrops
bezar_pitch_catalog_FINAL_nocropsbezar_pitch_catalog_FINAL_nocrops
bezar_pitch_catalog_FINAL_nocropsBJ Murray
 
Target audience (isobel ellen sanger)
Target audience (isobel ellen sanger)Target audience (isobel ellen sanger)
Target audience (isobel ellen sanger)
issyellensanger
 
9 урок теорія робочий стіл
9 урок теорія робочий стіл9 урок теорія робочий стіл
9 урок теорія робочий стіл
Andy Levkovich
 
OMC Brasil China
OMC Brasil ChinaOMC Brasil China
OMC Brasil China
Leonardo Macedo
 

Viewers also liked (18)

Group work
Group workGroup work
Group work
 
07 application security fundamentals - part 2 - security mechanisms - data ...
07   application security fundamentals - part 2 - security mechanisms - data ...07   application security fundamentals - part 2 - security mechanisms - data ...
07 application security fundamentals - part 2 - security mechanisms - data ...
 
TERMPROJECT1
TERMPROJECT1TERMPROJECT1
TERMPROJECT1
 
Internet buscadores
Internet  buscadoresInternet  buscadores
Internet buscadores
 
Resume2015-1
Resume2015-1Resume2015-1
Resume2015-1
 
A study of the demographic differences of instructors in using e-Textbooks in...
A study of the demographic differences of instructors in using e-Textbooks in...A study of the demographic differences of instructors in using e-Textbooks in...
A study of the demographic differences of instructors in using e-Textbooks in...
 
Tulos foorumi elisa_ranta-aho_04112015
Tulos foorumi elisa_ranta-aho_04112015Tulos foorumi elisa_ranta-aho_04112015
Tulos foorumi elisa_ranta-aho_04112015
 
infographic TK
infographic TKinfographic TK
infographic TK
 
Photography
Photography Photography
Photography
 
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORADIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
DIWALI GREETINGS FROM SHREE NARENDRA MODI - DINESH VORA
 
GovindR&D ConferencePresentationOct2015pdf
GovindR&D ConferencePresentationOct2015pdfGovindR&D ConferencePresentationOct2015pdf
GovindR&D ConferencePresentationOct2015pdf
 
Acnl2015 arjen uittenbogaard-echt agile veranderen
Acnl2015 arjen uittenbogaard-echt agile veranderenAcnl2015 arjen uittenbogaard-echt agile veranderen
Acnl2015 arjen uittenbogaard-echt agile veranderen
 
1 Million Favors & Counting
1 Million Favors & Counting1 Million Favors & Counting
1 Million Favors & Counting
 
Asso%20artchimade
Asso%20artchimadeAsso%20artchimade
Asso%20artchimade
 
bezar_pitch_catalog_FINAL_nocrops
bezar_pitch_catalog_FINAL_nocropsbezar_pitch_catalog_FINAL_nocrops
bezar_pitch_catalog_FINAL_nocrops
 
Target audience (isobel ellen sanger)
Target audience (isobel ellen sanger)Target audience (isobel ellen sanger)
Target audience (isobel ellen sanger)
 
9 урок теорія робочий стіл
9 урок теорія робочий стіл9 урок теорія робочий стіл
9 урок теорія робочий стіл
 
OMC Brasil China
OMC Brasil ChinaOMC Brasil China
OMC Brasil China
 

Similar to SSN-TC workshop talk at ISWC 2015 on Emrooz

Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetup
amarsri
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
ScyllaDB
 
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
Kostis Kyzirakos
 
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
DataStax Academy
 
Hybrid acquisition of temporal scopes for rdf data
Hybrid acquisition of temporal scopes for rdf dataHybrid acquisition of temporal scopes for rdf data
Hybrid acquisition of temporal scopes for rdf data
Anisa Rula
 
What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing system
Alejandro Llaves
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
PlanetData Network of Excellence
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
Oscar Corcho
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Lucidworks
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series Analysis
QAware GmbH
 
Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
Ruben Verborgh
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloudthetfoot
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
Databricks
 
Making sense of your data
Making sense of your dataMaking sense of your data
Making sense of your data
Gerald Muecke
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLKostis Kyzirakos
 
Sensors, Mappings and Queries in the Semantic Web
Sensors, Mappings and Queries in the Semantic WebSensors, Mappings and Queries in the Semantic Web
Sensors, Mappings and Queries in the Semantic WebJean-Paul Calbimonte
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebKostis Kyzirakos
 
Environment Canada's Data Management Service
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management Service
Safe Software
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
Spark Summit
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit
 

Similar to SSN-TC workshop talk at ISWC 2015 on Emrooz (20)

Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetup
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Building Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF StoresBuilding Scalable Semantic Geospatial RDF Stores
Building Scalable Semantic Geospatial RDF Stores
 
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
WattGo: Analyses temps-réél de series temporelles avec Spark et Solr (Français)
 
Hybrid acquisition of temporal scopes for rdf data
Hybrid acquisition of temporal scopes for rdf dataHybrid acquisition of temporal scopes for rdf data
Hybrid acquisition of temporal scopes for rdf data
 
What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing system
 
On the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream ProcessingOn the need for a W3C community group on RDF Stream Processing
On the need for a W3C community group on RDF Stream Processing
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series Analysis
 
Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
 
Making sense of your data
Making sense of your dataMaking sense of your data
Making sense of your data
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
 
Sensors, Mappings and Queries in the Semantic Web
Sensors, Mappings and Queries in the Semantic WebSensors, Mappings and Queries in the Semantic Web
Sensors, Mappings and Queries in the Semantic Web
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic Web
 
Environment Canada's Data Management Service
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management Service
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
 

Recently uploaded

SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 

Recently uploaded (20)

SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 

SSN-TC workshop talk at ISWC 2015 on Emrooz

  • 1. First Joint International Workshop on Semantic Sensor Networks and Terra Cognita October 11, 2015, Bethlehem, PA, USA Emrooz: A Scalable Database for SSN Observations Markus Stocker, Narasinha Shurpali, Kerry Taylor, George Burba, Mauno Rönkkö, Mikko Kolehmainen markus.stocker@uef.fi @markusstocker and @envinf
  • 2. 2 Introduction Expressive ontologies for sensor (meta-) data (SSN) Flexible graph data model (RDF) Triple stores obvious choice Unfortunately hardly viable at scale Triple stores indexes for graph pattern queries Not designed for time series interval queries
  • 3. 3 Aim Build a database that ... Consumes SSN observations in RDF Evaluates SSN observation SPARQL queries Scales to billions of observations Has better query performance than triple stores
  • 5. 5 Cassandra data model Schema consisting of Partition key (row key) of type ascii Clustering key (column name) of type timeuuid Column value of type blob The partition key consists of two (dash-concatenated) parts SHA-256 hex string digest of sensor-property-feature URIs Date time string of pattern yyyyMMddHHmm Computed from observation result time Floor-rounded to year, month, day, hour, or minute Rounding depends on sensor sampling frequency Goal is to limit the number of columns per row Clustering key determined by observation result time Column value is set of triples for observation (binary)
  • 6. 6 Experiments LI-7500A Open Path CO2/H2O Gas Analyzer LI-7700 Open Path CH4 Analyzer Property of mole fraction Three features for the monitored gases
  • 7.
  • 8. 8 Experiments January 7 to May 26, 2015, 6045 GHG archive files Estimated # of sensor observations is 326 430 000 Estimated # of triples is 4.9 billion (15 triples / observation) Load and query performance on 10 subsets SPARQL query with 10 min interval Compared to Stardog and Blazegraph Test performance with varying time interval
  • 9. 9 The query select ?time ?value where { [ ssn:observedBy licor:LERS-75H-2035 ; ssn:observedProperty sweet-propFraction:MoleFraction ; ssn:featureOfInterest sweet-matrCompound:CO2 ; ssn:observationResultTime [ time:inXSDDateTime ?time ] ; ssn:observationResult [ ssn:hasValue [ dul:hasRegionDataValue ?value ] ] ] filter (?time >= "2015-04-15T00:00:00.000+06:00"^^xsd:dateTime && ?time < "2015-04-15T00:10:00.000+06:00"^^xsd:dateTime) } order by asc(?time)
  • 10. 10 Results: Some figures Subset Observations Triples Distinct 30 m 54 000 810 000 648 007 1 h 108 000 1 620 000 1 296 007 3 h 324 000 4 860 000 3 888 007 6 h 647 997 9 719 955 7 775 971 12 h 1 295 997 19 439 955 15 551 971 1 d 2 591 994 38 879 910 31 103 935 7 d 18 140 271 272 104 065 217 683 259 1 M 72 526 464 1 087 896 960 870 317 575 3 M 194 188 107 2 912 821 605 * J-M 328 715 445 4 930 731 675 *
  • 11. 11 Results: Load performance 10 100 1000 10000 100000 1000000 30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M Time(logscale)[s] Subsets Emrooz Blazegraph Stardog
  • 12. 12 Results: Query performance 10 100 1000 30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M Time(logscale)[s] Subsets Emrooz Blazegraph Stardog
  • 13. 13 Results: Query size performance 0 1 2 3 4 5 6 7 8 9 10 1 s 30 s 1 m 5 m 10 m 20 m 30 m 40 m 50 m 60 m Time[s] Query time interval Emrooz
  • 14. 14 REST curl http://localhost:8080/sensors/list curl http://localhost:8080/properties/list curl http://localhost:8080/features/list curl -H "Accept: application/json" http://localhost:8080/sensors/list curl -H "Accept: text/csv" -G --data-urlencode sensor=http://example.org#thermometer --data-urlencode property=http://example.org#temperature --data-urlencode feature=http://example.org#air --data-urlencode from=2015-04-21T01:00:00.000+03:00 --data-urlencode to=2015-04-21T02:00:00.000+03:00 http://localhost:8080/observations/sensor/list
  • 15. 15 R host <- "http://localhost:8080" df.sensors <- read.csv(text=getURL(paste0(host, "/sensors/list")), header=FALSE, col.names=c("sensor")) df.sensors sensor 1 http://licor.com#LERS-75H-CH4 2 http://licor.com#LERS-75H-CO2
  • 16. 16 R host <- "http://localhost:8080" sensor <- "http://licor.com#LERS-75H-CO2" property <- "http://sweet.jpl.nasa.gov/2.3/propMass.owl#Density" feature <- "http://sweet.jpl.nasa.gov/2.3/matrCompound.owl#CarbonDioxide" from <- "2015-01-07T00:00:00.000+06:00" to <- "2015-01-07T00:01:00.000+06:00" url <- paste0(host, "/observations/sensor/list?", "sensor=", curlEscape(sensor), "&property=", curlEscape(property), "&feature=", curlEscape(feature), "&from=", curlEscape(from), "&to=", curlEscape(to)) df.observations <- read.csv(text=getURL(url, httpheader=c(Accept="text/csv")), header=TRUE, sep=",") ggplot(data=df.observations, aes(time, value)) + geom_line() + xlab("Time") + ylab("CO2 [mmol m-3]")
  • 17. 17 R 18.00 18.05 18.10 18.15 00 15 30 45 00 Time CO2[mmolm−3]
  • 18. 18 Related and future work Other authors have pointed out the problem “Semantification of measurement data not promising” RDF databases on NoSQL systems (e.g. Cumulus RDF) Support for QB observations (done) REST API (preliminary) Integration with R/Matlab (preliminary) Performance comparison with other systems
  • 19. 19 Conclusion SSN and RDF nice for sensor (meta-) data Triple stores inadequate for observation data Alternative approaches required What are the advantages and disadvantages? Reasoning on all data by some sensor? Query for observation values exceeding threshold?