Phd

Ontology-based Access to
Sensor Data Streams
Jean-Paul Calbimonte
Supervisor: Oscar Corcho
Ontology Engineering Group
Facultad de Informática, Universidad Politécnica de Madrid
jp.calbimonte@upm.es
PhD Thesis Defense
18.4.2013

2
Outline
Motivation
Background
Conclusions
Semantic stream query processing
Sensor metadata characterization
Ontology-based Access to Sensor Data Streams
Hypotheses & contributions
Challenges

Motivation
3
from Sensor Networks
to the Sensor Web
and the Semantic Sensor Web

Sensors
4
http://www.flickr.com/photos/wouterh/2409251427/
data capture
different Sensor providers
transmission
. . .. . .
data streams

Sensor Networks and the Web
5
Sensor Networks
users
applications
data streams
Volume
Velocity
Variety
WEB
Universal Web-based access to Sensor data

Querying the semantic sensor Web
6
e.g. publish sensor data as RDF/Linked Data?
URIs as names of things
HTTP URIs
useful information when URI
is dereferenced
Link to other URIs
users
applications
WEB
Use ontology models to continuously query real-
time data streams originated from sensors?
1
static vs. streams
one-off vs. continuous

Research questions & hypotheses
7
Ontology models to query real-time sensor data streams?
Access heterogeneous SPEs using ontologies as an
overarching data model?
SPARQL streaming extensions for querying data from SPEs
(stream processing engines)?
1
H1: Sensor streaming data → instances of an ontology model
H2: SPARQL extensions → streaming operators & continuous processing
H3: Ontology-based streaming queries → rewritten to relational-based
queries using mappings
H4: Ontology-based streaming queries → abstract expressions
→ concrete executable SPE queries
H5: Query rewriting → Pull & Push delivery → acceptable overhead

Sensor Data: Observations
Citizen Science
Multiple publishers
Heterogeneity
Metadata quality
8

Characterizing semantic sensor metadata
10
users
applications
WEB
Characterizing sensor data, deriving semantic
metadata from the sensor observations
2
different publishers
different metadata
publish streams
Search/query relevant
data sources?
GSN

Research questions & hypotheses
11
Data representation suitable for extracting data features
that characterize a set of sensor streams?
Classification and mining techniques to characterize
sensor data streams?
2
H6: Sensor data series → find characteristic patterns
→ make it recognizable among other types
H7: Slope representations → semantic properties such as the type of data
→ learned with classification techniques
→ acceptable precision

Contributions
12
 SPARQL extensions & formalization
 rewriting to algebra expressions
 using declarative mappings
 results data translation
 query evaluation pluggable to ≠ SPEs
 query rewriting using R2RML mappings
 data representation as slope distributions
 characterize types of sensor data
 classifying sensor time series
 extract metadata features
 derive semantic properties & R2RML
SPARQLStream
Querying
Metadata
2
1

Limitations
13
L1: Rewriting → medium sampling throughput, e.g. environmental monitoring
L2: Query expressivity → limited to that of the underlying SPEs
L3: Adapters → implemented for custom sources
L4: Querying → only simple entailment
L5: Arbitrarily noisy sensor series → no accurate characterization
L6: Classification → depends on the number of sensor time series in the training set
L7: Data characterization → not computed in real time, but offline

14
Outline
Motivation
Background
Conclusions
Challenges
Data Streams Continuous queries Window
SPEs Ontology-based data access

Sensor data streams & events
15
(temp,hum,pres) τi
(36.2,89,4) τi
milford1
(35.6,87,4) τi-1
(37.2,88,4) τi+1
watford7
. . .
(37.6,88,7) τi (36.3,89,2) τi+1
. . .
. . .
stream tuples
event processing

Querying streams & events
16
w1 w2
windows
SELECT attribute FROM stream [NOW -10 MIN]
streaming tuples
Query
processor
query results
database
Continuous
query
processor
query
push
results
pull
request
SPE
continuous processing
one-off queries
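The difference between a one-off query and a continuous query over a time window can be sketched as follows. This is only an illustration in Python: the `StreamTuple` type, the function names and the timestamps in seconds are assumptions made for the example, not part of any SPE API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class StreamTuple:
    timestamp: int   # seconds since an arbitrary origin (assumed unit)
    value: float     # e.g. a temperature reading


def window(tuples: Iterable[StreamTuple], now: int, width: int) -> List[StreamTuple]:
    """One-off evaluation: tuples whose timestamp lies in (now - width, now]."""
    return [t for t in tuples if now - width < t.timestamp <= now]


def continuous_query(tuples: List[StreamTuple], width: int, slide: int,
                     start: int, end: int) -> Iterator[Tuple[int, List[float]]]:
    """Continuous evaluation: re-run the window every `slide` seconds."""
    now = start
    while now <= end:
        yield now, [t.value for t in window(tuples, now, width)]
        now += slide


stream = [StreamTuple(0, 35.6), StreamTuple(300, 36.2),
          StreamTuple(600, 37.2), StreamTuple(900, 36.3)]

# A 10-minute window (600 s), re-evaluated every 5 minutes (300 s).
results = list(continuous_query(stream, width=600, slide=300, start=600, end=900))
```

Each element of `results` pairs an evaluation time with the values inside the window at that time, which roughly corresponds to what a push-based SPE would deliver to a subscriber.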

Stream Processing Engines (SPE)
17
Data Stream Management Systems (DSMS)
Complex Event Processors (CEP)
Sensor Data Middleware
CQL/STREAM
Borealis
TelegraphCQ
StreamMill
Cayuga
GEM CEDR
NiagaraCQ
Rapide
Cosm
Hourglass
SStreamWare GSN
IBM InfoSphere
Sybase CEP
Microsoft StreamInsight
Oracle CEP
Esper
StreamBase
Diverse query languages
Different query capabilities
Different query models

Extracting data from relational databases
18
WEB
Ontology-based
data access
one-off SPARQL
queries
data as RDF
relational database
RDB to RDF
mappings
static data
D2R
Morph
ODEMapster Triplify
UltraWrap Mastro
R2RML
W3C SSN Ontology

Summary
19
Existing SPEs available and producing data streams
Ontology-based access only for stored data
SPARQL query language not suitable for streams
SPEs are highly heterogeneous in models and queries

20
Outline
Motivation
Background
Conclusions
SPARQLStream
Challenges
Query rewriting
RDF Stream
Mappings using R2RML Execution over SPEs

RDF Streams
21
s,p,o
<aemet:observation1, qudt:hasNumericValue, “15.5”>
<aemet:observation1, ssn:observedBy, aemet:Sensor3>
For streams?
( s,p,o ,τ)
(<aemet:observation1, qudt:hasNumericValue, “15.5”>,34532)
timestamped triples
• Gutierrez et al. (2007) Introducing time into RDF. IEEE TKDE
• Rodríguez et al. (2009) Semantic management of streaming data. SSN
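A minimal sketch of the timestamped-triple model, assuming a plain Python representation: a time window selects the triples whose timestamp falls in a range and returns them as an ordinary set of triples that a regular graph pattern could be matched against. The second observation and the window bounds are invented for illustration.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class TimestampedTriple(NamedTuple):
    triple: Triple
    timestamp: int


def window_to_graph(stream: Iterable[TimestampedTriple],
                    start: int, end: int) -> FrozenSet[Triple]:
    """Window-to-graph: keep triples with start <= timestamp <= end, drop the timestamps."""
    return frozenset(tt.triple for tt in stream if start <= tt.timestamp <= end)


rdf_stream = [
    TimestampedTriple(("aemet:observation1", "qudt:hasNumericValue", "15.5"), 34532),
    TimestampedTriple(("aemet:observation1", "ssn:observedBy", "aemet:Sensor3"), 34532),
    # Hypothetical later observation, outside the window used below.
    TimestampedTriple(("aemet:observation2", "qudt:hasNumericValue", "16.1"), 34600),
]

graph = window_to_graph(rdf_stream, start=34500, end=34550)
```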

SPARQLStream extensions
22
SELECT (MAX(?temperature) AS ?maxtemp) ?sensor
WHERE {
?obs ssn:observedBy ?sensor.
?obs ssn:observationResult ?res.
?res aemet:hasAirTemperatureValue ?val.
?val qu:numericValue ?temperature.
}
GROUP BY ?sensor
SELECT (MAX(?temp) AS ?maxtemp) ?sensor
FROM NAMED STREAM <http://aemet.linkeddata.es/observations.srdf> [NOW-1 HOURS]
WHERE {
?obs ssn:observedBy ?sensor.
?obs ssn:observationResult ?res.
?res aemet:hasAirTemperatureValue ?val.
?val qu:numericValue ?temp.
}
GROUP BY ?sensor
SPARQLStream
Named streams
Time windows
Other approaches: Streaming SPARQL (2008), C-SPARQL (2009), CQELS
(2011), EP-SPARQL (2011), INSTANS (2012)

Streaming SPARQL execution approaches
23
Extend RDF for streaming data
Extend SPARQL for streaming RDF
Use a SPE internally for evaluation
Query rewriting to SPEs
RDF Streaming engine from scratch
Logic-programming based query evaluation
~Similarities
Divergence
streams
DSMSs
CEPs
Middleware
SPARQLStream

Mapping SPE schemas and ontologies
24
wan7
timed: datetime PK
sp_wind: float
timed sp_wind
1 3.4
2 5.6
3 11.2
4 1.2
5 3.1
.. …
Queries
SELECT sp_wind
FROM wan7 [NOW -5 HOUR]
WHERE sp_wind >10
SPE
SPE data schemas
ssn:Observation
Ontology models
SPARQLStream Queries
Stream-to-ontology
mappings
SELECT ?wspeed
FROM STREAM <SensorReadings.srdf> [NOW-5 HOUR]
WHERE {
?obs a ssn:ObservationValue;
qudt:numericValue ?wspeed.
FILTER (?wspeed > 10) }

http://swissex.ch/data#
Wan7/WindSpeed/ObsValue{timed}
sp_wind
Wan7/WindSpeed/Observation{timed}
Wan7/WindSpeed/ObsOutput{timed}
sweetSpeed:WindSpeed
Creating Mappings
25
wan7
timed: datetime PK
sp_wind: float
ssn:ObservationValue
qudt:numericValue
xsd:decimal
ssn:SensorOutput
ssn:Observation
ssn:hasValue
ssn:observationResult
ssn:Property
ssn:observedProperty
:Wan4WindSpeed a rr:TriplesMapClass; rr:tableName "wan7";
rr:subjectMap [
rr:template "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}";
rr:class ssn:ObservationValue; rr:graph ssg:swissexsnow.srdf ];
rr:predicateObjectMap [ rr:predicateMap [ rr:predicate qudt:numericValue ];
rr:objectMap [ rr:column "sp_wind"; rr:datatype xsd:decimal ] ] .
W3C R2RML Mapping Language
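What such a mapping does to a single stream tuple can be sketched as below. This is a hand-written illustration, not an R2RML processor: the dictionary layout and the function names are assumptions, and only the template, class, predicate and column names are taken from the mapping above.

```python
import re
from typing import Dict, List, Tuple

# Simplified, hypothetical in-memory form of the mapping shown on the slide.
MAPPING = {
    "subject_template": "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}",
    "rdf_class": "ssn:ObservationValue",
    "predicate": "qudt:numericValue",
    "object_column": "sp_wind",
    "datatype": "xsd:decimal",
}


def fill_template(template: str, row: Dict[str, object]) -> str:
    """Replace every {column} placeholder with the value of that column."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def row_to_triples(row: Dict[str, object], mapping: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Generate the class triple and the value triple for one tuple."""
    subject = "<" + fill_template(mapping["subject_template"], row) + ">"
    literal = f'"{row[mapping["object_column"]]}"^^{mapping["datatype"]}'
    return [
        (subject, "rdf:type", mapping["rdf_class"]),
        (subject, mapping["predicate"], literal),
    ]


triples = row_to_triples({"timed": 3, "sp_wind": 11.2}, MAPPING)
```

In the query rewriting approach the mapping is used the other way round, to translate the query rather than to materialize triples, but the correspondence between columns and RDF terms is the same.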

Query rewriting
SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf>
[NOW-5 HOUR TO NOW]
WHERE {
?obs a ssn:ObservationValue;
qudt:numericValue ?windspeed.
FILTER (?windspeed > 10) }
SELECT sp_wind FROM wan7 [FROM NOW-5 HOURS TO NOW]
WHERE sp_wind >10
[Algebra expression tree: projection π (timed, sp_wind), window ω (5 Hour), selection σ (sp_wind > 10), stream wan7]
SELECT sp_wind FROM wan7.win:time(5 hour)
WHERE sp_wind >10
http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&
field[0]=wind_speed_scalar_av&c_min[0]=10&
from=15/05/2012+05:00:00&to=15/05/2012+10:00:00
http://api.cosm.com/v2/feeds/14321/datastreams/4?start=2012-05-
15T05:00:00Z&end=2012-05-15T10:00:00Z
Query
rewriting
R2RML
SNEE (DSMS)
Esper (CEP)
GSN (middleware)
Cosm (middleware)
26
H4: Ontology-based streaming queries
→ abstract expressions
H3: Ontology-based streaming queries
→ rewritten to relational-based
SPARQLStream
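The idea of one abstract expression being instantiated into several engine-specific queries can be sketched as follows. The `WindowedSelect` class and the two rendering functions are hypothetical; the output strings only imitate the SNEE and Esper examples shown on this slide and are not checked against either engine's grammar.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedSelect:
    """Abstract form of: project columns, over a time window, with a selection."""
    stream: str
    columns: Tuple[str, ...]
    window_hours: int
    condition: str


def to_snee_style(q: WindowedSelect) -> str:
    cols = ", ".join(q.columns)
    return (f"SELECT {cols} FROM {q.stream} "
            f"[FROM NOW-{q.window_hours} HOURS TO NOW] WHERE {q.condition}")


def to_esper_style(q: WindowedSelect) -> str:
    cols = ", ".join(q.columns)
    return (f"SELECT {cols} FROM {q.stream}.win:time({q.window_hours} hour) "
            f"WHERE {q.condition}")


query = WindowedSelect(stream="wan7", columns=("sp_wind",),
                       window_hours=5, condition="sp_wind > 10")

snee_query = to_snee_style(query)
esper_query = to_esper_style(query)
```

Keeping the abstract expression separate from the renderers is what lets a new SPE be supported by adding one more rendering function.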

Ontology-based query rewriting
27
Query
rewriting
Query
Processing
Client
SPARQLStream
[tuples]
[triples/bindings]
Algebra
expression
R2RML
Mappings
SPARQLStream query processing
SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf>
[NOW-5 HOUR]
WHERE {
?obs a ssn:ObservationValue;
qudt:numericValue ?windspeed.
FILTER (?windspeed > 10) }
SELECT sp_wind
FROM wan7.win:time(5 hour)
WHERE sp_wind >10
[Algebra expression tree: projection π (timed, sp_wind), window ω (5 Hour), selection σ (sp_wind > 10), stream wan7]
Data
translation
SNEE
Esper
GSN
Cosm
pull/push
https://github.com/jpcik/morph-streams
Other
H1: Sensor streaming data
→ instances of an ontology model
H2: SPARQL extensions → streaming
operators & continuous processing

Evaluation of query rewriting overhead
28
H5: Query rewriting
→ Pull & Push delivery
→ acceptable overhead
Native execution w/o rewriting
Execution with rewriting
Pull & Push delivery
End-to-end latency
Adapted Esper benchmark

29
Outline
Motivation
Background
Conclusions
Representation
Challenges
Classification Metadata

Characterizing semantic sensor metadata
30
WEB
GSN
Air Pressure?
Air Temperature?
Already classified time series
Unclassified input series
compare

Deriving Semantic Metadata
31
Representation
Classification
Metadata

Piecewise Linear Approximation
32
Reflect data trends
Apply with different resolutions
Applicable for different rates
Online computation cheap
Linear segments
Time series
time
Reduce numerosity

Linear Approximations
33
[Figure: segment slope angles (0, π/4, π/2, -π/4) assigned to sectors a, b, c, d]
Key: segment slopes (angles)
Divide the angle space in sectors
distribution of angles in training set
1. compute linear approximation
2. compute slope distribution
3. K-nearest neighbor classification
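The slope-distribution and nearest-neighbor steps can be sketched as follows. The four equal-width sectors, the Euclidean distance, the slope values and the two labels are assumptions chosen for illustration; they are not the sector layout or the datasets used in the experiments.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def slope_distribution(slopes: Sequence[float], n_sectors: int = 4) -> List[float]:
    """Normalized histogram of slope angles over equal sectors of [-pi/2, pi/2)."""
    counts = [0] * n_sectors
    width = math.pi / n_sectors
    for slope in slopes:
        angle = math.atan(slope)                       # angle of the segment
        index = min(int((angle + math.pi / 2) / width), n_sectors - 1)
        counts[index] += 1
    return [c / len(slopes) for c in counts]


def knn_classify(query: Sequence[float],
                 training: Sequence[Tuple[Sequence[float], str]],
                 k: int = 1) -> str:
    """Majority label among the k training distributions closest to the query."""
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training series: gentle slopes vs. steep slopes.
training = [
    (slope_distribution([0.1, -0.1, 0.05, -0.05]), "air_pressure"),
    (slope_distribution([2.0, -3.0, 1.5, -2.5]), "air_temperature"),
]

query = slope_distribution([0.2, -0.2, 0.1, 0.0])
label = knn_classify(query, training, k=1)
```

Because only the distribution of angles is compared, two series with different lengths or sampling rates can still be matched against each other.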

Experiments SwissEx
Confusion matrix SwissEx
Training-Test datasets
SwissExperiment AEMET
34

Experiments AEMET
Confusion matrix AEMET
H6: Sensor data series
→ find characteristic patterns
→ make it recognizable among other types
35
Classification according to type
FPs on subclasses of the same property

Evaluation vs SAX
36
H7: Slope representations
→ type of data: semantic property
→ learned through classification

Semantic Sensor Metadata
swissex:Sensor1
rdf:type ssn:Sensor;
ssn:onPlatform swissex:Station1;
ssn:observes cf-property:wind_speed.
swissex:Sensor2
rdf:type ssn:Sensor;
ssn:onPlatform swissex:Station1;
ssn:observes cf-property:air_temperature.
37
station1
W3C SSN Ontology
Derive semantic metadata properties
cf-property:wind_speed rdf:type dim:VelocityOrSpeed;
rdfs:label "wind speed";
ssn:isPropertyOf cf-feature:wind;
qu:propertyType qu:scalar;
qu:generalQuantityKind qu:speed.
Raw sensor data Semantic metadata

38
Outline
Motivation
Background
Conclusions
Challenges

Conclusions
H1: Sensor streaming data → instances of an ontology model
H2: SPARQL extensions → streaming operators & continuous processing
H3: Ontology-based streaming queries → rewritten to relational-based
Mapping sensor data to ontology instances, e.g. SSN Ontology
SPARQLStream → data model, extensions syntax, semantics
SPARQLStream → semantics of query rewriting to relational streaming
algebra
→ usage of declarative mappings (W3C R2RML)
Calbimonte, Corcho & Gray. Enabling ontology-based access to streaming data sources. ISWC 2010
Gray, García-Castro, Kyzirakos, Karpathiotakis, Calbimonte, Page et al. A semantically enabled service
architecture for mashups over streaming and stored data. ESWC 2011
Gray, Sadler, Kit, Kyzirakos, Karpathiotakis, Calbimonte, Page, García-Castro, et al. A semantic sensor
web for environmental decision support applications. Sensors, MDPI, 2011
Calbimonte, Corcho & Gray. Ontology-based Access to Streaming Data. In Posters ESWC 2010
39

Conclusions
40
H4: Ontology-based streaming queries → abstract expressions
Instantiate, execute → different SPEs: SNEE (DSMS), Esper (CEP), GSN & Cosm (middleware)
→ Available implementation
→ application in different domains
H5: Query rewriting → Pull & Push delivery → evaluation overhead
SPARQLStream → evaluation overhead w.r.t. native execution
Push & pull delivery evaluation
Calbimonte, Jeung, Corcho & Aberer. Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 2012.
Calbimonte & Corcho. Evaluating SPARQL Queries over RDF Streams. Linked Data Management: Principles
and Techniques, CRC Press, 2013 (under review)
Zhang, Duc, Corcho & Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. ISWC 2012.
Ruckhaus, Calbimonte, García-Castro & Corcho. Short Paper: From Streaming Data to Linked Data–A Case
Study with Bike Sharing Systems. ISWC SSN 2012

Conclusions
41
H6: Sensor data series → analyze in order to find characteristic patterns
→ make it recognizable among other types
H7: Slope representations → semantic properties such as the type of data
→ learned with classification techniques
→ acceptable precision
Raw observations analysis → slope distribution representation
→ compared with state-of-the-art representations, i.e. SAX
Evaluation of classification task → real-world datasets AEMET, SwissEx
→ in presence of noisy data
→ deriving semantic metadata
Calbimonte, Yan, Jeung, Corcho & Aberer. Deriving Semantic Sensor Metadata from Raw Measurements.
ISWC SSN 2012
Calbimonte, Jeung, Corcho, & Aberer. Semantic Sensor Data Search in a Large-Scale Federated Sensor
Network. ISWC SSN 2011

Future directions
42
WEB
SPARQLStream queries
Publishing Linked Stream Data
Currently static
SPARQL streaming
standards
Dereferencing streaming
data
Query Federation
Distributed sensor data
Static and streaming sources
Stream Reasoning
query rewriting, expanding queries
Expressiveness
Integrate with the Web of Data
Inferencing

Future directions
WEB
Sensor pattern classification
Combine with query
processing
Live data classification
Statistical & quality analysis
Integrate statistical analysis
Mappings to statistical models
Data quality filtering
Parallel Massive Stream Processing
Online stream analysis
Scalable stream processing
S4, Storm, Streamcloud
Heterogeneity
43

SSN Ontology with other ontologies
46
W3C SSN Ontology
tool for modeling our sensor data
combine with domain ontologies

Algebra construction
47
timed,
sp_wind
π
ω
σ sp_wind>10
5 Hour
wan7
windsensor1 windsensor2

Static optimization
48
[Algebra tree over wan7: π (timed, sp_wind), ω (5 Hour), σ (sp_wind > 10)]
[Algebra tree over windsensor1: π (timed, windvalue), ω (5 Hour), σ (windvalue > 10)]
[Algebra tree over windsensor2: π (timed, windvalue), ω (5 Hour), σ (windvalue > 10)]

SPARQL Streaming extensions
49

RDF Streams and SPARQLStream
52
RDF Stream
Time window
Window-Stream

Mappings
53
Subject, predicate, object
Given a triple pattern $tp = (sp, pp, op)$, the semantics of its evaluation over relational streams referenced by a set of mappings $M$ is given by $eval(tp, M)$, which is an algebra expression defined as:
$$eval(tp, M) = \rho_{f^s \to sp,\; f^p \to pp,\; f^o \to op}\; \pi_{f^s,\, f^p,\, f^o}(s)$$
where $\rho$ is the relational rename operation and $\pi$ is the relational projection operation. $s$ is the stream referenced by the mapping $\mu = findMapping(tp, M)$, and $f^s$, $f^p$, $f^o$ are the functions of $\mu$ that generate the projection expressions for producing respectively the subject, predicate and object, for every tuple of $s$.
For the previous example, the evaluation of $tp_1$ is given by:
$$eval(tp_1, M) = \rho_{f^s \to sp,\; f^p \to pp,\; f^o \to op}\; \pi_{f^s_{\mu_1}(s_1.ts),\, f^p_{\mu_1}(),\, f^o_{\mu_1}()}(s_1)$$
The resulting algebra expression projects the $s_1.ts$ attribute, applying the $f^s$ function to create the subject. The functions $f^p_{\mu_1}$ and $f^o_{\mu_1}$ in this case are constants, as the predicate and object are the same for all tuples of $s_1$. For the evaluation of more co…
Evaluate query
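The projection-then-rename reading of $eval(tp, M)$ can be sketched in Python as below. This is an informal illustration: the mapping is a plain dictionary of functions, the subject URI scheme `ex:observation/...` is invented, and the stream is a small in-memory list rather than a live source.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Mapping = Dict[str, Callable[[Row], str]]


def eval_triple_pattern(mapping: Mapping, stream: Iterable[Row]) -> List[Dict[str, str]]:
    """Apply f^s, f^p, f^o to each tuple (projection), then name the results sp, pp, op (rename)."""
    results = []
    for row in stream:
        projected = (mapping["fs"](row), mapping["fp"](row), mapping["fo"](row))
        results.append(dict(zip(("sp", "pp", "op"), projected)))
    return results


# Hypothetical mapping mu1: the subject depends on the timestamp,
# predicate and object are constants for every tuple.
mu1: Mapping = {
    "fs": lambda row: f"ex:observation/{row['ts']}",
    "fp": lambda row: "rdf:type",
    "fo": lambda row: "ssn:Observation",
}

s1 = [{"ts": 1, "sp_wind": 3.4}, {"ts": 2, "sp_wind": 5.6}]
bindings = eval_triple_pattern(mu1, s1)
```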

Rewrite to algebra
54
Then, the evaluation of gp can be represented as the following algebra expression:
$$eval(tp, M) = \omega_{ts,te,\delta}\; \pi_{f^s_{\mu_1}}(s_1) \bowtie \pi_{f^s_{\mu_2},\, f^o_{\mu_2}}(s_1) \bowtie \pi_{f^s_{\mu_4},\, f^o_{\mu_4}}(s_1) \bowtie \pi_{f^s_{\mu_5},\, f^o_{\mu_5}}(s_1)$$
This expression can be represented as a tree (Figure 4.1), where the leaf nodes are the streams and the other nodes are the relational streaming operators.
Figure 4.1: Tree representation of the evaluation of a SPARQLStream query rewritten as an algebra expression.

Rewriting and Execution Process
55

SRBench Datasets
real-world U.S. weather data [1]
first & largest sensor dataset in LOD
57
LinkedSensorData
LinkedSensorMetadata LinkedObservationData
~20k US weather stations, ~100k sensors
links to locations in GeoNames nearby
hurricane & blizzard observations in US
~1.73 billion RDF triples
~159 million observations
[1] http://mesowest.utah.edu

Name    | Storm Type | Date               | #Triples    | #Observations | Data size
Bill    | Hurricane  | Aug. 17 – 22, 2009 | 231,021,108 | 21,272,790    | ~15 GB
Ike     | Hurricane  | Sep. 01 – 13, 2008 | 374,094,660 | 34,430,964    | ~34 GB
Gustav  | Hurricane  | Aug. 25 – 31, 2008 | 258,378,511 | 23,792,818    | ~17 GB
Bertha  | Hurricane  | Jul. 06 – 17, 2008 | 278,235,734 | 25,762,568    | ~13 GB
Wilma   | Hurricane  | Oct. 17 – 23, 2005 | 171,854,686 | 15,797,852    | ~10 GB
Katrina | Hurricane  | Aug. 23 – 30, 2005 | 203,386,049 | 18,832,041    | ~12 GB
Charley | Hurricane  | Aug. 09 – 15, 2004 | 101,956,760 | 9,333,676     | ~7 GB
        | Blizzard   | Apr. 01 – 06, 2003 | 111,357,227 | 10,237,791    | ~2 GB

SRBench Queries
58
graph pattern matching
solution modifier
query form
SPARQL 1.1
reasoning
streaming
data access
and, filter, union, optional
projection, distinct
select, construct, ask
aggregate, subquery
subclass, subproperty, sameAs
time window, istream
observations, sensor metadata
geonames, dbpedia
select expr, property path
dstream, rstream
17 queries

Query Features
59
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17
1. Graph pattern matching: A A,F,O A A,F A A,F,U A A A A A,F A,F,U A,F A,F,U A,F A,F A,F
2. Solution modifier: P,D P,D P P P P P,D P P P,D P,D P P P,D P P P
3. Query form: S S A S C S S S S S S S S S S S S
4. SPARQL 1.1: F,P A A,E,M,F A,S N A,E,M A,E,M A,S,M,F A,S,E,M,F,P A,E,M,F,P F,P A,E,M,P P P
5. Reasoning: C R C A C
6. Streaming: T T T T T T T,D T T T T T T T T
7. Dataset: O O O O O O O O,S O,S O,S O,S O,S,G O,S,G O,S,G O,S,D O,S,G,D S
1. And, Filter, Union, Optional
2. Projection, Distinct
3. Select, Construct, Ask
4. Aggregate, Subquery, Negation, Expr in SELECT, assignMent,
Functions&operators, PropertyPath
5. subClassOf, subpRopertyOf, owl:sameAs
6. Time-based window, Istream, Dstream,Rstream
7. LinkedObservationData, LinkedSensorMetadata, GeoNames, Dbpedia
