3. Contextualise Sensors with Linked Data
dB, km, µPa?
dB values in water have a different reference level than in air
4. Q1. How to model it?
—> Is it worth it?
dB, km, µPa?
Yes, because:
Q1. Contextualised Model for
Q2. Cross-Network Communication
Q3. Relevancy Prediction
Q4. Enriched Web Content
Q5. Network Adaptability
5. Research Questions
Q1. How to model Linked Sensor Data for:
Q2. Cross-Network Communication
Q3. Relevancy Prediction
Q4. Web Data Quality
Q5. Network Adaptability
6. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q1. How can contextual information
be used to enrich sensor data?
7. Linked Sensor Data Model
Ontology Modularisation:
Context
Network Components
Energy Conservation
8. Linked Sensor Data Model
Application
Ontology
Domain Ontology
Task Ontology
Upper Ontology
Ontology Aligning
Inheritance & Reuse
Dolce+DnS Ultralite (DUL)
W3C Semantic Sensor Network (SSN)
Our Ontology
Quantities, Units, Dimensions
and Data Types (QUDT)
9. Linked Sensor Data Model
Dolce+DnS Ultralite (DUL)
W3C Semantic Sensor Network (SSN)
Provenance (PROV)
Event Model-F (EVENT)
Unified Code for Units of Measure (UCUM)
Friend Of A Friend (FOAF)
Measurement Unit (MUO)
Online Presence (OPO)
Review Vocabulary (REV)
Quantities, Units, Dimensions
and Data Types (QUDT)
Ontology Aligning
Inheritance & Reuse
10. Linked Sensor Data Model
spt:Agent
spt:Activity
ssn:Device
ssn:Sensor
EventParticipation
ssn:Stimulus
spt:Place
11. Linked Sensor Data Model
OWL Full
Property characteristics: Symmetric, Transitive, Inverse, Equivalent
spt:containedIn: room A → floor 2 → house H
Asserted, Inferred, Direct Relations
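The spt:containedIn chain above (room A inside floor 2 inside house H) is what an OWL reasoner derives from the property's transitivity. A toy sketch of that inference, with made-up resource names and no real reasoner:

```python
# Toy sketch of the transitive inference an OWL reasoner performs for a
# transitive property such as spt:containedIn. Facts are illustrative only.
def transitive_closure(pairs):
    """All (a, b) derivable from the asserted pairs by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

asserted = {("roomA", "floor2"), ("floor2", "houseH")}
inferred = transitive_closure(asserted) - asserted
print(inferred)  # {('roomA', 'houseH')}
```

In the actual model this derivation is left to the OWL semantics of spt:containedIn rather than computed by hand.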
13. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q1. How can contextual information
be used to enrich sensor data?
Q2. How can sensors communicate
across different platforms without ad-hoc solutions?
14.
Dolce+DnS Ultralite (DUL)
W3C Semantic Sensor Network (SSN)
Provenance (PROV)
Event Model-F (EVENT)
Unified Code for Units of Measure (UCUM)
Friend Of A Friend (FOAF)
Measurement Unit (MUO)
Online Presence (OPO)
Review Vocabulary (REV)
Quantities, Units, Dimensions
and Data Types (QUDT)
Which ontologies?
Which links? How to enable inference?
How to enable cross-communication?
Non-experts
Average users
19. LD4S: Usability Evaluation
2. Uptake, Usability, Utility
User feedback
Tot. Participants: 38
1% had previously
interacted with sensors
Survey
GUI Usable and clear
API to be improved
Applicability to be made explicit
20. LD4S: Utility Evaluation
Time of usage
# Data accessed
# Data transmitted
Amount
Type
Uniqueness
Location
Quality
Time Sensitivity
Relevance
Web Service Resources Linked Data output
Purpose to be made more explicit
Relevancy of links to the purpose to be improved
Highlight importance of network/context metadata
21. LD4S: Uptake Evaluation
Unique accesses
Tot # accesses
Per day (over the 30-day period)
2. Uptake, Usability, Utility
User feedback
Satisfying for pilot evaluation
Project-driven modelling: positive feedback from partners
To be repeated over a longer time period
22. LD4S: Performance Evaluation
Threshold = # requests / response time (sec)
compared to
Payload size sent + received by LD4S
3. Implementation Quality
Performance
Throughput decreases as payload increases, but not exponentially
Improvable by implementing a cache
23. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q2. How can sensors communicate
across different platforms without ad-hoc solutions?
Q3. How to identify which sensors are more
relevant sources of information to define a
specific small scope of interest?
26. Relevancy Prediction in Activity Logging
Algorithm:
1. From DataHub: sensors sharing Location & Time; activated sensors
2. EasyESA Similarity(X, Y)
3. Add to Distance Matrix
4. Clustering
Sensors in the same cluster are relevant for the same activity: Activity = Cluster
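Steps 2 and 3 of the algorithm above can be sketched as follows; the similarity scores stand in for EasyESA responses and are made-up values:

```python
# Sketch of steps 2-3: turn pairwise semantic similarities into a symmetric
# distance matrix ready for clustering. Similarity values are illustrative;
# in the thesis they come from EasyESA.
def distance_matrix(sensors, similarity):
    """distance = 1 - similarity, so semantically similar sensors are close."""
    n = len(sensors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - similarity(sensors[i], sensors[j])
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical similarity scores for three features of interest:
scores = {frozenset({"fridge", "stove"}): 0.75,
          frozenset({"fridge", "tv"}): 0.5,
          frozenset({"stove", "tv"}): 0.5}
sim = lambda x, y: scores[frozenset({x, y})]
D = distance_matrix(["fridge", "stove", "tv"], sim)
print(D[0][1], D[0][2])  # 0.25 0.5
```

The resulting matrix is what the hierarchical clustering step (step 4) consumes.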
27. Relevancy Prediction: Distributional Semantics
Term frequency / inverse document frequency model. Each row of T corresponds to a term occurring in the document collection d_1, ..., d_n; each entry corresponds to the TF-IDF value of term t_i in document d_j:
T[i, j] = tf(t_i, d_j) × log(n / df_i)
where tf(t_i, d_j) is the term frequency of term t_i in document d_j, n is the total number of documents, and df_i is the number of documents containing term t_i.
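The TF-IDF weighting above can be sketched in a few lines; the toy documents below are illustrative only:

```python
# Minimal TF-IDF sketch matching the formula above:
# T[i, j] = tf(t_i, d_j) * log(n / df_i), with n total documents and
# df_i the number of documents containing term t_i.
import math

def tfidf(docs):
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    # document frequency of each term
    df = {t: sum(t in d for d in docs) for t in vocab}
    return {(t, j): d.count(t) * math.log(n / df[t])
            for j, d in enumerate(docs) for t in vocab}

docs = [["sensor", "kitchen", "fridge"],
        ["sensor", "bathroom"],
        ["kitchen", "stove", "stove"]]
T = tfidf(docs)
# "stove" is rare (1 of 3 docs) and frequent in doc 2, so it outweighs
# the common term "sensor" (2 of 3 docs):
print(T[("stove", 2)] > T[("sensor", 0)])  # True
```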
28. Relevancy Prediction:
Hierarchical Clustering
Unweighted Pair Group Method
with Arithmetic mean (UPGMA)
Weighted Pair Group Method
with Arithmetic mean (WPGMA)
Farthest Point or Voorhees (VH)
Reflection
of Semantic
Distribution
Reflection of
Structural
Subdivision
Reflection of
Centrality
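A pure-Python sketch of the agglomerative step behind these three methods: "average" linkage approximates UPGMA, and "complete" is the farthest-point (Voorhees) criterion. This is an illustration under toy data, not the thesis implementation:

```python
# Minimal agglomerative clustering over a precomputed distance matrix.
def agglomerative(dist, n_clusters, linkage="average"):
    """dist: symmetric matrix (list of lists) of pairwise distances."""
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        pairs = [dist[i][j] for i in a for j in b]
        if linkage == "complete":       # farthest point (Voorhees)
            return max(pairs)
        return sum(pairs) / len(pairs)  # unweighted average (UPGMA-like)

    while len(clusters) > n_clusters:
        # find the closest pair of clusters and merge them
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Toy distance matrix: sensors 0 and 1 are close, 2 and 3 are close.
D = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
print(agglomerative(D, 2))  # [[0, 1], [2, 3]]
```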
29. Predicting Sensor Relevancy for ADLs Logging
The clusters are compared with the sensors manually annotated as part of such activity logging. These annotations and readings are taken from the public dataset MITes [Tapia et al., 2004] and were collected during live experiment settings. We pre-processed this dataset (i.e., CSV files of sensor readings and metadata about both sensors and activities) to form HTTP PUT requests to the LD4S API for annotating and storing the data, as in Listing 5.7. Based on such comparison, the overall accuracy and precision of our system are calculated when applying either of the clustering algorithms UPGMA, WPGMA or VH.
PUT ld4s:device/2_99

payload: {'observed_property': 'switch',
          'location-name': ['Kitchen'],
          'foi': ['Fridge']}

headers: {'Content-type': 'application/json',
          'Accept': 'application/x-turtle'}

Listing 5.1: HTTP PUT request forwarded to the LD4S RESTful API.
DataHub (see Section 5.5) was then queried for all the sensor datasets available, thus returning a JSON list of details of these datasets such as their ID, title, tags, license and endpoint URIs. The system filters only those datasets that either have no license or an open license.
Table 5.1: Activities labelled in the MITes dataset (number of examples per class).

Activity              | Subject 1 | Subject 2
Preparing dinner      | 8         | 14
Preparing lunch       | 17        | 20
Listening to music    | -         | 18
Taking medication     | -         | 14
Toileting             | 85        | 40
Preparing breakfast   | 14        | 18
Washing dishes        | 7         | 21
Preparing a snack     | 14        | 16
Watching TV           | -         | 15
Bathing               | 18        | -
Going out to work     | 12        | -
Dressing              | 24        | -
Grooming              | 37        | -
Preparing a beverage  | 15        | -
Doing laundry         | 19        | -
Cleaning              | 8         | -
The highest semantic similarity value calculated was 1.0 for the pair ⟨switch, tv⟩, followed by 0.00036 for the pair ⟨switch, jewelry box⟩.
Relevancy Prediction:
Evaluation Data
27 FoIs → 351 similarity pairs
We considered the worst case in which only one of the sensors sharing the same location at the same time range has recently sensed a change in status for the current ongoing activity, while all the other nearby ones which will likely do so in the near future must be predicted. In this case, given n sensors, the amount of pairs to check for semantic relatedness is the binomial coefficient as in Equation 5.10. In our case, since sensors are grouped by different features of interest, there are 27 different types of sensors and 351 similarity pairs.
C(n, 2) = n! / (2! (n − 2)!)    (5.10)
Even though the binomial coefficient grows quickly, it only depends on the number of features of interest rather than on the amount of actually deployed sensors. At the same time, the amount of ICOs is expected to grow but the amount of types of sensors is not, since there is only so much in the real world that can be measured by sensors. Our method then is not expected to hinder the system from scaling.
Worst case scenario:
only one of the sensors sharing the same location at the
same time range has recently sensed a change in status for
the current ongoing activity
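Equation 5.10 in numbers: with 27 features of interest there are C(27, 2) = 351 unordered pairs to compare, as the slide states. A minimal check:

```python
# Number of unordered sensor pairs to check for semantic relatedness,
# per Equation 5.10: C(n, 2) = n! / (2! * (n - 2)!).
import math

def pairs_to_check(n):
    return math.comb(n, 2)

print(pairs_to_check(27))   # 351, matching the slide
print(pairs_to_check(216))  # 23220 pairs at the largest evaluated FoI count
```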
30. We compared the predicted clusters with the MITes annotations (i.e., the actual class). Consequently, we considered a 2-class classification problem, i.e., whether the sensors actually part of the same activity are clustered in the same cluster. As a result, a separate confusion matrix (Table 5.2) is computed for each of the annotated activities.

Table 5.2: Confusion matrix displaying the number of true positives, true negatives, false positives and false negatives for a 2-class classification problem.

Predicted \ Actual | Class 1 | Class 2
Class 1            | TP11    | FP12
Class 2            | FN21    | TN22

From such settings, we calculated precision and overall accuracy:

Precision = TP11 / (TP11 + FP12)    (5.11)
Accuracy = (TP11 + TN22) / (TP11 + TN22 + FP12 + FN21)    (5.12)
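Equations 5.11 and 5.12 in code, applied to hypothetical counts (the values below are illustrative, not the thesis results):

```python
# Precision and overall accuracy from a 2-class confusion matrix,
# per Equations 5.11-5.12. Counts are made-up example values.
def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

tp, fp, fn, tn = 80, 10, 15, 95  # hypothetical counts for one activity
print(round(precision(tp, fp), 3))         # 0.889
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.875
```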
Relevancy Prediction:
Evaluation: Precision
[Bar chart: precision (%) of the activity clustering for WPGMA, UPGMA and VH across the activities Dressing, Cleaning, Toileting, Laundry, Dinner, WashingUp, Snack and Lunch; y-axis 0–100%.]
31. Relevancy Prediction:
Evaluation: Accuracy
[Bar chart: accuracy (%) of the activity clustering for WPGMA, UPGMA and VH across the activities Dressing, Cleaning, Toileting, Laundry, Dinner, WashingUp, Snack and Lunch; y-axis 0–80%.]
32. Relevancy Prediction:
Hierarchical Clustering
Unweighted Pair Group Method
with Arithmetic mean (UPGMA)
Weighted Pair Group Method
with Arithmetic mean (WPGMA)
Farthest Point or Voorhees (VH)
Reflection
of Semantic
Distribution
Reflection of
Structural
Subdivision
Reflection of
Centrality
33. Relevancy Prediction:
Evaluation: Comparison with SoTA
[Figure 5.6: Comparison between accuracy percentages achieved by the clustering algorithms for some of the activities: Dressing, Cleaning, Toileting, Laundry, Dinner, WashingUp, Snack, Lunch.]
Table 5.3: Comparison between the experiment setup and results for our own approach and the previous closest research efforts.

                 | Kwon et al. | Wyatt et al. | Ours
# Sensors        | 3           | 100          | 200
# Activities     | 5           | 26           | 16
Collection Time  | 50 mins     | 360 mins     | 2 weeks
Goal             | AR          | AI           | RSP
Algorithms       | HIER        | HMM          | UH
Precision        | 79%         | 70%          | 89%
Accuracy         | -           | 52%          | 69%
Our results are relevant, as our system improved the accuracy by 32% and the precision by 5% with respect to such previous efforts from the state of the art.
Increase of 32% accuracy
and 5% precision
34. Relevancy Prediction:
Evaluation: Performance
[Plot: Time Complexity Growth; time (msec) against number of Features of Interest (FoIs): 27, 54, 81, 112, 135, 162, 189, 216.]
HTTP PUT requests: 3ms
Overall Execution: 18ms
Dataset Discovery on DataHub: 3ms
(20 datasets)
LD4S SPARQL response: 246ms
ESA: 14ms (351 similarity pairs)
Easy-ESA response: 9ms
Highest time cost = 1 min 26 sec for comparing 216 FoIs
Possibility of updating sensors similarities at run-time
CoRE devices (RAM 4 kB and ROM 128 kB): pre-compute offline clustering
35. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q3. How to identify which sensors
are more relevant sources of
information to define a specific small
scope of interest?
Q4. How can contextualised sensors
improve the quality of traditional
Web content?
39. Enriched Web Content: G-Sensing
Bridging the gap between the Web and real places
Algorithm:
1. From DataHub: sensors sharing Location & Time
2. Extract Google Search results representing real places
3. Live data fetching
4. Result dictionary update
41. Enriched Web Content: Evaluation Deployment
DataHub
Clinic
Clinic
Clinic
30
sensors
1 km
LD4S
PUT <JSON sensor metadata>
G-Sensing
Google Places
3.692 locations
1.455 (39.4%) have a
website
Query: Acupuncture Galway Salthill
42. Enriched Web Content: Evaluation Coverage
How much of the area defined by the virtual locations overlaps with the city of Galway, within radius r = 150 m?
Coverage percentage as we
vary the vicinity radius
Added value of our approach for integrating live data into physical locations' websites.
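The coverage question above (how much of the city falls within radius r of a virtual location) can be sketched as a Monte Carlo estimate. The function name, coordinates and box size below are illustrative assumptions, not the thesis implementation:

```python
# Monte Carlo sketch: estimate the fraction of a bounding box covered by
# circles of radius r centred on the virtual locations. Planar metres are
# used instead of geographic coordinates for simplicity.
import math
import random

def coverage(locations, bbox, r, samples=20000, seed=42):
    """Fraction of bbox = (min_x, min_y, max_x, max_y) lying within
    distance r of at least one location."""
    random.seed(seed)
    min_x, min_y, max_x, max_y = bbox
    hits = 0
    for _ in range(samples):
        px = random.uniform(min_x, max_x)
        py = random.uniform(min_y, max_y)
        if any(math.hypot(px - x, py - y) <= r for x, y in locations):
            hits += 1
    return hits / samples

# One virtual location in the middle of a 1000 m x 1000 m box, r = 150 m:
# the true coverage is pi * 150^2 / 1000^2, roughly 7%.
print(coverage([(500.0, 500.0)], (0.0, 0.0, 1000.0, 1000.0), 150.0))
```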
43. Enriched Web Content: Evaluation Distribution
• We divided the area of Galway into squares with different side lengths l and counted the number of virtual locations within each square.
• The number of virtual locations per square and their respective frequency shows a power-law relationship: while most squares only contain a small set of locations, a few squares contain a very large number of locations (e.g., city centres, business parks).
44. Enriched Web Content: Evaluation Performance
• Google search result page: ~145 KB
• After enabling G-Sensing: ~175 KB (~20% increase)
• At browser start-up: query to DataHub for data source discovery: 3 ms
○ 20 sensor datasets discovered
○ 3 sensor datasets have an open license + expose a SPARQL endpoint
○ 1 sensor dataset’s SPARQL endpoint was accessible (LD4S): 246 ms
G-Sensing does not impede a user's browsing experience
Bandwidth Overhead
Response Time
45. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q4. How can contextualised sensors
improve the quality of traditional
Web content?
Q5. How can contextualised
sensors improve the adaptability of
mobile constrained and
heterogeneous sensor networks?
47. Outline
1. Linked Sensor Data Model [Q1]
2. LD4Sensors Web Service [Q2]
3. Sensor Relevancy Prediction [Q3]
4. Enriched Web Content [Q4]
5. Network Adaptability [Q5]
6. Research Answers
7. Lessons Learned and Future Work
Core Research and Results
Conclusion
Q5. How can contextualised
sensors improve the adaptability of
mobile constrained and
heterogeneous sensor networks?
49. Future Work
• Filtering
• of links according to the LD4S resource rating/review system
• of sensor data injected into Google Search results according to the user's preferences and context
• Extending
• derive labels of activities beyond the per-activity sensor clustering
• sensor data injected into any Web page and content
• sensor data sources extended to include, e.g., TripAdvisor and other user-generated content
• collect users' feedback on auto-derived annotations for incremental learning
• Evaluation
• Long-term large-scale user study to gather insights into how users really use
the current functionalities offered by LD4S
• Other areas of research
• Sensor-triggered data can feed back to Linguistic Linked Data knowledge