Evaluating Data Quality using Sensor Metadata and Provenance

This document discusses evaluating the quality of sensor data using metadata and provenance information. It is motivated by the need to assess the accuracy of information on the open Web. The authors explore whether linked sensor data standards and data provenance can provide the context that quality evaluation requires. Their work to date involves sensors publishing Linked Data, a reasoner that evaluates quality, and observations annotated with the assessment outcomes. An example scenario shows a location observation that places a bus some 500m away from its route. The document concludes that quality is subjective, and that future work will investigate using policies and previous quality assessments to improve evaluations.

Evaluating Quality using Sensor
Metadata and Provenance
(or why Hannah Foreman is not on the bus!)

Chris Baillie, Pete Edwards and Edoardo Pignotti
c.baillie@abdn.ac.uk
http://inf.abdn.ac.uk/~cbaillie

Overview

• Motivation

• What is quality and how do we assess it?

• Work to date

• Example scenario

• Future work

Motivation

• “we don’t know whether the information we find [on the Web] is accurate or not. So we have to teach people how to assess what they’ve found” (Vint Cerf, 2010)

• The Web of Documents has become the Web of documents, people, services and data.

• Anyone can publish anything, so we need a way to evaluate quality.

Sensor data and quality

• Large increase in publication of sensor data
• Even sensors get it wrong sometimes!

• Quality is a multidimensional construct (Wang and Strong 1996, Bizer and Cyganiak 2009)
• Bizer (2007) evaluates quality by examining data content, context and external ratings.

Evaluating data quality
• Can Linked Sensor Data provide context?
• W3C SSN Incubator Group (Neuhaus and Compton 2009)
• Can data provenance provide context? (Hartig and Zhao 2009)

[Diagram: an ssn:Observation linked to an ssn:SensingDevice (ssn:observedBy), an ssn:FeatureOfInterest (ssn:featureOfInterest), an ssn:ObservedProperty (ssn:observedProperty), a sampling time (ssn:hasSamplingTime, owl:Time) and an ssn:ObservationResult (ssn:observationResult) with an ssn:ObservationValue (ssn:hasValue). Several nodes are also labelled opm:artifact. The observation is linked by opm:wasGeneratedBy to an opm:Process, which is linked by opm:used to the sensing device and by opm:wasControlledBy to an opm:Agent.]

Work to date
• Sensors publishing Linked Data
• Reasoner evaluates quality
• Observations annotated with assessment outcomes

[Diagram: quality assessment as an OPM graph. Nodes: qual:Assessment (opm:Process), and qual:Metric, qual:Indicator, qual:Annotation, qual:Dimension and ssn:ObservationResult (labelled opm:Artifact). Edge labels: opm:used, opm:wGenBy, opm:about, qual:describes.]

Example scenario
• Location observations published by iPhone app
• Data describing “route 17” available via SPARQL end-point
• New location suggests bus is 500m away from route!

Observation Data
Feature: “Route 17”
Recorded: 13/9/11, 11:27
Source: GPS Receiver
Accuracy: ±15m
Relevance: Poor!

Summary

• We are investigating whether linked sensor data and provenance can provide the context required by quality assessment.

• Quality is highly subjective: can policies be used to guide the assessment process?

• Can quality assessments use previous quality outcomes to enhance performance?

Chris Baillie

c.baillie@abdn.ac.uk

http://inf.abdn.ac.uk/~cbaillie

Twitter: @c_baillie

Editor's Notes

In this talk I will explain why quality assessment is needed, describe how quality is perceived, outline our approach to quality assessment, provide an example scenario and outline our future work.
Vint Cerf, one of the fathers of the Internet, stated recently that “we don’t know whether the information we find [on the Web] is accurate or not. So we have to teach people how to assess what they’ve found”. This problem is exacerbated by the fact that the Web has evolved from a collection of static HTML documents into a vast ecosystem of services, data and even people (through social networks). The Web has always been inherently open, so people can publish any content they wish, and as a result there is enormous variation in the quality of information. The Web is big, however, so we decided to find a smaller field in which to evaluate our approach.
Recently, there has been a large increase in the publication of sensor data. We are also seeing multiple sensors embedded in everyday objects such as vehicles, mobile phones and even clothing. For example, we have determined that an individual iPhone 4 has at least 7 sensors within it (light, GPS, sound, accelerometer, gyroscope, camera, temperature). Crucially for us, however, sensors sometimes get it wrong, and therefore we have somewhere to test our approach to quality assessment. For example, the sensor output in this image is reporting a temperature of 119 °F; for those of you reaching for your calculators, that is around 48 °C. It is difficult to argue with the observation on its own, but once the snow on the ground is taken into consideration the observation’s quality begins to look uncertain. Conversely, if I then tell you that this sensor is monitoring a sauna within the building, the observation starts to look more believable. This example shows that context is extremely important in evaluating data quality. We began tackling this problem by investigating how quality is perceived, and quickly learned that, rather than being a single value, it is a multidimensional construct consisting of multiple quality dimensions. Different authors give the dimensions different definitions; some examples are timeliness, which considers the age of the data, relevance, which considers how applicable the data is to the task at hand, and believability, which considers how true and credible the data is. Bizer’s WIQA framework is an example of a platform that performs quality assessment on Web data by examining the content, its context and external ratings of the data (similar to eBay’s member rating system).
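
The sauna example can be made concrete with a small sketch: convert the reading to Celsius, then check it against a plausible range for the context in which the sensor is deployed. The context names and temperature ranges below are assumptions invented for illustration; they do not come from the talk.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Hypothetical plausible ranges (degrees Celsius) for two deployment contexts.
PLAUSIBLE_RANGE_C = {
    "outdoor_snow": (-40.0, 10.0),
    "sauna": (40.0, 110.0),
}


def is_plausible(temp_c: float, context: str) -> bool:
    """Return True if the reading falls inside the range assumed for the context."""
    low, high = PLAUSIBLE_RANGE_C[context]
    return low <= temp_c <= high


reading_c = fahrenheit_to_celsius(119.0)  # (119 - 32) * 5 / 9
```

The same reading is rejected for the snowy outdoor context and accepted for the sauna, which is the point of the example: the value alone does not determine its quality.
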
Sensor observations alone have very little context, so we need a way to describe the situation in which a particular observation was created. A number of sensor platforms are now being programmed to publish their observations as Linked Data, which has created the Web of Linked Sensor Data. In 2009 the W3C chartered an Incubator Group to capture the capabilities of sensors and sensor networks. After a survey of existing sensor ontologies, the group created its own sensor ontology, which can be used to annotate sensor observations with metadata describing the observation’s context. This first diagram provides an example of a sensor observation annotated using the SSN ontology; the observation is observed by a SensingDevice and has properties describing the time it was created, its FeatureOfInterest (e.g. the Sports Direct Arena!) and the ObservedProperty (e.g. air temperature). This context could be further enhanced by capturing observation provenance: a record detailing the entities and processes associated with the observation. Because of the diversity of sensor platforms and data available, we require a generic model of provenance, and we have therefore selected the Open Provenance Model, as it is completely technology agnostic and can represent Agents such as sensor owners, Artifacts such as sensor observations, and the processes involved in producing observations. This second diagram illustrates the capture of sensor data provenance, whereby an observation is generated by a Sensing Process which used a Sensing Device and was controlled by an Agent. By examining records such as this, we argue that we can consider how sensors perform over time and even use trust or reputation models to evaluate the impact individual agents could have on an observation.
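
As a rough illustration of the two diagrams, the sketch below stores one annotated observation and its provenance as plain subject-predicate-object triples and follows the provenance chain from the observation to the controlling agent. The property labels are those legible on the slides and have not been checked against the published ontologies; every identifier under `ex:` and the timestamp are hypothetical.

```python
# Illustrative only: "ex:" identifiers and the timestamp are invented for this sketch.
TRIPLES = {
    # SSN-style context for one observation
    ("ex:obs1", "rdf:type", "ssn:Observation"),
    ("ex:obs1", "ssn:observedBy", "ex:thermometer1"),
    ("ex:obs1", "ssn:featureOfInterest", "ex:SportsDirectArena"),
    ("ex:obs1", "ssn:observedProperty", "ex:AirTemperature"),
    ("ex:obs1", "ssn:hasSamplingTime", "2011-09-13T11:27:00"),
    # OPM-style provenance: observation <- process -> device, agent
    ("ex:obs1", "opm:wasGeneratedBy", "ex:sensing1"),
    ("ex:sensing1", "rdf:type", "opm:Process"),
    ("ex:sensing1", "opm:used", "ex:thermometer1"),
    ("ex:sensing1", "opm:wasControlledBy", "ex:owner1"),
    ("ex:owner1", "rdf:type", "opm:Agent"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def controlling_agents(observation, triples=TRIPLES):
    """Follow observation -> generating process -> controlling agent."""
    agents = []
    for process in objects(observation, "opm:wasGeneratedBy", triples):
        agents.extend(objects(process, "opm:wasControlledBy", triples))
    return sorted(agents)
```

A trust or reputation model of the kind mentioned in the notes would start from a lookup like `controlling_agents("ex:obs1")`; a real deployment would more likely use an RDF store and SPARQL than an in-memory set.
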
We currently have a number of sensors publishing Linked Sensor Data in line with the W3C SSN Ontology. These observations are stored on a Web server and are accessible via our visualisation Web service. This service can display observations from a specific sensor within a given time window on a 2D plot. Clicking on an individual observation triggers a quality assessment of the selected observation. This assessment is performed by a rule-based reasoner that operates over sensor metadata to produce a number of quality annotations described in our quality ontology. This diagram provides an example of such an assessment process, which uses a number of quality indicators: values that are indicative of data quality (e.g. the age of the data). Quality metrics describe how quality indicators impact on quality dimensions and are therefore representative of the rules used by the reasoning mechanism. Finally, the assessment process outputs a Quality Annotation which describes a certain quality dimension. At present these are qualitative, e.g. “not timely” or “low accuracy”, but we will consider quantitative values in the future. Once assessment is complete, observations are annotated with the quality annotations, which could facilitate data re-use in the future.
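
The indicator, metric and annotation roles described above can be sketched in a few lines. This is not the authors' reasoner: it is a minimal stand-in in which each metric is a plain function, and the thresholds, class names and field names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    recorded: datetime   # when the observation was made
    accuracy_m: float    # reported accuracy radius in metres


@dataclass(frozen=True)
class Annotation:
    dimension: str       # quality dimension being described
    outcome: str         # qualitative result, e.g. "not timely"


# Hypothetical thresholds; the talk does not state the values its rules use.
MAX_AGE = timedelta(minutes=1)
MAX_ACCURACY_M = 25.0


def timeliness_metric(obs: Observation, now: datetime) -> Annotation:
    """Metric: the age indicator determines the timeliness dimension."""
    age = now - obs.recorded
    return Annotation("timeliness", "timely" if age <= MAX_AGE else "not timely")


def accuracy_metric(obs: Observation, now: datetime) -> Annotation:
    """Metric: the reported accuracy indicator determines the accuracy dimension."""
    ok = obs.accuracy_m <= MAX_ACCURACY_M
    return Annotation("accuracy", "high accuracy" if ok else "low accuracy")


METRICS = [timeliness_metric, accuracy_metric]


def assess(obs: Observation, now: datetime) -> list:
    """Run every metric and return one annotation per quality dimension."""
    return [metric(obs, now) for metric in METRICS]
```

Keeping the metrics in a list means a new quality dimension is added by writing one more function, which loosely mirrors adding a rule to a rule base.
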
This example stems from a demonstration of the Informed Rural Passenger project to members of the Digital Economy programme team. The project uses the sensors in iPhones carried on public transport to provide real-time information on bus locations (via GPS). Users tell us through an iPhone app when they are on a bus, and the app then starts monitoring their location. This map displays part of bus route 17 through Aberdeen, using data that is available from one of the project’s SPARQL end-points. The system monitors buses along route 17 and plots their locations on the map; in this instance, the red markers indicate previous bus locations along the route. At this point in the demonstration, an iPhone with the app loaded was passed to Hannah Foreman, who proceeded to press the “I’m on the bus” button. The map then began to display the new observation, but rather than displaying the bus further along route 17, it displayed it here, some 500 metres from the bus route! We can examine the metadata associated with the observation to try to work out why this happened. The observation describes a bus on route 17 and was recorded only a few seconds before being displayed, so there is no problem with the timeliness dimension. Because the observation is some distance from the route, the issue may lie with the accuracy dimension. One problem with using mobile phones for this task is that they may use cell triangulation instead of the built-in GPS, resulting in a significant loss of accuracy. In this instance, however, the observation came from the GPS receiver. Moreover, the accuracy associated with this observation, ±15 m, is relatively good for a GPS device, so accuracy is good. From this we can conclude that the iPhone is, in fact, nowhere near route 17, and so the issue lies with the relevance dimension: the observation is simply not relevant to the wider context (route 17).
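
One way to turn the final step of this reasoning into a check is to compare the observation's distance from the route with its reported accuracy: if the route lies well outside the accuracy radius, the observation is unlikely to describe a bus on that route. The rule, the function names and the coordinates used to exercise it are assumptions for illustration, not the project's actual method or data.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lat, lon, lat0, lon0):
    """Project (lat, lon) to metres east/north of (lat0, lon0); adequate over a few km."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(point, route):
    """Distance in metres from point to a route given as two or more (lat, lon) pairs."""
    lat0, lon0 = point
    xy = [to_local_xy(lat, lon, lat0, lon0) for lat, lon in route]
    return min(distance_to_segment((0.0, 0.0), a, b) for a, b in zip(xy, xy[1:]))


def relevance(point, route, accuracy_m):
    """Hypothetical rule: relevant only if the route is within the accuracy radius."""
    return "good" if distance_to_route(point, route) <= accuracy_m else "poor"
```

With an accuracy of ±15 m, an observation roughly 500 m from the nearest stretch of route is labelled "poor", matching the outcome shown on the slide. A policy could replace the hard cut-off with a tolerance of several accuracy radii.
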
So this takes us full circle to the alternate title of this talk and explains why Hannah Foreman is not on the bus! This is an example of how people may provide erroneous data, but there are other causes: GPS readings could be inaccurate, or there could be a delay between sensor data being transmitted and received. As we have seen, this can give rise to low-quality data, and therefore we need some method to distinguish between data that is usable and data that is not.
The work we still plan to do begins with looking at how the provenance of sensor data can be used to evaluate quality. In particular, we wish to investigate whether there are certain quality dimensions that require provenance information for evaluation. We will also investigate whether the results of past quality assessments can be re-used, by examining the provenance of existing quality annotations, rather than performing a quality assessment every time someone wishes to use the data. Finally, we believe quality assessment to be highly subjective; everyone will have their own perception of what quality data is. As such, we intend to investigate the use of policies to guide the assessment process. This could be as simple as placing a single constraint on a particular sensor characteristic or as complex as re-defining how each quality dimension is evaluated.
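
The two ideas in this future work, policies and re-use of earlier outcomes, can be sketched together. Here a policy is the simplest case mentioned above, a single constraint on one sensor characteristic, and earlier outcomes are re-used through a lookup keyed on the observation and the policy. The class names, fields and threshold values are hypothetical, and the sketch covers only the accuracy dimension because its outcome does not change as the observation ages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    max_accuracy_m: float  # single constraint: largest acceptable accuracy radius


class Assessor:
    """Assess accuracy under a policy, re-using earlier outcomes where possible."""

    def __init__(self):
        self._outcomes = {}   # (observation id, policy) -> stored outcome
        self.evaluations = 0  # how many assessments were actually computed

    def assess(self, obs_id: str, accuracy_m: float, policy: Policy) -> str:
        key = (obs_id, policy)
        if key in self._outcomes:
            return self._outcomes[key]  # re-use the previous assessment
        self.evaluations += 1
        ok = accuracy_m <= policy.max_accuracy_m
        outcome = "high accuracy" if ok else "low accuracy"
        self._outcomes[key] = outcome
        return outcome
```

Two users with different policies can reach different outcomes for the same observation, which reflects the subjectivity described above. Deciding when a stored outcome is still valid, for instance for time-dependent dimensions such as timeliness, is left open here.
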
