A Role for Provenance in Quality Assessment

This document discusses using provenance information to help assess data quality. It proposes representing sensor observations and their provenance as linked data and using this information to evaluate quality metrics like accuracy, timeliness, and relevance. The work done so far involves representing observations and quality requirements as linked data and generating initial quality scores. Future work will focus on implementing quality rules that examine provenance information and enabling quality scores to be reused.

A Role for Provenance in Quality Assessment

Chris Baillie, Pete Edwards, and Edoardo Pignotti
c.baillie@abdn.ac.uk

Overview

• Motivation
• Evaluating Data Quality
• A Role for Provenance
• Future work

Motivation

• “we don’t know whether the information we find [on the Web] is accurate or not. So we have to teach people how to assess what they’ve found” (Vint Cerf, 2010)
• The Web of Documents has become the Web of documents, services, data, and people.
• Anyone can publish anything, so we need a way to evaluate quality.
• We are investigating these issues within the Internet of Things.
  - Sensors are now at the centre of many applications.

Evaluating Data Quality

Entity (and context)
• To evaluate quality, we must examine the context around data.

Quality Scores
• Quality is a multi-dimensional construct:
  - Accuracy
  - Timeliness
  - Relevance

F(E, R) = Q

WIQA Framework
• Examines data content, context, and external ratings (Bizer et al. 2009).

Data Requirements
• Fürber and Hepp (2011) use rules to identify quality problems.
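One way to read F(E, R) = Q is as a function that takes an entity with its context (E) and a set of data requirements (R), and returns one score per quality dimension (Q). The sketch below assumes that reading. The names `Requirement` and `assess`, the dictionary fields, and the timeliness threshold are all hypothetical; none of them come from WIQA or from the authors' framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: an entity is a dict of observed values and context.
Entity = Dict[str, float]


@dataclass
class Requirement:
    """One data requirement: a quality dimension plus a scoring rule."""
    dimension: str                    # e.g. "relevance", "timeliness"
    score: Callable[[Entity], float]  # maps an entity to a score


def assess(entity: Entity, requirements: List[Requirement]) -> Dict[str, float]:
    """F(E, R) = Q: apply every requirement in R to entity E and collect the scores Q."""
    return {r.dimension: r.score(entity) for r in requirements}


# Illustrative requirements only; the 60-second threshold is an assumption.
requirements = [
    Requirement("relevance", lambda e: 1 - e["distance_from_route"] / 100),
    Requirement("timeliness", lambda e: 1.0 if e["age_seconds"] <= 60 else 0.0),
]

scores = assess({"distance_from_route": 25.0, "age_seconds": 30.0}, requirements)
```

Keeping each requirement as data (a dimension name plus a rule) means new quality dimensions can be added without changing `assess`.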

Representing Sensor Observations

• Linked Data: “recommended best practice for exposing, sharing, and connecting pieces of data using URIs and RDF”
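As a rough illustration of what "using URIs and RDF" means for a sensor observation, the sketch below describes one observation as subject-predicate-object triples and prints them in N-Triples style. The `example.org` namespaces, the `observedBy` property, and the function names are assumptions made for this example; the deck does not state the vocabulary the authors use.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, for illustration only.
EX = "http://example.org/obs/"
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def observation_triples(obs_id: str, sensor_id: str, distance: float) -> List[Triple]:
    """Describe one observation as subject-predicate-object triples."""
    subject = f"<{EX}{obs_id}>"
    return [
        (subject, f"<{RDF_TYPE}>", f"<{VOCAB}Observation>"),
        (subject, f"<{VOCAB}observedBy>", f"<{EX}{sensor_id}>"),
        (subject, f"<{VOCAB}distanceFromRoute>", f'"{distance}"^^<{XSD_DOUBLE}>'),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples one statement per line, each ending in ' .'"""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = to_ntriples(observation_triples("observation1", "iPhoneSensor", 25.0))
print(doc)
```

Because every resource is named by a URI, other datasets can link to the same observation or sensor without sharing a database.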

Performing Quality Assessment

R_relevance = 1 - (E distanceFromRoute X) / 100

    # Emit a quality score, with a linked violation record, for each observation.
    CONSTRUCT {
        _:b0 a QualityScore .
        _:b0 score ?qs .
        _:b0 dqm:ruleViolation _:b1 .
        _:b1 a DataRequirementViolation .
        _:b1 dqm:affectedInstance ?instance .
    }
    WHERE {
        ?instance a Observation .
        ?instance distanceFromRoute ?distance .
        # Relevance falls linearly with distance from the route.
        LET (?qs := (1 - (?distance / 100))) .
    }
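The relevance rule above can be traced by hand with a small sketch. The function below mirrors the rule's WHERE and CONSTRUCT parts over plain dictionaries; it is not the authors' SPIN implementation, and the function name and dictionary layout are assumptions chosen to echo the property names in the rule.

```python
from typing import Dict, List


def relevance_rule(observations: List[Dict]) -> List[Dict]:
    """Apply score = 1 - distance / 100 to every observation that has a distance."""
    results = []
    for obs in observations:
        # Like the WHERE clause, skip observations with no distanceFromRoute.
        if "distanceFromRoute" not in obs:
            continue
        qs = 1 - (obs["distanceFromRoute"] / 100)
        # Like the CONSTRUCT template, emit a score linked to a violation record.
        results.append({
            "type": "QualityScore",
            "score": qs,
            "ruleViolation": {
                "type": "DataRequirementViolation",
                "affectedInstance": obs["id"],
            },
        })
    return results


results = relevance_rule([
    {"id": "obs1", "distanceFromRoute": 25},
    {"id": "obs2", "distanceFromRoute": 150},
    {"id": "obs3"},  # no distance, so the rule does not match it
])
```

As written, the formula gives a negative score once the distance exceeds 100; the deck does not say whether scores are clamped to the range 0 to 1.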

Quality Assessment Results

Observation Provenance

• Provenance is a critical part of observation context.
• Describes the entities, agents, and activities involved in data creation:
  - How was the observation value measured?
  - Who controlled the sensing process?
  - How has the observation been transformed since it was created?
• The W3C PROV-O model provides a linked data representation of provenance.
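The three questions above amount to walking provenance relations backwards from an observation. The sketch below does this over a small in-memory triple list that uses the entity, activity, and agent names from this deck's provenance example. The helper names are hypothetical, and attaching the agent to the sensing process is an assumption: the deck's diagram does not make the source of that edge explicit.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Relations taken from the deck's provenance example.
# Assumption: the agent is associated with the sensing process.
PROV: List[Triple] = [
    ("Observation 2", "wasGeneratedBy", "Map matching"),
    ("Map matching", "used", "Observation 1"),
    ("Observation 1", "wasGeneratedBy", "Sensing Process"),
    ("Sensing Process", "used", "iPhoneSensor"),
    ("Sensing Process", "wasAssociatedWith", "Chris"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object of (subject, predicate, ?) in the graph."""
    return [o for s, p, o in PROV if s == subject and p == predicate]


def lineage(entity: str) -> List[Tuple[str, str]]:
    """Follow wasGeneratedBy and used links back from an entity.

    Returns (activity, input) pairs, nearest step first.
    """
    steps: List[Tuple[str, str]] = []
    frontier = [entity]
    seen = set()
    while frontier:
        current = frontier.pop(0)
        if current in seen:
            continue  # guard against cycles in malformed graphs
        seen.add(current)
        for activity in objects(current, "wasGeneratedBy"):
            for used in objects(activity, "used"):
                steps.append((activity, used))
                frontier.append(used)
    return steps


def controllers(activity: str) -> List[str]:
    """Agents associated with an activity."""
    return objects(activity, "wasAssociatedWith")


steps = lineage("Observation 2")
agents = controllers("Sensing Process")
```

A quality rule that examines provenance could use `lineage` to check how an observation was transformed and `controllers` to check who ran the sensing process.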

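Observation Provenance (diagram)

Entity "Observation 2" -> wasGeneratedBy -> Activity "Map matching"
Activity "Map matching" -> used -> Entity "Observation 1"
Entity "Observation 1" -> wasGeneratedBy -> Activity "Sensing Process"
Activity "Sensing Process" -> used -> Entity "iPhoneSensor"
Activity -> wasAssociatedWith -> Agent "Chris"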

Work To Date

• Developed a Quality Assessment Framework that enables:
  - Linked data representation of sensor observations
  - Definition of quality requirements using SPARQL rules
  - Generation of quality scores via reasoning

Future Work

• Implementation of quality rules that examine provenance
• Investigate quality score re-use

Any questions?

Come and see the IRP demo (D9) to see quality assessment in action.

Implementation (architecture diagram)

• Observation Triple Store
• Reasoner (SPIN)
• Quality Rules
  - Relevance Rule
  - Timeliness Rule
  - Accuracy Rule
  - Availability Rule
• Apache Tomcat
  - Observation Service
  - Quality Service

Slide 4: Example Scenario (figure only)

Slide 11: Quality Score Provenance (figure only)


Editor's Notes

In this talk I will outline why the need for quality assessment exists, describe how quality is perceived, outline our approach to quality assessment, provide an example scenario, and outline our future work.
We don’t know whether information is accurate: need to assess! The Web has evolved. Web = open platform. The Web is big; need a smaller platform for evaluation.
Consider mobile phones providing passenger information regarding the location of buses. Sometimes we get lucky and observations land right on the bus route. However, there are many different sources of low-quality data: inaccurate GPS readings; malicious users (someone playing with the app while at home); people who make mistakes (someone perhaps on the wrong bus).
Animate this: ObservationValue -> [motivate SSN here] Observation + foi -> disruption report
DataRequirement1 -> wasAttributedTo -> Agent
