SlideShare a Scribd company logo
1 of 15
Download to read offline
Formal Verification of Data Provenance Records
Szymon Klarman1, Stefan Schlobach1 and Luciano Serafini2
1 Knowledge Representation & Reasoning Group
VU University Amsterdam
2 Data & Knowledge Management Group
FBK Trento
November 14, 2012
The 11th International Semantic Web Conference
(ISWC-12)
ISWC-12 Formal Verification of Data Provenance Records
Problem: reasoning over data provenance
Sahoo, S., Sheth, A., Henson, C. Semantic Provenance for eScience: Managing the Deluge of
Scientific Data. In IEEE Internet Computing 12(4), 2008.
S. Klarman, S. Schlobach, L. Serafini 1 / 13
ISWC-12 Formal Verification of Data Provenance Records
Problem: reasoning over data provenance
Sahoo, S., Sheth, A., Henson, C. Semantic Provenance for eScience: Managing the Deluge of
Scientific Data. In IEEE Internet Computing 12(4), 2008.
S. Klarman, S. Schlobach, L. Serafini 1 / 13
List the protein groups identified with high confidence value – that is,
protein groups with a Mascot score > 3500 – detected by the Mascot
search engine against a T.cruzi database (Mascot search input
parameter, Taxonomy = T.cruzi). The protein groups should contain
at least one peptide fragment with a specific consensus sequence of
{*N [P] [S/T]*}.
ISWC-12 Formal Verification of Data Provenance Records
Overview
Problem:
How to formally verify data provenance records? This involves:
• adequately representing provenance records,
• defining a language for expressing relevant properties,
• ensuring that reasoning is “manageable”.
Approach:
• Provenance records resemble transition systems, which are typically
verified using various dynamic logics.
• We develop Provenance Specification Logic for verifying and querying
data provenance records, based on Propositional Dynamic Logic and
standard query languages.
S. Klarman, S. Schlobach, L. Serafini 2 / 13
ISWC-12 Formal Verification of Data Provenance Records
Data provenance records
A data provenance record is the history of derivation of a data artifact
from its sources.
Data artifact =
dataset / knowledge base = a set of axioms (DL/OWL/RDF(S))
bb cc
DD BB
CC
aar
dds
Note: Particular representation languages come with dedicated query
languages, e.g., conjunctive queries for DLs/OWL, datalog for OWL RL,
SPARQL for RDF(S).
S. Klarman, S. Schlobach, L. Serafini 3 / 13
ISWC-12 Formal Verification of Data Provenance Records
Data provenance records
used used
D2
bbaa r
HH MM
bb cc
MM LL
KK
aa
LLMM
ccaa s
r
PP
wasGeneratedBy
D1
D3
L. Moreau, et al. The open provenance model – core specification. In Future Generation
Computer Systems 27, 2010.
S. Klarman, S. Schlobach, L. Serafini 4 / 13
Provenance graphs:
• process nodes: P
• data artifact nodes: D1, D2, D3
(each corresponding to a data
artifact)
• edges labeled with relation names,
e.g.: wasGeneratedBy, used.
• directed, acyclic, finite.
ISWC-12 Formal Verification of Data Provenance Records
Verification as model-checking
Provenance graphs are very similar to finite-state transition systems.
p, qp, q
pp qq
qq
R
S
S
R
S
used used
D2
bbaa r
HH MM
bb cc
MM LL
KK
aa
LLMM
ccaa s
r
PP
wasGeneratedBy
D1
D3
• natural to analyze using the framework of modal logics, in particular
Propositional Dynamic Logic,
• basic reasoning task is model-checking,
• we need to replace propositions with richer formulas — queries —
and effectively work with two-dimensional languages.
S. Klarman, S. Schlobach, L. Serafini 5 / 13
ISWC-12 Formal Verification of Data Provenance Records
Provenance specification logic
Object formulas: q ::= queries from a given class Q
Path expressions: π ::= r | π; π | π ∪ π | π− | π∗ | v? | α?
Provenance formulas: α ::= {q} | π α | α ∧ α | ¬α |
The semantics is a combination of the semantics of PDL and Q-queries:
• a sequence of instances a is an answer to α iff G, v |= α[a]
• for a query q(x) in α, and node v, q(x) is satisfied in v for a iff
D(v) |= q[a|x ]
Model-checking problem: given a provenance graph G, node v,
provenance formula α, and a sequence a, decide wether G, v |= α[a].
S. Klarman, S. Schlobach, L. Serafini 6 / 13
ISWC-12 Formal Verification of Data Provenance Records
The First Provenance Challenge
• a workflow for creating “atlases” of high resolution anatomical data
• 9 queries about the resulting provenance records
anatomy-header-1anatomy-header-1
anatomy-image-1anatomy-image-1
align-warp-1align-warp-1
reference-header-1reference-header-1
reference-image-1reference-image-1
anatomy-header-2anatomy-header-2
anatomy-image-2anatomy-image-2
align-warp-2align-warp-2
warp-parameters-1warp-parameters-1 warp-parameters-2warp-parameters-2
... ...
softmean-1softmean-1
atlas-header-1atlas-header-1 atlas-image-1atlas-image-1
...
atlas-x-graphic-1atlas-x-graphic-1
used
wasGeneratedBy
processprocess
data artifactdata artifact
L. Moreau, et al. Special issue: The First Provenance Challenge. In Concurrency and
Computation: Practice and Experience 20, 2008.
S. Klarman, S. Schlobach, L. Serafini 7 / 13
ISWC-12 Formal Verification of Data Provenance Records
Example
Q: Find all output averaged images of softmean (average) procedures, where the
warped images taken as input were align warp’ed using a twelfth order nonlinear
1365 parameter model, i.e. where softmean was preceded in the workflow,
directly or indirectly, by an align warp procedure with argument -m 12.
α := {Image(x)}∧ wasGeneretedBy; softmean1...n; used ({∃y.Image(y)}∧
(wasGeneratedBy; used)∗; wasGeneratedBy; align-warp1...m )
where: softmean1...n := softmean-1? ∪ . . . ∪ softmean-n?
align-warp1...m := align-warp-1? ∪ . . . ∪ align-warp-m?
softmean-isoftmean-iImage(x)Image(x) ∃ y. Image(y)∃ y. Image(y) ... ...
align-warp-jalign-warp-j
S. Klarman, S. Schlobach, L. Serafini 8 / 13
ISWC-12 Formal Verification of Data Provenance Records
Accommodating rich provenance metadata
Represent the provenance graph as a separate (meta-)knowledge base.
dataartifactsprovenancemeta-KB
D1
D1
D2
D2
D3
D3
ArtifactArtifactProcessProcess PP
used
wasGeneratedBy
used
bbaa r
HH MM
bb cc
MM LL
KK
aa
LLMM
ccaa s
r
used used
D2
bbaa r
HH MM
bb cc
MM LL
KK
aa
LLMM
ccaa s
r
PP
wasGeneratedBy
D1
D3
Add new test operator C?, for a concept C of the provenance language.
G, v |= C? iff meta-KB |= C(v)
S. Klarman, S. Schlobach, L. Serafini 9 / 13
ISWC-12 Formal Verification of Data Provenance Records
Example cntd.
Q: [...] was preceded in the workflow, directly or indirectly, by an align warp
procedure with argument -m 12.
Provenance meta-KB:
• Align-warp Process
• Align-warp ∃argument.String
• Align-warp(align-warpi ), for every 1 ≤ i ≤ m,
• argument(align-warpk, “-m 12”), for every align-warpk with argument
“-m 12”.
α := ... (wasGeneratedBy; used)∗; wasGeneratedBy; align-warp1...m
where: align-warp1...m := align-warp-1? ∪ . . . ∪ align-warp-m?
replace: align-warp1...m
with: Align-warp ∃argument.“-m 12”?
S. Klarman, S. Schlobach, L. Serafini 10 / 13
ISWC-12 Formal Verification of Data Provenance Records
Observations
• we assume this collection is representative of the problem of reasoning
with data provenance,
• the tasks consist of a logical verification component and a search
component,
• the logical verification component can be captured by PSL, often by
breaking down complex tasks into a number of model-checking
problems,
• the queries are essentially two-dimensional,
• some patterns could be usefully compiled out as a syntactic sugar.
S. Klarman, S. Schlobach, L. Serafini 11 / 13
ISWC-12 Formal Verification of Data Provenance Records
Reasoning
Reasoning in PSL is PTimeSW
-complete, where:
• PTime is the complexity of model-checking in PDL,
• ·SW is an oracle performing reasoning with the Semantic Web
representation/query languages used, of the respective complexity.
D1
D1
D2
D2
D3
D3
ArtifactArtifactProcessProcess
PP
used
wasGeneratedBy
used
D1
D1
D2
D2
D3
D3
ArtifactArtifactProcessProcess PP
used
wasGeneratedBy
used
bbaa r
HH MM
bb cc
MM LL
KK
aa
LLMM
ccaa s
r
bb
aa
r
HH MM
bb cc
MM LL
KK
aar LLMM
ccaa s
D3,p,...D3,p,...
P,r,...P,r,...
D2,q,...D2,q,...D1,p,...D1,p,...
used used
wasGeneratedBy
Reasoning with data
Model-checking
transition systems
Reasoning with
data provenance records
S. Klarman, S. Schlobach, L. Serafini 12 / 13
ISWC-12 Formal Verification of Data Provenance Records
Summary
Our problem involved:
• adequately representing provenance records
⇒ provenance graphs, i.e. transition systems with rich data states.
The approach is agnostic as to the choice of particular data and
provenance languages,
• defining a language for expressing relevant properties
⇒ PSL = dynamic logic + query formulas as atoms,
• ensuring that reasoning is “manageable”
⇒ PTimeSW
-completeness is good!
Conclusion:
A generic, declarative approach to reasoning with data provenance records.
Outlook:
Broader validation, implementation, study of most useful setups.
S. Klarman, S. Schlobach, L. Serafini 13 / 13

More Related Content

What's hot

Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Basil Ell
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphFedorNikolaev
 
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Goran S. Milovanovic
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-iDr. Awase Khirni Syed
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Lukas Galke
 
R Programming Language
R Programming LanguageR Programming Language
R Programming LanguageNareshKarela1
 
Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R] Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R] Goran S. Milovanovic
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 

What's hot (10)

Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
 
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Ibmr 2014
Ibmr 2014Ibmr 2014
Ibmr 2014
 
Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R] Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R]
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 

Similar to Formal Verification of Data Provenance Records

semlavssws2015
semlavssws2015semlavssws2015
semlavssws2015hala Skaf
 
Data integration and provenance-Chapter-14
Data integration and provenance-Chapter-14Data integration and provenance-Chapter-14
Data integration and provenance-Chapter-14saadhash286
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...Paolo Missier
 
Rethinking Data-Intensive Science Using Scalable Analytics Systems
Rethinking Data-Intensive Science Using Scalable Analytics Systems Rethinking Data-Intensive Science Using Scalable Analytics Systems
Rethinking Data-Intensive Science Using Scalable Analytics Systems fnothaft
 
Framester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data HubFramester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data HubMehwish Alam
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yuzukun
 
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010Paolo Missier
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in Rherbps10
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the CloudDataMine Lab
 
Self-composition by Symbolic Execution
Self-composition by Symbolic ExecutionSelf-composition by Symbolic Execution
Self-composition by Symbolic ExecutionQuoc-Sang Phan
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerVitor Hirota Makiyama
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesJie Bao
 
Evaluating the Effectiveness of Axiomatic Approaches in Web Track
Evaluating the Effectiveness of Axiomatic Approaches in Web TrackEvaluating the Effectiveness of Axiomatic Approaches in Web Track
Evaluating the Effectiveness of Axiomatic Approaches in Web TrackTwitter Inc.
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
Why-Not Provenance through Game Semantics
Why-Not Provenance through Game SemanticsWhy-Not Provenance through Game Semantics
Why-Not Provenance through Game SemanticsBertram Ludäscher
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanfordSakthivel C R
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemIosif Itkin
 

Similar to Formal Verification of Data Provenance Records (20)

semlavssws2015
semlavssws2015semlavssws2015
semlavssws2015
 
Data integration and provenance-Chapter-14
Data integration and provenance-Chapter-14Data integration and provenance-Chapter-14
Data integration and provenance-Chapter-14
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 
Rethinking Data-Intensive Science Using Scalable Analytics Systems
Rethinking Data-Intensive Science Using Scalable Analytics Systems Rethinking Data-Intensive Science Using Scalable Analytics Systems
Rethinking Data-Intensive Science Using Scalable Analytics Systems
 
Framester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data HubFramester: A Wide Coverage Linguistic Linked Data Hub
Framester: A Wide Coverage Linguistic Linked Data Hub
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
 
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
 
Self-composition by Symbolic Execution
Self-composition by Symbolic ExecutionSelf-composition by Symbolic Execution
Self-composition by Symbolic Execution
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data Sources
 
Evaluating the Effectiveness of Axiomatic Approaches in Web Track
Evaluating the Effectiveness of Axiomatic Approaches in Web TrackEvaluating the Effectiveness of Axiomatic Approaches in Web Track
Evaluating the Effectiveness of Axiomatic Approaches in Web Track
 
C034011016
C034011016C034011016
C034011016
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
Works 2015-provenance-mileage
Works 2015-provenance-mileageWorks 2015-provenance-mileage
Works 2015-provenance-mileage
 
Why-Not Provenance through Game Semantics
Why-Not Provenance through Game SemanticsWhy-Not Provenance through Game Semantics
Why-Not Provenance through Game Semantics
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanford
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
 

More from Szymon Klarman

Data driven approaches to empirical discovery
Data driven approaches to empirical discoveryData driven approaches to empirical discovery
Data driven approaches to empirical discoverySzymon Klarman
 
ABox Abduction in the Description Logic
ABox Abduction in the Description LogicABox Abduction in the Description Logic
ABox Abduction in the Description LogicSzymon Klarman
 
Judgment Aggregation as Maximization of Epistemic and Social Utility
Judgment Aggregation as Maximization of Epistemic and Social UtilityJudgment Aggregation as Maximization of Epistemic and Social Utility
Judgment Aggregation as Maximization of Epistemic and Social UtilitySzymon Klarman
 
Description Logics of Context
Description Logics of ContextDescription Logics of Context
Description Logics of ContextSzymon Klarman
 
Prediction and Explanation over DL-Lite Data Streams
Prediction and Explanation over DL-Lite Data StreamsPrediction and Explanation over DL-Lite Data Streams
Prediction and Explanation over DL-Lite Data StreamsSzymon Klarman
 
Querying Temporal Databases via OWL 2 QL
Querying Temporal Databases via OWL 2 QLQuerying Temporal Databases via OWL 2 QL
Querying Temporal Databases via OWL 2 QLSzymon Klarman
 
Ontology learning from interpretations in lightweight description logics
Ontology learning from interpretations in lightweight description logicsOntology learning from interpretations in lightweight description logics
Ontology learning from interpretations in lightweight description logicsSzymon Klarman
 
Knowledge Assembly at Scale with Semantic and Probabilistic Techniques
Knowledge Assembly at Scale with Semantic and Probabilistic TechniquesKnowledge Assembly at Scale with Semantic and Probabilistic Techniques
Knowledge Assembly at Scale with Semantic and Probabilistic TechniquesSzymon Klarman
 
What makes a linked data pattern interesting?
What makes a linked data pattern interesting?What makes a linked data pattern interesting?
What makes a linked data pattern interesting?Szymon Klarman
 
SKOS: Building taxonomies with minimum ontological commitment
SKOS: Building taxonomies  with minimum ontological commitmentSKOS: Building taxonomies  with minimum ontological commitment
SKOS: Building taxonomies with minimum ontological commitmentSzymon Klarman
 

More from Szymon Klarman (11)

HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
Data driven approaches to empirical discovery
Data driven approaches to empirical discoveryData driven approaches to empirical discovery
Data driven approaches to empirical discovery
 
ABox Abduction in the Description Logic
ABox Abduction in the Description LogicABox Abduction in the Description Logic
ABox Abduction in the Description Logic
 
Judgment Aggregation as Maximization of Epistemic and Social Utility
Judgment Aggregation as Maximization of Epistemic and Social UtilityJudgment Aggregation as Maximization of Epistemic and Social Utility
Judgment Aggregation as Maximization of Epistemic and Social Utility
 
Description Logics of Context
Description Logics of ContextDescription Logics of Context
Description Logics of Context
 
Prediction and Explanation over DL-Lite Data Streams
Prediction and Explanation over DL-Lite Data StreamsPrediction and Explanation over DL-Lite Data Streams
Prediction and Explanation over DL-Lite Data Streams
 
Querying Temporal Databases via OWL 2 QL
Querying Temporal Databases via OWL 2 QLQuerying Temporal Databases via OWL 2 QL
Querying Temporal Databases via OWL 2 QL
 
Ontology learning from interpretations in lightweight description logics
Ontology learning from interpretations in lightweight description logicsOntology learning from interpretations in lightweight description logics
Ontology learning from interpretations in lightweight description logics
 
Knowledge Assembly at Scale with Semantic and Probabilistic Techniques
Knowledge Assembly at Scale with Semantic and Probabilistic TechniquesKnowledge Assembly at Scale with Semantic and Probabilistic Techniques
Knowledge Assembly at Scale with Semantic and Probabilistic Techniques
 
What makes a linked data pattern interesting?
What makes a linked data pattern interesting?What makes a linked data pattern interesting?
What makes a linked data pattern interesting?
 
SKOS: Building taxonomies with minimum ontological commitment
SKOS: Building taxonomies  with minimum ontological commitmentSKOS: Building taxonomies  with minimum ontological commitment
SKOS: Building taxonomies with minimum ontological commitment
 

Recently uploaded

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 

Recently uploaded (20)

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 

Formal Verification of Data Provenance Records

  • 1. Formal Verification of Data Provenance Records Szymon Klarman1, Stefan Schlobach1 and Luciano Serafini2 1 Knowledge Representation & Reasoning Group VU University Amsterdam 2 Data & Knowledge Management Group FBK Trento November 14, 2012 The 11th International Semantic Web Conference (ISWC-12)
  • 2. ISWC-12 Formal Verification of Data Provenance Records Problem: reasoning over data provenance Sahoo, S., Sheth, A., Henson, C. Semantic Provenance for eScience: Managing the Deluge of Scientific Data. In IEEE Internet Computing 12(4), 2008. S. Klarman, S. Schlobach, L. Serafini 1 / 13
  • 3. ISWC-12 Formal Verification of Data Provenance Records Problem: reasoning over data provenance Sahoo, S., Sheth, A., Henson, C. Semantic Provenance for eScience: Managing the Deluge of Scientific Data. In IEEE Internet Computing 12(4), 2008. S. Klarman, S. Schlobach, L. Serafini 1 / 13 List the protein groups identified with high confidence value – that is, protein groups with a Mascot score > 3500 – detected by the Mascot search engine against a T.cruzi database (Mascot search input parameter, Taxonomy = T.cruzi). The protein groups should contain at least one peptide fragment with a specific consensus sequence of {*N [P] [S/T]*}.
  • 4. ISWC-12 Formal Verification of Data Provenance Records Overview Problem: How to formally verify data provenance records? This involves: • adequately representing provenance records, • defining a language for expressing relevant properties, • ensuring that reasoning is “manageable”. Approach: • Provenance records resemble transition systems, which are typically verified using various dynamic logics. • We develop Provenance Specification Logic for verifying and querying data provenance records, based on Propositional Dynamic Logic and standard query languages. S. Klarman, S. Schlobach, L. Serafini 2 / 13
  • 5. ISWC-12 Formal Verification of Data Provenance Records Data provenance records A data provenance record is the history of derivation of a data artifact from its sources. Data artifact = dataset / knowledge base = a set of axioms (DL/OWL/RDF(S)) bb cc DD BB CC aar dds Note: Particular representation languages come with dedicated query languages, e.g., conjunctive queries for DLs/OWL, datalog for OWL RL, SPARQL for RDF(S). S. Klarman, S. Schlobach, L. Serafini 3 / 13
  • 6. ISWC-12 Formal Verification of Data Provenance Records Data provenance records used used D2 bbaa r HH MM bb cc MM LL KK aa LLMM ccaa s r PP wasGeneratedBy D1 D3 L. Moreau, et al. The open provenance model – core specification. In Future Generation Computer Systems 27, 2010. S. Klarman, S. Schlobach, L. Serafini 4 / 13 Provenance graphs: • process nodes: P • data artifact nodes: D1, D2, D3 (each corresponding to a data artifact) • edges labeled with relation names, e.g.: wasGeneratedBy, used. • directed, acyclic, finite.
  • 7. ISWC-12 Formal Verification of Data Provenance Records Verification as model-checking Provenance graphs are very similar to finite-state transition systems. p, qp, q pp qq qq R S S R S used used D2 bbaa r HH MM bb cc MM LL KK aa LLMM ccaa s r PP wasGeneratedBy D1 D3 • natural to analyze using the framework of modal logics, in particular Propositional Dynamic Logic, • basic reasoning task is model-checking, • we need to replace propositions with richer formulas — queries — and effectively work with two-dimensional languages. S. Klarman, S. Schlobach, L. Serafini 5 / 13
  • 8. ISWC-12 Formal Verification of Data Provenance Records Provenance specification logic Object formulas: q ::= queries from a given class Q Path expressions: π ::= r | π; π | π ∪ π | π− | π∗ | v? | α? Provenance formulas: α ::= {q} | π α | α ∧ α | ¬α | The semantics is a combination of the semantics of PDL and Q-queries: • a sequence of instances a is an answer to α iff G, v |= α[a] • for a query q(x) in α, and node v, q(x) is satisfied in v for a iff D(v) |= q[a|x ] Model-checking problem: given a provenance graph G, node v, provenance formula α, and a sequence a, decide wether G, v |= α[a]. S. Klarman, S. Schlobach, L. Serafini 6 / 13
  • 9. ISWC-12 Formal Verification of Data Provenance Records The First Provenance Challenge • a workflow for creating “atlases” of high resolution anatomical data • 9 queries about the resulting provenance records anatomy-header-1anatomy-header-1 anatomy-image-1anatomy-image-1 align-warp-1align-warp-1 reference-header-1reference-header-1 reference-image-1reference-image-1 anatomy-header-2anatomy-header-2 anatomy-image-2anatomy-image-2 align-warp-2align-warp-2 warp-parameters-1warp-parameters-1 warp-parameters-2warp-parameters-2 ... ... softmean-1softmean-1 atlas-header-1atlas-header-1 atlas-image-1atlas-image-1 ... atlas-x-graphic-1atlas-x-graphic-1 used wasGeneratedBy processprocess data artifactdata artifact L. Moreau, et al. Special issue: The First Provenance Challenge. In Concurrency and Computation: Practice and Experience 20, 2008. S. Klarman, S. Schlobach, L. Serafini 7 / 13
  • 10. ISWC-12 Formal Verification of Data Provenance Records Example Q: Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align warp’ed using a twelfth order nonlinear 1365 parameter model, i.e. where softmean was preceded in the workflow, directly or indirectly, by an align warp procedure with argument -m 12. α := {Image(x)}∧ wasGeneretedBy; softmean1...n; used ({∃y.Image(y)}∧ (wasGeneratedBy; used)∗; wasGeneratedBy; align-warp1...m ) where: softmean1...n := softmean-1? ∪ . . . ∪ softmean-n? align-warp1...m := align-warp-1? ∪ . . . ∪ align-warp-m? softmean-isoftmean-iImage(x)Image(x) ∃ y. Image(y)∃ y. Image(y) ... ... align-warp-jalign-warp-j S. Klarman, S. Schlobach, L. Serafini 8 / 13
  • 11. ISWC-12 Formal Verification of Data Provenance Records Accommodating rich provenance metadata Represent the provenance graph as a separate (meta-)knowledge base. dataartifactsprovenancemeta-KB D1 D1 D2 D2 D3 D3 ArtifactArtifactProcessProcess PP used wasGeneratedBy used bbaa r HH MM bb cc MM LL KK aa LLMM ccaa s r used used D2 bbaa r HH MM bb cc MM LL KK aa LLMM ccaa s r PP wasGeneratedBy D1 D3 Add new test operator C?, for a concept C of the provenance language. G, v |= C? iff meta-KB |= C(v) S. Klarman, S. Schlobach, L. Serafini 9 / 13
  • 12. ISWC-12 Formal Verification of Data Provenance Records Example cntd. Q: [...] was preceded in the workflow, directly or indirectly, by an align warp procedure with argument -m 12. Provenance meta-KB: • Align-warp Process • Align-warp ∃argument.String • Align-warp(align-warpi ), for every 1 ≤ i ≤ m, • argument(align-warpk, “-m 12”), for every align-warpk with argument “-m 12”. α := ... (wasGeneratedBy; used)∗; wasGeneratedBy; align-warp1...m where: align-warp1...m := align-warp-1? ∪ . . . ∪ align-warp-m? replace: align-warp1...m with: Align-warp ∃argument.“-m 12”? S. Klarman, S. Schlobach, L. Serafini 10 / 13
  • 13. ISWC-12 Formal Verification of Data Provenance Records Observations • we assume this collection is representative of the problem of reasoning with data provenance, • the tasks consist of a logical verification component and a search component, • the logical verification component can be captured by PSL, often by breaking down complex tasks into a number of model-checking problems, • the queries are essentially two-dimensional, • some patterns could be usefully compiled out as a syntactic sugar. S. Klarman, S. Schlobach, L. Serafini 11 / 13
  • 14. ISWC-12 Formal Verification of Data Provenance Records Reasoning Reasoning in PSL is PTimeSW -complete, where: • PTime is the complexity of model-checking in PDL, • ·SW is an oracle performing reasoning with the Semantic Web representation/query languages used, of the respective complexity. D1 D1 D2 D2 D3 D3 ArtifactArtifactProcessProcess PP used wasGeneratedBy used D1 D1 D2 D2 D3 D3 ArtifactArtifactProcessProcess PP used wasGeneratedBy used bbaa r HH MM bb cc MM LL KK aa LLMM ccaa s r bb aa r HH MM bb cc MM LL KK aar LLMM ccaa s D3,p,...D3,p,... P,r,...P,r,... D2,q,...D2,q,...D1,p,...D1,p,... used used wasGeneratedBy Reasoning with data Model-checking transition systems Reasoning with data provenance records S. Klarman, S. Schlobach, L. Serafini 12 / 13
  • 15. ISWC-12 Formal Verification of Data Provenance Records Summary Our problem involved: • adequately representing provenance records ⇒ provenance graphs, i.e. transition systems with rich data states. The approach is agnostic as to the choice of particular data and provenance languages, • defining a language for expressing relevant properties ⇒ PSL = dynamic logic + query formulas as atoms, • ensuring that reasoning is “manageable” ⇒ PTimeSW -completeness is good! Conclusion: A generic, declarative approach to reasoning with data provenance records. Outlook: Broader validation, implementation, study of most useful setups. S. Klarman, S. Schlobach, L. Serafini 13 / 13