Prov4J: A Semantic Web Framework for Generic Provenance Management

André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry

paper: http://andrefreitas.org/papers/Prov4J%20A%20Semantic%20Web%20Framework%20for%20Generic%20Provenance%20Management.pdf

Published in: Education, Technology

  1. Prov4J: A Semantic Web Framework for Generic Provenance Management
      André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry
  2. Outline
      • Motivation.
      • Generic provenance management on the Web.
      • Prov4J:
          • Capture
          • Representation
          • Consumption
          • Deployment
  3. Motivation: Data on the Web
      • Accelerated by the adoption and uptake of Linked Data.
      • Paradigm shift:
          • Change in the way information is consumed on the Web.
      • Main issue:
          • Quality assessment & trustworthiness.
  4. Motivation
      • Provenance as a cornerstone element for quality assessment.
      • Expansion of the application of provenance into different domains and types of systems.
      • Generic applications generating or consuming data on the Web need to become provenance-aware.
      • Provenance-aware: ability to capture, represent and consume provenance information associated with the data.
  5. Generic Provenance Management
      • Provenance management for this larger audience.
      • Covers the set of the most frequent requirements for provenance capture and consumption on the Web.
  6. Generic Provenance Management
      Provenance for the Masses
  7. Research Questions
      • Are Semantic Web standards and tools appropriate for capturing, representing and consuming provenance on the Web?
      • What are the key software engineering aspects which need to be employed to reduce the barriers for the construction of provenance-aware applications?
  8. Research Goals
      • Answer these questions.
      • Provide a Generic Provenance Management Framework for the Web.
      • Make it available for experimentation by the community.
  9. Main Components
      • Provenance Representation
      • Provenance Consumption
      • Provenance Capture
      (Diagram labels: W3P, Prov4J)
  10. W3P
      • Lightweight provenance ontology for the Web.
      • Focused on provenance for data quality assessment.
      • Designed to be compatible with the Open Provenance Model.
      • Dimensions: Workflow, Publishing and Social Provenance.
      • Building W3P:
          • Use cases;
          • Data quality dimensions;
          • Literature review;
          • Requirements;
          • Core provenance concepts;
          • Use and refinement.
  11. W3P: Classes & Properties (excerpt)
      Core Workflow Model
  12. Building Prov4J
      • Core requirements for a generic provenance management framework.
          • Capture
          • Consumption
      • Provenance architecture.
      • Core software engineering aspects for capturing provenance.
      • Deployment in a real-world scenario.
      • Core requirements coverage analysis.
  13. Core Requirements
      • Provenance capture:
          • Minimum number of software adaptations
          • Low impact on performance
          • Expressive interface
          • Scalability
          • Structured provenance data
          • Publication of provenance data
      • Provenance consumption:
          • Query expressivity
          • Query performance & scalability
          • Provenance discovery
          • Mapping from different provenance models
          • Usability
  14. Core Requirements (cont’d)
      • Common requirements:
          • User data representation independence
          • Separation of concerns
          • Reliable provenance storage
          • Basic system administration support
          • Security
  15. High-Level Architecture
  16. Consumption: Components
  17. Consumption: Query Types
      • Query types:
          • SPARQL-based queries
          • Queries supported by reasoning
          • Path queries
          • Navigational queries
          • Similarity queries
      • Query type distribution (API):
          • 33% used transitivity
          • 9% used rules reasoning
          • 9% used path features
          • 20% used SPARQL extensions
          • 30% pure SPARQL
          • 4% similarity
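To illustrate why transitivity matters for provenance queries, here is a minimal sketch in Python. It is not Prov4J's API and does not use W3P terms: the `derivedFrom` predicate, the file names, and both helper functions are hypothetical. It contrasts a single-hop lookup, which is what a plain triple-pattern match returns, with a transitive closure over the same edges.

```python
from collections import deque

# Hypothetical provenance edges: (artifact, predicate, source artifact).
# "derivedFrom" is an illustrative predicate, not a W3P property.
EDGES = [
    ("report.pdf", "derivedFrom", "chart.png"),
    ("chart.png", "derivedFrom", "sales.csv"),
    ("sales.csv", "derivedFrom", "orders.db"),
]


def direct_sources(node, edges):
    """Sources exactly one hop away, as a single triple pattern would match."""
    return {o for s, p, o in edges if s == node and p == "derivedFrom"}


def all_sources(node, edges):
    """Every source reachable over derivedFrom (breadth-first transitive closure)."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for src in direct_sources(current, edges):
            if src not in seen:  # also guards against cycles in the graph
                seen.add(src)
                queue.append(src)
    return seen
```

With these sample edges, `direct_sources("report.pdf", EDGES)` finds only `chart.png`, while `all_sources("report.pdf", EDGES)` also reaches `sales.csv` and `orders.db`.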
  18. Capture: Software Engineering Principles
      • Aspect Oriented Programming & Annotations.
      • Pushback capture.
      • Minimization of adaptations.
          • Context-based provenance construction.
      • Provenance URIs.
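The idea of capturing provenance through annotations, so that the instrumented code changes as little as possible, can be sketched with a Python decorator. This is an assumption-laden illustration rather than Prov4J's mechanism: the `provenance` decorator, the in-memory `PROVENANCE_LOG`, the record fields, and the `urn:uuid:` identifiers are all hypothetical stand-ins.

```python
import functools
import uuid
from datetime import datetime, timezone

# Stand-in for a provenance store (hypothetical; a real system would persist this).
PROVENANCE_LOG = []


def provenance(process_name):
    """Hypothetical annotation: record one provenance entry per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                # Each record gets its own identifier so it can be referenced later.
                "uri": f"urn:uuid:{uuid.uuid4()}",
                "process": process_name,
                "inputs": [repr(a) for a in args],
                "output": repr(result),
                "time": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


# The only adaptation to the business code is the one-line annotation.
@provenance("currency-conversion")
def to_euros(amount_usd, rate):
    return round(amount_usd * rate, 2)
```

Calling `to_euros(100, 0.9)` returns the converted amount as before and appends one record to `PROVENANCE_LOG`; the function body itself contains no provenance logic, which is the separation of concerns the slide points at.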
  19. Capture: Adaptations
  20. Capture: Logging & Storage
  21. Scenario
  22. Core Requirements Coverage
  23. Summary
      • Semantic Web standards and tools played a fundamental role in the construction of the framework.
      • Query expressivity over original SPARQL was improved.
      • Transitivity and path queries proved to be very important features.
      • Framework is usable in a realistic scenario.
      • High coverage of core requirements.
      • Available for download from early November 2010.
  24. Future Work
      • Evaluation of query expressivity and performance.
      • W3C Prov-XG requirements coverage analysis.
      • Improvement of the coverage of the core requirements.
  25. http://prov4j.org
  26. Cost of Reasoning
  27. Provenance Queries: Evaluation
      • Dataset: 51 queries (existing <= 12)
          • Ex: List all workflows with a given pattern (X -> A -> Y).
          • Ex: Which processes preceded artifact X?
      • Categories:
          • Workflow, Path, Publication, Social, Similarity, Counterfactual.
      • Results:
          • Completeness:
              • 82% of the queries were addressed.
              • 95% were completely addressed.
          • Performance:
              • Max: 200 s (Similarity).
              • Min: 1 ms.
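The first example query, listing workflows that contain a given pattern X -> A -> Y, can be read in several ways. The sketch below assumes one reading: the pattern is a fixed sequence of consecutive steps. The workflow data, step names, and function names are hypothetical and are not taken from the evaluation dataset.

```python
# Hypothetical workflow runs, each an ordered list of step names.
WORKFLOWS = {
    "wf1": ["extract", "clean", "aggregate", "publish"],
    "wf2": ["extract", "aggregate", "publish"],
    "wf3": ["clean", "aggregate", "review"],
}


def contains_sequence(steps, pattern):
    """True if `pattern` appears in `steps` as a run of consecutive steps."""
    n = len(pattern)
    return any(steps[i:i + n] == pattern for i in range(len(steps) - n + 1))


def workflows_with_pattern(workflows, pattern):
    """Names of all workflows containing the pattern, sorted for stable output."""
    return sorted(
        name for name, steps in workflows.items()
        if contains_sequence(steps, pattern)
    )
```

For the sample data, the pattern `["clean", "aggregate", "publish"]` matches only `wf1`. A variant that allows intermediate steps between X, A and Y would need a subsequence check instead of the slice comparison.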