2015 TaPP - Interoperability for Provenance-aware Databases using PROV and JSON

Since its inception, the PROV standard has been widely adopted as a standardized exchange format for provenance information. Surprisingly, this standard is currently not supported by provenance-aware database systems, which limits their interoperability with other provenance-aware systems. In this work we introduce techniques for exporting database provenance as PROV documents, importing PROV graphs alongside data, and linking outputs of an SQL operation to the imported provenance for its inputs. Our implementation in the GProM system offloads generation of PROV documents to the backend database. This implementation enables provenance tracking for applications that use a relational database to manage (part of) their data but also execute some non-database operations.

Published in: Science

  1. Interoperability for Provenance-aware Databases using PROV and JSON
     Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy (Oracle Corporation)
     Raghav Kapoor, Boris Glavic (Illinois Institute of Technology)
     Venkatesh Radhakrishnan (Facebook)
     Xing Niu (Illinois Institute of Technology), xniu7@hawk.iit.edu
  2. Outline
     ① Introduction
     ② Related Work
     ③ Overview
     ④ Export and Import
     ⑤ Experimental Results
     ⑥ Conclusions and Future Work
  3. Introduction
     • The PROV standards
       – A standardized, extensible representation of provenance graphs
       – Exchange of provenance information between systems
     • Provenance-aware DBMS
       – Computing the provenance of database operations
       – E.g., Perm [1], GProM [2], DBNotes [3], Orchestra [4], LogicBlox [5]
     [1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013.
     [2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP, 2014.
     [3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005.
     [4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19, 2013.
     [5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.
  4. Introduction
     • Example: extracting demographic information from tweets
  5. Introduction
     • Problem:
       – No relational database system supports both tracking of database provenance and import and export of provenance in PROV
       – Existing systems cannot export provenance in standardized formats
         • E.g., GProM essentially produces wasDerivedFrom edges between the output tuples of a query Q and its inputs; however, these are not available as PROV graphs
         • No way to track the derivation back to non-database entities
  6. Introduction
     • GProM System
       – Computes provenance for database operations
         • Queries, updates, transactions
       – Using SQL language extensions
         • e.g., PROVENANCE OF (SELECT ...)
  7. Introduction
     • Example of GProM in action
       – The result of PROVENANCE OF for query Q
       – Each tuple in this result represents one wasDerivedFrom assertion
         • E.g., tuple to1 was derived from tuple t1
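
To make the mapping on this slide concrete, here is a minimal sketch of how one row of a PROVENANCE OF result could be read as a wasDerivedFrom assertion in PROV-JSON. It is an illustration only, not GProM code: the function name, the `ex` prefix and its namespace URL, the edge identifiers, and the second input tuple `t2` are all assumptions made for the example.

```python
import json

def derivations_to_prov_json(pairs, prefix="ex"):
    """Turn (output_tuple_id, input_tuple_id) pairs into a PROV-JSON-style dict.

    Each pair stands for one row of a provenance result, i.e. one
    wasDerivedFrom assertion. The prefix and namespace are hypothetical.
    """
    doc = {
        "prefix": {prefix: "http://example.org/"},
        "entity": {},
        "wasDerivedFrom": {},
    }
    for i, (out_id, in_id) in enumerate(pairs, start=1):
        out_entity = f"{prefix}:{out_id}"
        in_entity = f"{prefix}:{in_id}"
        # Every tuple that appears in an assertion becomes an entity node.
        doc["entity"].setdefault(out_entity, {})
        doc["entity"].setdefault(in_entity, {})
        doc["wasDerivedFrom"][f"_:wdf{i}"] = {
            "prov:generatedEntity": out_entity,
            "prov:usedEntity": in_entity,
        }
    return doc

# to1/t1 come from the slide; t2 is an invented second input.
doc = derivations_to_prov_json([("to1", "t1"), ("to1", "t2")])
print(json.dumps(doc, indent=2))
```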
  8. Introduction
     • Goal: make databases interoperable with other provenance systems
     • Approach:
       – Export and import of provenance
         • PROV-JSON
       – Propagation of imported provenance
       – Implemented in GProM using SQL
  10. Related Work
      • How to integrate provenance graphs by identifying common elements? [6]
      • Address the interoperability problem between databases and other provenance-aware systems through
        – A common model for both types of provenance [7][8][9]
        – Monitoring database access to link database provenance with other provenance systems [10][11]
      [6] A. Gehani and D. Tariq. Provenance integration. In TaPP, 2014.
      [7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In TaPP, 2010.
      [8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style Workflow Provenance. PVLDB, 5(4):346–357, 2011.
      [9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6), 2014.
      [10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23, 2012.
      [11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.
  12. Overview
      • We introduce techniques for exporting database provenance as PROV documents
      • Importing PROV graphs alongside data
      • Linking outputs of SQL operations to imported provenance for their inputs
        – Implementation in GProM offloads generation of PROV documents to the backend database
          • SQL and string concatenation
  14. Export and Import
      • Export
        – Added TRANSLATE AS clause
          • e.g., PROVENANCE OF (SELECT ...) TRANSLATE AS …
        – Construct PROV-JSON document from database provenance
          ① Runs several projections over the provenance computation
             – E.g., '"_:wgb(' || F0.STATE || '|' || F0."AVG(AGE)" || ')' …
          ② Uses aggregation to concatenate all snippets of a certain type
             – E.g., entity nodes, wasGeneratedBy edges, allUsed edges
          ③ Uses string concatenation to create the final document
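
The three export steps above can be sketched with an in-memory SQLite database standing in for the backend. This is an illustrative approximation, not the SQL that GProM generates: the table `prov`, its columns, the sample rows, the `ex` prefix, and the edge-identifier scheme are assumptions, and SQLite's `group_concat` stands in for whatever string aggregation the real backend provides.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical provenance result: one row per (output tuple, input tuple) pair.
conn.execute("CREATE TABLE prov (out_id TEXT, in_id TEXT)")
conn.executemany(
    "INSERT INTO prov VALUES (?, ?)",
    [("to1", "t1"), ("to1", "t2"), ("to2", "t3")],
)

# Step 1: projections build one JSON snippet per node or edge.
# Step 2: group_concat aggregates all snippets of one type.
# Step 3: string concatenation wraps the aggregates into the final document.
EXPORT_SQL = """
SELECT '{"prefix":{"ex":"http://example.org/"},"entity":{'
    || (SELECT group_concat('"ex:' || id || '":{}', ',')
        FROM (SELECT out_id AS id FROM prov UNION SELECT in_id FROM prov))
    || '},"wasDerivedFrom":{'
    || (SELECT group_concat(
            '"_:wdf(' || out_id || '|' || in_id || ')":'
            || '{"prov:generatedEntity":"ex:' || out_id
            || '","prov:usedEntity":"ex:' || in_id || '"}', ',')
        FROM prov)
    || '}}'
"""

prov_json_text = conn.execute(EXPORT_SQL).fetchone()[0]
exported = json.loads(prov_json_text)  # fails loudly if the text is not valid JSON
print(json.dumps(exported, indent=2))
```

The whole document is assembled inside the database and returned as a single string, which mirrors the idea of offloading document generation to the backend. The sketch does not escape values, so it only works for identifiers that contain no quotes or backslashes.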
  15. Export and Import
      • Example: part of the final PROV document (figure; legend: red dotted lines in DB)
  16. Export and Import
      • Import
        – Import PROV for an existing relation
        – Provide a language construct IMPORT PROV FOR ...
        – Import available PROV graphs for imported tuples and store them alongside the data
        – Add three columns to each table to store imported provenance
          • prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
          • prov_eid: indicates which of the entities in this snippet represents the imported tuple
          • prov_time: stores a timestamp of the time when the tuple was imported
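
As a rough illustration of the storage layout described on this slide, the sketch below adds the three provenance columns to a relation and fills them for one tuple. It uses plain SQLite rather than the IMPORT PROV FOR construct, whose syntax is elided in the slides; the helper `import_prov`, the sample row, the `ex:` identifiers, and the TEXT column types are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical "user" relation with one invented row.
conn.execute('CREATE TABLE "user" (name TEXT, age INTEGER)')
conn.execute('INSERT INTO "user" VALUES (?, ?)', ("alice", 70))

# Add the three columns that hold imported provenance.
for column in ("prov_doc", "prov_eid", "prov_time"):
    conn.execute(f'ALTER TABLE "user" ADD COLUMN {column} TEXT')

def import_prov(conn, name, prov_doc, prov_eid):
    """Store a PROV-JSON snippet next to the tuple it describes (hypothetical helper)."""
    conn.execute(
        'UPDATE "user" SET prov_doc = ?, prov_eid = ?, prov_time = ? WHERE name = ?',
        (
            json.dumps(prov_doc),                    # the snippet itself
            prov_eid,                                # entity that represents this tuple
            datetime.now(timezone.utc).isoformat(),  # time of import
            name,
        ),
    )

snippet = {
    "entity": {"ex:tweet1": {}, "ex:u1": {}},
    "wasDerivedFrom": {
        "_:d1": {"prov:generatedEntity": "ex:u1", "prov:usedEntity": "ex:tweet1"}
    },
}
import_prov(conn, "alice", snippet, "ex:u1")

row = conn.execute('SELECT prov_doc, prov_eid, prov_time FROM "user"').fetchone()
print(row)
```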
  17. Export and Import
      • Import: example
        – Relation user with imported provenance
        – Attribute value d is the PROV graph from the running example without database activities and entities
  18. Export and Import
      • Using Imported Provenance During Export
        – Include the imported provenance as bundles in the generated PROV graph
          • Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity
        – Connect the entities representing input tuples in the imported provenance to the query activity and output tuple entities
      [13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.
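
A minimal sketch of the bundling idea, assuming the PROV-JSON layout in which nested graphs sit under a top-level `bundle` key: each imported snippet becomes one bundle, and the entity named by its prov_eid is linked to the query activity (used) and to the output tuple (wasDerivedFrom). The function name and all identifiers are hypothetical, and a single output tuple is assumed for brevity.

```python
import json

def export_with_bundles(imported, activity_id, output_id):
    """Combine query provenance with imported provenance stored as bundles.

    `imported` is a list of (prov_doc, prov_eid) pairs, one per input tuple.
    """
    doc = {
        "entity": {output_id: {}},
        "activity": {activity_id: {}},
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "used": {},
        "wasDerivedFrom": {},
        "bundle": {},
    }
    for i, (prov_doc, prov_eid) in enumerate(imported, start=1):
        # Nest the imported graph unchanged as its own bundle.
        doc["bundle"][f"ex:bundle{i}"] = prov_doc
        # Link the entity that represents the input tuple to the query.
        doc["used"][f"_:u{i}"] = {
            "prov:activity": activity_id,
            "prov:entity": prov_eid,
        }
        doc["wasDerivedFrom"][f"_:d{i}"] = {
            "prov:generatedEntity": output_id,
            "prov:usedEntity": prov_eid,
        }
    return doc

imported = [({"entity": {"ex:tweet1": {}, "ex:u1": {}}}, "ex:u1")]
combined = export_with_bundles(imported, "ex:query1", "ex:to1")
print(json.dumps(combined, indent=2))
```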
  19. Export and Import
      • Example of bundles (figure)
  20. Export and Import
      • Handling Updates
        – If a tuple is modified (e.g., by running an SQL UPDATE statement), the modification should be reflected when provenance is exported
      • Example
        – Assume the user has run an update to correct tuple t1's age value (setting age to 70) before running the query
  21. Export and Import
      • Challenge
        – How to track the provenance of updates under transactional semantics
      • Solution
        – GProM uses the novel concept of reenactment queries
          • A user can request the provenance of a past update, transaction, or set of updates executed within a given time interval
          • Construct the PROV document using provenance for updates computed on-the-fly
  23. Experimental Results
      • TPC-H [14] benchmark datasets
        – Scale factor from 0.01 to 10 (10 MB up to 10 GB in size)
      • Run on a machine with
        – 2 x AMD Opteron 3.3 GHz processors
        – 128 GB RAM
        – 4 x 1 TB 7.2K RPM disks configured in RAID 5
      • Queries
        – Provenance of a three-way join between relations customer, order, and nation
        – With additional selection conditions to control selectivity (and, thus, the size of the exported PROV-JSON document)
      [14] TPC. TPC-H Benchmark Specification, 2009.
  24. Experimental Results (figure; panels: 1 GB and 10 GB)
  26. Conclusions and Future Work
      Conclusions
      • Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
      • Construct PROV graphs on-the-fly using SQL
      • Connect database provenance to imported PROV data
      Future Work
      • Full implementation for updates
      • Automatic storage management (e.g., deduplication) for imported provenance
      • Automatic cross-referencing
  27. Questions
      • My Webpage
        – http://www.cs.iit.edu/~dbgroup/people/xniu.php
      • Our Group's Webpage
        – http://cs.iit.edu/~dbgroup/research/index.html
      • GProM
        – http://www.cs.iit.edu/~dbgroup/research/gprom.php
  28. Others
      • Provenance querying
      • Provenance for JSON
