Report on Provenance Challenge 3, at the PC3 meeting, 2009
Amsterdam, June 10-11, 2009

Transcript

  • 1. Provenance Challenge 3 and Taverna. Paolo Missier, Information Management Group, School of Computer Science, University of Manchester, UK. Provenance Challenge 3 meeting, Amsterdam, June 10-11, 2009
  • 6. Interpreting the challenge
      1. Implement the challenge workflow
         ➡ Expose the same amount of data that is visible in the Trident version of the workflow
      2. Answer the core (+optional) challenge queries
         ➡ Is the current Taverna Provenance (TP) query model sufficient to answer the queries?
      3. Produce and export OPM
         ➡ Export the smallest OPM graph that contains the query answer
         ➡ (the graph for the entire run is a special “degenerate” case)
      4. Import and consume OPM
         ➡ Map an OPM graph to an instance of a TP causal graph
         ➡ Use TP queries to answer the same challenge queries as above
  • 11. Our challenges
      ➡ Expose the same amount of data that is visible in the Trident version of the workflow:
         ➡ map the control flow to a pure dataflow model
      ➡ Is the current Taverna Provenance (TP) query model sufficient to answer the queries?
         ➡ the challenge queries gave us good new requirements
      ➡ Export the smallest OPM graph that contains the query answer
         ➡ OPM generation is now fully integrated into TP query answering
      ➡ Map an OPM graph to an instance of a TP causal graph
      ➡ Use TP queries to answer the same challenge queries as above
         ➡ TP query processing requires a representation of the originating workflow
         ➡ This required generating the “minimal plausible originating dataflow” (MPOD) for the OPM graph
  • 15. The challenge workflow as a Taverna dataflow
      - this produces a list...
      - ...these consume one item of the list at a time...
      - the resulting iteration over the lists occurs automatically
      - Legend: data links; control links, e.g. “first LoadCSVFileIntoTable, then UpdateComputedColumns”
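The implicit iteration described on this slide can be sketched in a few lines. This is only an illustration of the idea, not Taverna's implementation: the helper names `depth` and `invoke`, the `port_depth` parameter, and the stand-in `load_csv_file_into_table` function are all assumptions introduced here.

```python
from typing import Any, Callable


def depth(value: Any) -> int:
    """Nesting depth of a value: 0 for a single item, 1 for a flat list, and so on."""
    d = 0
    while isinstance(value, list):
        d += 1
        if not value:
            break
        value = value[0]
    return d


def invoke(processor: Callable[[Any], Any], value: Any, port_depth: int = 0) -> Any:
    """Call the processor once per item when the value is nested deeper than the port expects."""
    if depth(value) > port_depth:
        return [invoke(processor, item, port_depth) for item in value]
    return processor(value)


def load_csv_file_into_table(csv_file: str) -> str:
    # Hypothetical stand-in for the LoadCSVFileIntoTable service: one file in, one result out.
    return f"loaded:{csv_file}"


# An upstream step produced a list; the single-item consumer is iterated over it automatically.
csv_files = ["a.csv", "b.csv", "c.csv"]
results = invoke(load_csv_file_into_table, csv_files)
```

The workflow author never writes the loop: the mismatch between the list depth and the depth the port expects is what triggers the iteration.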
  • 19. TP query model and new requirements
      - queries are purely structural, no semantics
      - queries can be answered only at the same level of granularity as that of the service encapsulation of the data in the workflow
      - challenge query 1 “detailed version” seems to require more knowledge of data dependencies than can be obtained from this structural model
      - the level of detection ID is not available unless the query black box in LoadCSVFileIntoTable is opened up, i.e., CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (?,?,?,?,?,?,?)
      - Figure annotations: trace the lineage of values observed here... ...at these points in the workflow
  • 20. Example: query 3
      query.vars=LoadCSVFileIntoTable / LoadCSVFileIntoTableOutput / 1
      query.processors=ALL
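A purely structural lineage query of this kind can be sketched as a backward traversal of the recorded dependencies. The trace format, the `lineage` function, and the `ProducerStep` processor below are hypothetical and only illustrate the roles of the two query parameters: the target variable binding, and the set of processors whose inputs should be reported (`ALL` reports every step).

```python
from collections import deque

# Each record is (source_binding, processor, target_binding): the processor consumed
# the source to produce the target. Bindings use a "processor/port/index" string,
# which is an assumed format for this sketch.
trace = [
    ("ProducerStep/out/1", "LoadCSVFileIntoTable",
     "LoadCSVFileIntoTable/LoadCSVFileIntoTableOutput/1"),
    ("workflow/input", "ProducerStep", "ProducerStep/out/1"),
]


def lineage(trace, target, processors="ALL"):
    """Walk dependencies backwards from target; report (processor, input) pairs of interest."""
    producers = {}
    for src, proc, dst in trace:
        producers.setdefault(dst, []).append((src, proc))
    answer, seen, queue = [], {target}, deque([target])
    while queue:
        current = queue.popleft()
        for src, proc in producers.get(current, []):
            if processors == "ALL" or proc in processors:
                answer.append((proc, src))
            if src not in seen:  # keep traversing even through unreported processors
                seen.add(src)
                queue.append(src)
    return answer


target = "LoadCSVFileIntoTable/LoadCSVFileIntoTableOutput/1"
all_steps = lineage(trace, target)
only_producer = lineage(trace, target, processors={"ProducerStep"})
```

Note that the traversal uses only the graph structure; nothing in it knows what the services compute, which matches the "purely structural, no semantics" caveat above.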
  • 24. Integrated OPM generation as part of TP query
      ➡ the answer to any TP query can be viewed as an OPM graph
      ➡ encoded as RDF/XML using the Tupelo provenance API
  • 26. MPOD generation rules - I
      - MPOD = Minimal Plausible Originating Dataflow
      - induced from an OPM graph
      - The TP query model requires a workflow structure
      - this is a first approximation... subject to refinement
      - [Figure: a wasGeneratedBy(R) edge between artifact A and process P induces an output port R on processor P; a used(R) edge between process P and artifact A induces an input port R on processor P. Worked example over processes P1-P3, artifacts A1-A4, and roles R1-R6, with edges wgb(R1), wgb(R2), wgb(R5), wgb(R6), used(R3), used(R4).]
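The two port rules can be sketched as a small function over OPM edges. The tuple layout and the name `induce_mpod` are assumptions made for this sketch. The data-link step (connecting the output port that generated an artifact to every input port that used it) is also an assumption about how the induced ports are wired together, since the slide only spells out the port rules.

```python
def induce_mpod(used, was_generated_by):
    """Induce processors, ports, and data links from OPM edges.

    used: iterable of (process, role, artifact)
    was_generated_by: iterable of (artifact, role, process)
    """
    processors = {}

    def processor(name):
        return processors.setdefault(name, {"inputs": set(), "outputs": set()})

    producers = {}
    for artifact, role, process in was_generated_by:
        processor(process)["outputs"].add(role)  # wasGeneratedBy(R) -> output port R
        producers.setdefault(artifact, []).append((process, role))

    links = set()
    for process, role, artifact in used:
        processor(process)["inputs"].add(role)  # used(R) -> input port R
        # Assumed wiring rule: link each producing port of the artifact to this input port.
        for src_process, src_role in producers.get(artifact, []):
            links.add(((src_process, src_role), (process, role)))
    return processors, links


wgb_edges = [("A1", "R1", "P1")]
used_edges = [("P2", "R2", "A1")]
processors, links = induce_mpod(used_edges, wgb_edges)
```

In this toy input, artifact A1 is written by P1 through role R1 and read by P2 through role R2, so the induced dataflow has a single link from P1's output port R1 to P2's input port R2.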
  • 27. MPOD generation rules - II
      - Derived property: A2 wasDeterminedFrom A1
      - This is usually inferred, i.e. there exist P, R1, R2 such that: A2 wasGeneratedBy(R2) P, and P used(R1) A1
      - Note 1: if the corresponding “wgby” and “used” edges are not found, then new P, R1, R2 are created and added to the graph
         ‣ however, in all cases encountered so far, wasDeterminedFrom was inferred: P, R1, R2 appear in existing wgb(R) and used(R) edges
      - Note 2: wasControlledBy and wasTriggeredBy are ignored for now
      - Note 3: a separate MPOD is created for each account in the OPM graph
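The rule in Note 1 can be sketched as follows: look for an existing process that generated A2 and used A1, and only create a fresh one when none is found. The function name, the tuple layout, and the naming scheme for freshly created processes and roles are all hypothetical.

```python
import itertools


def explain_determined_from(a2, a1, used, was_generated_by, fresh):
    """Return ((P, R1, R2), created) such that a2 wgb(R2) P and P used(R1) a1.

    used: list of (process, role, artifact)
    was_generated_by: list of (artifact, role, process)
    fresh: iterator of integers used to name newly created elements
    """
    for artifact, r2, p in was_generated_by:
        if artifact != a2:
            continue
        for process, r1, source in used:
            if process == p and source == a1:
                return (p, r1, r2), False  # inferred from existing edges
    # No witness found: create a new process and roles, and add them to the graph.
    n = next(fresh)
    p, r1, r2 = f"P_new{n}", f"R_in{n}", f"R_out{n}"
    was_generated_by.append((a2, r2, p))
    used.append((p, r1, a1))
    return (p, r1, r2), True


used_edges = [("P", "R1", "A1")]
wgb_edges = [("A2", "R2", "P")]
counter = itertools.count(1)

inferred = explain_determined_from("A2", "A1", used_edges, wgb_edges, counter)
created = explain_determined_from("A3", "A1", used_edges, wgb_edges, counter)
```

The first call finds the existing process P, which is the case the slide reports for all graphs encountered so far; the second call has no witness and falls back to creating one.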
  • 28. Importing OPM for the challenge
      • OPM contributions successfully imported so far:
         – UC Davis
         – NCSA (links to PC3 / UoM MPOD wiki page) example!
         – SOTON
      • Example (UC Davis): Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?
         query.variables: LoadCSVFileIntoTable:2 / out
         query.processors=ALL
  • 29. Example -- query 3 result
      • Ideally, the imported graph + MPOD allow provenance queries to be submitted to the imported graph just as if it were a native TP graph
      • The answer is itself viewed as a new OPM graph
  • 31. Lossless mappings using OPM and MPOD
      - Provenance query: subgraph with query answer only
        f --exec--> TP: (f, trace(f)) --Q--> OPM: Q(trace(f))
      - Export to OPM: export is just a query exp(trace(f)) that returns the entire trace
      - Import from OPM:
        OPM --import--> TP: (f, trace(f)), with MPOD --export--> OPM: exp(trace(f))
      - When is this transformation lossless?
      - TP = Taverna Provenance model
  • 35. More on lossless-ness and OPM
      - Taverna dataflow f --exec--> TP: (f, trace(f)) --export--> OPM: exp(trace(f))
      - OPM: exp(trace(f)) --import--> TP: (f’, trace(f’)) --export--> OPM, compared (=?=) with the first export
      - This is indeed lossless when f is itself a Taverna dataflow: export(import(export(trace(f)))) =?= export(trace(f)) (requires proof)
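The round-trip property can be stated as an executable check on a toy model. Everything here is an assumption made for illustration: the record layout, the `iteration` field standing in for TP-only detail that an OPM export might not carry, and the function names. The sketch does not prove the claim; it only shows the shape of the equation being tested.

```python
def export_opm(trace):
    """Keep only the causal edges; TP-only detail (the assumed 'iteration' field) is dropped."""
    return frozenset(
        (e["kind"], e["process"], e["role"], e["artifact"]) for e in trace
    )


def import_opm(opm):
    """Rebuild TP-style records from OPM edges; dropped detail cannot be recovered."""
    return [
        {"kind": k, "process": p, "role": r, "artifact": a, "iteration": None}
        for k, p, r, a in sorted(opm)
    ]


def round_trip_is_stable(trace):
    """Check export(import(export(trace))) == export(trace)."""
    once = export_opm(trace)
    return export_opm(import_opm(once)) == once


native_trace = [
    {"kind": "used", "process": "P1", "role": "R1", "artifact": "A1", "iteration": 0},
    {"kind": "wgb", "process": "P1", "role": "R2", "artifact": "A2", "iteration": 0},
]
stable = round_trip_is_stable(native_trace)
```

In this toy model the imported trace differs from the native one, because the iteration detail is gone, yet exporting it again yields the same OPM graph. That is the sense of equality the slide's equation asks for: it compares exports, not TP traces.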
