Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
D-PROV: Extending the PROV Provenance Model                  to express Workflow StructurePaolo Missier(1), Saumen Dey(2),...
Queries that require provenance    Q1: “track the lineage of the final outputs of the workflow”    Q2: “list the parameter v...
PROV @W3C: scope and structure    Recommendation    track3             source: http://www.w3.org/TR/prov-overview/
r-prov and p-prov in plain PROV              T1            prov:type= "prov:plan"          T2           p-prov        wasA...
Reference dataflow models    wf                                                                     Processors, ports, dat...
Extensions I / p-prov / ports model                               wf                                           op1        ...
Extensions II / p-prov / channel model                                        wf                                          ...
p-prov/r-prov pattern for port-oriented workflows                                        hasOutputPort                    ...
p-prov/r-prov pattern for port-oriented workflows                                         hasOutputPort                   ...
p-prov/r-prov pattern for channel-oriented workflows                                T1                                    ...
p-prov/r-prov pattern for channel-oriented workflows                                T1                                    ...
Bundles, provenance of provenance     A bundle is a named set of provenance descriptions, and is itself an entity,     so ...
Structural workflow nesting using bundles     Repurposing: use bundles to associate a workflow execution with the     prov...
Answering the sample queries     Q3: “match the provenance trace to the workflow structure”     Two steps:     - define rule...
Structure inferences            T1           OP     wasAssociatedWith                                     hasOutPort(T, OP...
Structure inferences                     isTaskOf(T2, Wf) :−             onOutPort(D,OP,I1),                              ...
composed by three main parts: the structure of the experiment        represents the execution of some activity instance (a...
Summary and extensions     • Simple extensions to PROV       – designed to model p-prov       – complementary to r-prov   ...
Upcoming SlideShare
Loading in …5
×

Tapp 13-talk

439 views

Published on

talk at TAPP'13 for the paper:
https://www.usenix.org/system/files/conference/tapp13/tapp13-final3.pdf

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Tapp 13-talk

  1. 1. D-PROV: Extending the PROV Provenance Model to express Workflow StructurePaolo Missier(1), Saumen Dey(2), Khalid Belhajjame(3), Vıctor Cuevas-Vicentt́ın(2), Bertram Ludaescher(2) (1) Newcastle University, UK (2) UC Davis, CA, USA (3) University of Manchester, UK TAPP’13 Lombard, IL. April, 2013
  2. 2. Queries that require provenance Q1: “track the lineage of the final outputs of the workflow” Q2: “list the parameter values that were used for a specific task t in the workflow” Q3: “check that the provenance traces conform to the structure of the workflow” Prospective Provenance (p-prov): - representation of the workflow itself; Retrospective Provenance (r-prov): - provenance of the data produced by one workflow run Provenance of the Process: - account of the evolution of the workflow across versions FREIRE, J., KOOP, D., SANTOS, E., AND SILVA, C. T. Provenance for Computational Tasks: A Survey. Computing2 in Science and Engineering 10, 3 (2008), 11–21.
  3. 3. PROV @W3C: scope and structure Recommendation track3 source: http://www.w3.org/TR/prov-overview/
  4. 4. r-prov and p-prov in plain PROV T1 prov:type= "prov:plan" T2 p-prov wasAssociatedWith wasAssociatedWith wasGeneratedBy used T1Inv d T2Inv r-prov // p-prov: tasks, but no data or activities entity(T1, [ prov:type = prov:plan]) entity(T2, [ prov:type = prov:plan]) // r-prov - task invocation and data activity(T1inv) activity(T2inv) entity(d) // data flowing between two task instances wasGeneratedBy(d, T1inv) used(T2inv, d) // connecting r-prov and p-prov wasAssociatedWith(T1inv, _, T1) // T1 is the plan for T1inv wasAssociatedWith(T2inv, _, T2) // T1 is the plan for T2inv4
  5. 5. Reference dataflow models wf Processors, ports, data links op1 ip1 Kepler T1 T2 op2 ip2 Taverna VisTrails e-Science Central ... wf Processors, channels T1 ch1 T2 ch2 Dataflow process networks (*) (e.g. a specific Kepler semantics)5 (*) LEE, E. A., AND PARKS, T. M. Dataflow process networks. Proceedings of the IEEE 83, 5 (1995), 773–801.
  6. 6. Extensions I / p-prov / ports model wf op1 ip1 T1 T2 op2 ip2 prov:type= "prov:plan" prov:type= "D1:task" hasOutputPort T1 op1 hasOutPort(T1, op1) hasInPort(T2, ip1) isTaskOf dataLink(op1, ip1) isTaskOf(wf, T1) prov:type= dataLink prov:type= "D1:port" wf isTaskOf(wf, T2) "D1:workflow" isTaskOf hasInputPort T2 ip1 prov:type= "prov:plan" prov:type= "D1:task"6
  7. 7. Extensions II / p-prov / channel model wf T1 ch1 T2 ch2 prov:type= "prov:plan" prov:type= "D1:task" isSourceOf T1 ch2 isTaskOf prov:type= "D1:workflow", prov:type= "D1:channel" prov:type= "prov:plan" wf isTaskOf isSourceOf isSourceOf(T1,ch2) isSinkOf isSourceOf(T1,ch1) T2 ch1 isSinkOf(T2,ch1) prov:type= "prov:plan" isTaskOf(T1, wf) prov:type= "D1:task" isTaskOf(T2, wf)7
  8. 8. p-prov/r-prov pattern for port-oriented workflows hasOutputPort T1 op1 hasOutPort(t1, op1) wasAssociatedWith hasInPort(t2, ip1) T1Inv dataLink(op1, ip1) isTaskOf onOutPort isTaskOf(wf, t1) wasStartedBy isTaskOf(wf, t2) wasAssociatedWith d dataLink wf wfInv activity (wfRun) wasStartedBy activity ( t1inv ) isTaskOf onInPort activity ( t2inv ) T2Inv entity (d) wasAssociatedWith onOutPort(d, op1, t1Inv) hasInputPort onInPort(d, ip1, t2Inv) T2 ip18
  9. 9. p-prov/r-prov pattern for port-oriented workflows hasOutputPort T1 op1 hasOutPort(t1, op1) wasAssociatedWith hasInPort(t2, ip1) T1Inv dataLink(op1, ip1) isTaskOf onOutPort isTaskOf(wf, t1) wasStartedBy isTaskOf(wf, t2) wasAssociatedWith d dataLink wf wfInv activity (wfRun) wasStartedBy activity ( t1inv ) isTaskOf onInPort activity ( t2inv ) T2Inv entity (d) wasAssociatedWith onOutPort(d, op1, t1Inv) hasInputPort onInPort(d, ip1, t2Inv) T2 ip1 Lossy mapping to plain PROV: Port removal wasGeneratedBy(D, tInv) :− onOutPort(D, _, tInv ). used(tInv, D) :− onInPort(D, _, tInv)8
  10. 10. p-prov/r-prov pattern for channel-oriented workflows T1 isSourceOf wasAssociatedWith sourceOf(t1,ch) T1Inv sinkOf(t2,ch) isTaskOf isTaskOf(t1, wf) wasReadFrom isTaskOf(t2, wf) wasStartedBy activity (wfRun) wasAssociatedWith d wf wfInv ch1 activity ( t1inv ) activity ( t2inv ) wasStartedBy wasWrittenTo entity (d) isTaskOf T2Inv wasWrittenTo(d,ch, t1Inv) wasReadFrom(d,ch, t2Inv) wasAssociatedWith isSinkOf T29
  11. 11. p-prov/r-prov pattern for channel-oriented workflows T1 isSourceOf wasAssociatedWith sourceOf(t1,ch) T1Inv sinkOf(t2,ch) isTaskOf isTaskOf(t1, wf) wasReadFrom isTaskOf(t2, wf) wasStartedBy activity (wfRun) wasAssociatedWith d wf wfInv ch1 activity ( t1inv ) activity ( t2inv ) wasStartedBy wasWrittenTo entity (d) isTaskOf T2Inv wasWrittenTo(d,ch, t1Inv) wasReadFrom(d,ch, t2Inv) wasAssociatedWith isSinkOf T2 Lossy mapping to plain PROV: Channel removal: wasGeneratedBy(d, tInv) :− wasWrittenTo(d,ch, t1Inv ) used(tInv, D):−wasReadFrom(d,ch,t2Inv)9
  12. 12. Bundles, provenance of provenance A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed. pm:bundle1 draft used wasGeneratedBy draft commenting v1 comments bundle pm:bundle1 entity(ex:draftComments) entity(ex:draftV1) activity(ex:commenting) wasGeneratedBy(ex:draftComments, ex:commenting,-) used(ex:commenting, ex:draftV1, -) endBundle ... entity(pm:bundle1, [ prov:type=prov:Bundle ]) wasGeneratedBy(pm:bundle1, -, 2013-03-20T10:30:00) wasAttributedTo(pm:bundle1, ex:Bob)10
  13. 13. Structural workflow nesting using bundles Repurposing: use bundles to associate a workflow execution with the provenance it generates entity (wfRunTrace, [ prov:type=’prov:Bundle’ ]) wasGeneratedBy(wfRunTrace,wfInv,-) This makes it possible to write hierarchical provenance of nested workflows, recursively: entity (T, [prov:type="D1:task", prov:type="D1:workflow"]) bundle wfRunTrace activity(wfRun) // run of top level wf activity(Tinv) // run of T, a sub-workflow wasAssociatedWith(Tinv, _, T) entity(TinvTrace, [ prov:type=prov:Bundle ]) wasGeneratedBy(TinvTrace, Tinv, _) ... endbundle11
  14. 14. Answering the sample queries Q3: “match the provenance trace to the workflow structure” Two steps: - define rules that entail p-prov relations from r-prov relations, and - check that those new p-prov relations are consistent with any constraints defined on the workflow structure / infer new p-prov statements dataLink (OP, IP) :− onOutPort(D,OP,_), onInPort(D,IP,_). OP wf op1 ip1 onOutPort T1 T2 op2 ip2 D dataLink onInPort constraint violation? IP12
  15. 15. Structure inferences T1 OP wasAssociatedWith hasOutPort(T, OP) :− I1 onOutPort(D,OP,I1), onOutPort wasAssociatedWith(I1,_,T1). D onInPort hasInPort(T, IP) :− I2 onInPort(D,IP,I1), wasAssociatedWith wasAssociatedWith(I1,_,T1). T2 IP13
  16. 16. Structure inferences isTaskOf(T2, Wf) :− onOutPort(D,OP,I1), hasOutPort(T1, OP), onInPort(D,IP,I2), hasInPort(T2, IP), T1 OP wasAssociatedWith(I1,_,T1), wasAssociatedWith wasAssociatedWith(I2,_,T2), isTaskOf(T1, Wf). I1 isTaskOf onOutPort wf D isTaskOf onInPort I2 wasAssociatedWith T2 IP14
  17. 17. composed by three main parts: the structure of the experiment represents the execution of some activity instance (activity Other PROV extensions into “workflow-land” (white classes in the UML class diagram), execution of the Execute activity), while the second defines the execution experiment (dark gray classes) and environment configuration properties of one SWfMS (activity Execute workflow). Also, the (light gray classes). Furthermore, the stereotypes in UML class execution of an activity instance determines which parameters are diagram are used to represent PROV components. consumed by each instance of the activity. The agent Scientist represents a person to use computational By using PROV-Wf we are able to represent provenance data to resources to execute the experiment (composed as a workflow). be distributed and queried at runtime. In the next sub-section we Prov-Wf: Also, the agent Scientist is associated to a machine (i.e. agent Machine). The agent Machine establishes an association with a present how this provenance model is coupled to a set of components developed for capturing and querying runtime workflow (plan Workflow), which is composed by a set of provenance data. Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary Ocaña, Eduardo Ogasawara, Jonas Dias, and Capturing Provenance 3.2 Components for Marta Mattoso, activities (i.e. plan WActivity). Each activity is responsible for executing a program in a machine with a specific configuration. Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach, Procs.components to import components We have developed a series of BigProv’13,provenance The invocation of a program within a workflow (i.e. execution data from different SWfMS to PROV-Wf model. Our instance) uses a set of parameters that can be seen as a set of Genova, Italy, March 2013 Plan DataLink hasDataLink Workflow hasSubProcess hasSink hasSource Input hasInput Process Output hasOutput Figure 1 PROV-Wf data model used Parameter describedByProcess usedInput describedByParameter Artifact ProcessRun wasEnactedBy WfProv from the Wf4Ever project WorkflowEngine wasOutputFrom Entity Activity www.wf4ever-project.org wasGeneratedBy SoftwareAgent15
  18. 18. Summary and extensions • Simple extensions to PROV – designed to model p-prov – complementary to r-prov • They enable queries that cut across r-prov and p-prov • Bundle mechanism used for provenance of nested workflow components • Next step: harmonize similar extensions proposed by other groups16 – overall goal is to achieve interoperability

×