Scientific Workflow Management System


          Janus
          Provenance


Towards systematic information exchange and reuse in e-laboratories
AGU Fall meeting, Dec. 2009
Paolo Missier
  Information Management Group
 School of Computer Science, University of Manchester, UK


         with additional material by Sean Bechhofer and Matthew Gamble,
                             e-Labs design group, University of Manchester
                                                    AGU Fall meeting, San Francisco, Dec. 2009 - P.Missier
Momentum on sharing and collaboration

 Special issue of Nature on Data Sharing (Sept. 2009)
                          • timeliness requires rapid sharing
                          • repurposing
                          • the Human Genome project use case
                            http://www.nature.com/news/specials/datasharing/index.html




• Debate is much further along in Earth Sciences
  – ESIP - data preservation / stewardship, 2009
  – Long established in some communities - Atmospheric sciences,
    1998 [1]
• Science Commons recommendations for Open Science
  – (July 2008) [link]


[1] Strebel DE, Landis DR, Huemmrich KF, Newcomer JA, Meeson BW: The FIFE Data
Publication Experiment. Journal of the Atmospheric Sciences 1998, 55:1277-1283
Collaboration in workflow-based science

[Figure: a workflow plus input dataset specification feeds a workflow execution, which yields two outcomes, data and provenance. Both go through Research Object Packaging, and Paul can then browse, query, unbundle, and reuse the package. Label: data-mediated implicit collaboration.]

What is needed for Paul to make sense of third party data?

Paul’s Pack

[Figure: Paul’s Pack as a Research Object. Items shown: QTL, Common pathways, Workflow 16, Workflow 13, Results (two sets), Logs, Slides, Paper, Metadata. Edge labels: produces, Feeds into, Included in, Published in. Layers: Representation, Domain Relations, Aggregation.]

ORE: representing generic aggregations




Resource Map                                        Data structure
(descriptor)



   http://www.openarchives.org/ore/1.0/primer.html section 4
A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations:
Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for
Information Science and Technology (JASIST), to appear, 2009.
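
As a rough illustration of the Resource Map idea, the sketch below describes a pack as a list of subject-predicate-object triples. The terms `ore:describes` and `ore:aggregates` come from the ORE vocabulary; the `example.org` URIs and both helper functions are hypothetical and chosen only for illustration.

```python
# Minimal sketch: a Resource Map describing an Aggregation, as plain triples.
# All example.org URIs are hypothetical placeholders, not real resources.

ORE_DESCRIBES = "ore:describes"
ORE_AGGREGATES = "ore:aggregates"


def build_resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return the triples of a resource map that describes one aggregation."""
    triples = [(rem_uri, ORE_DESCRIBES, aggregation_uri)]
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE_AGGREGATES, uri))
    return triples


def aggregated_resources(triples, aggregation_uri):
    """List the resources aggregated by the given aggregation."""
    return [o for (s, p, o) in triples
            if s == aggregation_uri and p == ORE_AGGREGATES]


pack = build_resource_map(
    "http://example.org/packs/paul/rem",
    "http://example.org/packs/paul",
    [
        "http://example.org/workflows/16",
        "http://example.org/workflows/13",
        "http://example.org/papers/qtl",
    ],
)
```

The descriptor (resource map) and the thing described (aggregation) get separate identifiers, so the map can be exchanged or replaced without changing the identity of the pack itself.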




Content: Workflow provenance

A detailed trace of workflow execution
- tasks performed, data transformations
- inputs used, outputs produced

[Figure: example workflow. Node labels: gene_id, lister, get pathways by genes1, merge pathways, concat gene pathway ids, output, pathway_genes.]

Why provenance matters, if done right
• To establish quality, relevance, trust
• To track information attribution through complex transformations
• To describe one’s experiment to others, for understanding / reuse
• To provide evidence in support of scientific claims
• To enable post hoc process analysis for improvement, re-design




The W3C Incubator on Provenance has been collecting numerous use cases:
http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#




What users expect to learn

• Causal relations:
  – which pathways come from which genes?
  – which processes contributed to producing an image?
  – which process(es) caused data to be incorrect?
  – which data caused a process to fail?

• Process and data analytics:
  – analyze variations in output vs an input parameter sweep (multiple process runs)
  – how often has my favourite service been executed? on what inputs?
  – who produced this data?
  – how often does this pathway turn up when the input genes range over a certain set S?

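
The analytics questions above can be sketched as simple aggregations over a log of service invocations. This is only an illustration: the record layout, the service names, and the sample values are hypothetical and do not reflect any particular system's provenance store.

```python
# Sketch: process analytics over a provenance log.
# The record layout and the sample invocations are hypothetical.

from collections import Counter

invocations = [
    {"service": "get_pathways_by_genes", "input": "gene:1", "run": "run-1"},
    {"service": "merge_pathways", "input": "path:7", "run": "run-1"},
    {"service": "get_pathways_by_genes", "input": "gene:2", "run": "run-2"},
]


def execution_counts(records):
    """How often has each service been executed?"""
    return Counter(record["service"] for record in records)


def inputs_of(records, service):
    """On what inputs was the given service executed?"""
    return [r["input"] for r in records if r["service"] == service]


counts = execution_counts(invocations)
favourite_inputs = inputs_of(invocations, "get_pathways_by_genes")
```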
Open Provenance Model
• graph of causal dependencies involving data and processors
• not necessarily generated by a workflow!
• v1.1 out soon


Edge types:
  A --wasGeneratedBy (R)--> P
  P --used (R)--> A

Goal: standardize causal dependencies to enable provenance metadata exchange

Example graph:
  A1 --wgb(R1)--> P3
  A2 --wgb(R2)--> P3
  P3 --used(R3)--> A3
  P3 --used(R4)--> A4
  A3 --wgb(R5)--> P1
  A4 --wgb(R6)--> P2

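
A minimal sketch of how such a graph might be held in memory and traversed, assuming the edge layout read from this slide's example (A1 and A2 generated by P3, which used A3 and A4, in turn generated by P1 and P2). The tuple format and the `causes_of` helper are hypothetical and not part of OPM itself.

```python
# Minimal sketch of an OPM-style graph. Edges point from effect to cause.
# Node names and roles follow the slide's example; the helper is hypothetical.

from collections import defaultdict

# (effect, relation, role, cause)
EDGES = [
    ("A1", "wasGeneratedBy", "R1", "P3"),
    ("A2", "wasGeneratedBy", "R2", "P3"),
    ("P3", "used", "R3", "A3"),
    ("P3", "used", "R4", "A4"),
    ("A3", "wasGeneratedBy", "R5", "P1"),
    ("A4", "wasGeneratedBy", "R6", "P2"),
]


def causes_of(node, edges):
    """Return every node that `node` transitively depends on, sorted by name."""
    direct = defaultdict(list)
    for effect, _relation, _role, cause in edges:
        direct[effect].append(cause)

    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in direct[current]:
            if cause not in seen:
                seen.append(cause)
                stack.append(cause)
    return sorted(seen)
```

Because every edge runs from effect to cause, a question such as "which processes contributed to producing A1?" reduces to a reachability query from A1.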
Additional requirements on OPM
• Artifact values require a uniform, common identifier scheme
  – Linked Data in OPM?

• OPM accounts for structural causal relationships
  – additional domain-specific knowledge required
  – attaching semantic annotations to OPM graph nodes

• OPM graphs can grow very large
  – reduce size by exporting only query results
     • Taverna approach
  – multiple levels of abstraction
     • through OPM accounts (“points of view”)



Query results as OPM graphs

[Figure: running workflow W yields its provenance trace prov(W); executing query Q over the trace yields Q(prov(W)); exporting that result yields the OPM graph OPM(Q(prov(W))).]

- Approach implemented in the Taverna 2.1 workflow system
- Internal provenance DB with ad hoc query language

Just released!

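
The pipeline OPM(Q(prov(W))) can be sketched as two steps: select the part of the trace that answers the query, then serialize only that part. The trace format, the function names, and the JSON layout below are hypothetical stand-ins; they are not Taverna's query language or its export format.

```python
# Sketch of OPM(Q(prov(W))): query the full trace, export only the answer.
# The trace format and function names are hypothetical, not Taverna's API.

import json


def query(trace, target):
    """Q: keep only the dependencies reachable from `target`."""
    selected = []
    frontier = [target]
    visited = {target}
    while frontier:
        node = frontier.pop()
        for edge in trace:
            if edge["effect"] == node:
                selected.append(edge)
                if edge["cause"] not in visited:
                    visited.add(edge["cause"])
                    frontier.append(edge["cause"])
    return selected


def export_opm(edges):
    """Serialize the selected edges as a small JSON document."""
    return json.dumps({"edges": edges}, sort_keys=True)


# prov(W): a toy trace with one branch that is irrelevant to "out".
prov_w = [
    {"effect": "out", "relation": "wasGeneratedBy", "cause": "merge"},
    {"effect": "merge", "relation": "used", "cause": "in"},
    {"effect": "log", "relation": "wasGeneratedBy", "cause": "audit"},
]

answer = query(prov_w, "out")
document = export_opm(answer)
```

Exporting the query result rather than the whole trace keeps the exchanged graph small, which is the size-reduction point made on the previous slide.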
Full-fledged data-mediated collaborations

[Figure: experiment A packages workflow A + input A, result datasets A, and result provenance A into a Research Object. Experiment B does the same with workflow B + input B, result datasets B, and result provenance B. Link between the two: result A → input B.]

Full-fledged data-mediated collaborations

[Figure: a single Research Object combines workflow A + input A and workflow B + input B, linked by result A → input B, with result datasets A, result datasets B, and result provenance A+B.]

Provenance composition accounts for implicit collaboration

Aligned with focus of upcoming Provenance Challenge 4:
“connect my provenance to yours” into a whole OPM provenance graph.

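
One way to picture provenance composition is as a union of two dependency graphs that happen to share an artifact identifier. The sketch below assumes both experiments refer to result A by the same identifier; the identifiers, the triple format, and the helper names are hypothetical.

```python
# Sketch: compose two provenance graphs that share an artifact identifier.
# Identifiers and graph contents are hypothetical.

def compose(graph_a, graph_b):
    """Union the edges of two graphs, dropping exact duplicates, keeping order."""
    combined = []
    for edge in graph_a + graph_b:
        if edge not in combined:
            combined.append(edge)
    return combined


def lineage(graph, node):
    """Return all transitive causes of `node`, as a sorted list."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for effect, _relation, cause in graph:
            if effect == current and cause not in found:
                found.add(cause)
                frontier.append(cause)
    return sorted(found)


# Experiment A produces result_A; experiment B uses result_A as its input.
prov_a = [
    ("result_A", "wasGeneratedBy", "workflow_A"),
    ("workflow_A", "used", "input_A"),
]
prov_b = [
    ("result_B", "wasGeneratedBy", "workflow_B"),
    ("workflow_B", "used", "result_A"),
]

prov_ab = compose(prov_a, prov_b)
```

On `prov_b` alone, the lineage of `result_B` stops at `result_A`; on the composed graph it continues back to `input_A`. The join works only if both graphs name the shared artifact identically, which is why a uniform identifier scheme matters.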
Contacts


The myGrid Consortium (Manchester, Southampton)

                     http://mygrid.org.uk


                     http://www.myexperiment.org



                     Me: pmissier@acm.org

More Related Content

Viewers also liked

C4Bio paper talk
C4Bio paper talkC4Bio paper talk
C4Bio paper talk
Paolo Missier
 
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
Paolo Missier
 
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paolo Missier
 
ProvAbs: model, policy, and tooling for abstracting PROV graphs
ProvAbs: model, policy, and tooling for abstracting PROV graphsProvAbs: model, policy, and tooling for abstracting PROV graphs
ProvAbs: model, policy, and tooling for abstracting PROV graphs
Paolo Missier
 
Your data won’t stay smart forever: exploring the temporal dimension of (big ...
Your data won’t stay smart forever:exploring the temporal dimension of (big ...Your data won’t stay smart forever:exploring the temporal dimension of (big ...
Your data won’t stay smart forever: exploring the temporal dimension of (big ...
Paolo Missier
 
Big Data Quality Panel : Diachron Workshop @EDBT
Big Data Quality Panel: Diachron Workshop @EDBTBig Data Quality Panel: Diachron Workshop @EDBT
Big Data Quality Panel : Diachron Workshop @EDBT
Paolo Missier
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
Paolo Missier
 
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science CentralCloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Paolo Missier
 
Paper presentation @DILS'07
Paper presentation @DILS'07Paper presentation @DILS'07
Paper presentation @DILS'07
Paolo Missier
 

Viewers also liked (9)

C4Bio paper talk
C4Bio paper talkC4Bio paper talk
C4Bio paper talk
 
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
Invited talk @ Cardiff University, 2008: Approximate entity reconciliation fo...
 
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
Paper talk (presented by Prof. Ludaescher), WORKS workshop, 2010
 
ProvAbs: model, policy, and tooling for abstracting PROV graphs
ProvAbs: model, policy, and tooling for abstracting PROV graphsProvAbs: model, policy, and tooling for abstracting PROV graphs
ProvAbs: model, policy, and tooling for abstracting PROV graphs
 
Your data won’t stay smart forever: exploring the temporal dimension of (big ...
Your data won’t stay smart forever:exploring the temporal dimension of (big ...Your data won’t stay smart forever:exploring the temporal dimension of (big ...
Your data won’t stay smart forever: exploring the temporal dimension of (big ...
 
Big Data Quality Panel : Diachron Workshop @EDBT
Big Data Quality Panel: Diachron Workshop @EDBTBig Data Quality Panel: Diachron Workshop @EDBT
Big Data Quality Panel : Diachron Workshop @EDBT
 
The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...The lifecycle of reproducible science data and what provenance has got to do ...
The lifecycle of reproducible science data and what provenance has got to do ...
 
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science CentralCloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
 
Paper presentation @DILS'07
Paper presentation @DILS'07Paper presentation @DILS'07
Paper presentation @DILS'07
 

More from Paolo Missier

(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
Paolo Missier
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
Paolo Missier
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
Paolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
Paolo Missier
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
Paolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Session talk @ AGU09

  • 1. Scientific Workflow Management System, Janus Provenance. Towards systematic information exchange and reuse in e-laboratories. AGU Fall meeting, Dec. 2009. Paolo Missier, Information Management Group, School of Computer Science, University of Manchester, UK, with additional material by Sean Bechhofer and Matthew Gamble, e-Labs design group, University of Manchester. AGU Fall meeting, San Francisco, Dec. 2009 - P.Missier
  • 2. Momentum on sharing and collaboration Special issue of Nature on Data Sharing (Sept. 2009) http://www.nature.com/news/specials/datasharing/index.html
  • 3. Momentum on sharing and collaboration Special issue of Nature on Data Sharing (Sept. 2009) • timeliness requires rapid sharing • repurposing • the Human Genome project use case http://www.nature.com/news/specials/datasharing/index.html
  • 4. Momentum on sharing and collaboration Special issue of Nature on Data Sharing (Sept. 2009) • timeliness requires rapid sharing • repurposing • the Human Genome project use case http://www.nature.com/news/specials/datasharing/index.html • Debate is much further along in Earth Sciences – ESIP - data preservation / stewardship, 2009 – Long established in some communities - Atmospheric sciences, 1998 [1] • Science Commons recommendations for Open Science – (July 2008) [link] [1] Strebel DE, Landis DR, Huemmrich KF, Newcomer JA, Meeson BW: The FIFE Data Publication Experiment. Journal of the Atmospheric Sciences 1998, 55:1277-1283
  • 5. Collaboration in workflow-based science [Figure labels: workflow specification; workflow + input dataset; execution]
  • 6. Collaboration in workflow-based science [Figure labels: workflow specification; workflow + input dataset; execution; outcome (data); outcome (provenance)]
  • 7. Collaboration in workflow-based science [Figure labels: workflow specification; workflow + input dataset; execution; outcome (data); outcome (provenance); Research Object Packaging]
  • 9. Collaboration in workflow-based science [Figure labels: workflow specification; workflow + input dataset; execution; outcome (data); outcome (provenance); Research Object Packaging; Paul; browse, query, unbundle, reuse]
  • 10. Collaboration in workflow-based science [Figure labels: workflow specification; workflow + input dataset; execution; outcome (data); outcome (provenance); Research Object Packaging; Paul; browse, query, unbundle, reuse; Data-mediated implicit collaboration]
  • 11. Collaboration in workflow-based science What is needed for Paul to make sense of third-party data? [Figure labels: Data-mediated implicit collaboration; outcome (data); outcome (provenance); Research Object Packaging; Paul; browse, query, unbundle, reuse]
  • 15. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways]
  • 16. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways; Workflow 16; Results; Logs; Slides; Workflow 13; Paper; Results]
  • 17. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways; Workflow 16; Results; Logs; Slides; Workflow 13; Paper; Results; Representation]
  • 18. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways; Workflow 16; Results; Logs; Slides; Workflow 13; Paper; Results; Representation; Domain Relations]
  • 19. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways; Workflow 16; Results; Logs; Slides; Workflow 13; Paper; Results; Representation; Domain Relations; relation edges: produces, Included in, Published in, Feeds into]
  • 20. Paul’s Pack / Paul’s Research Object [Figure labels: QTL; Common pathways; Workflow 16; Results; Logs; Slides; Workflow 13; Paper; Results; Representation; Domain Relations; Metadata; Aggregation; relation edges: produces, Included in, Published in, Feeds into]
  • 21. ORE: representing generic aggregations. Resource Map: data structure (descriptor). http://www.openarchives.org/ore/1.0/primer.html section 4. A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.
  • 23. Content: Workflow provenance A detailed trace of workflow execution - tasks performed, data transformations - inputs used, outputs produced
  • 25. Content: Workflow provenance A detailed trace of workflow execution - tasks performed, data transformations - inputs used, outputs produced [Figure: example workflow with labels lister; get pathways by genes1; merge pathways; gene_id; concat gene pathway ids; output; pathway_genes]
  • 26. Why provenance matters, if done right • To establish quality, relevance, trust • To track information attribution through complex transformations • To describe one’s experiment to others, for understanding / reuse • To provide evidence in support of scientific claims • To enable post hoc process analysis for improvement, re-design The W3C Incubator on Provenance has been collecting numerous use cases: http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#
  • 27. What users expect to learn • Causal relations: - which pathways come from which genes? - which processes contributed to producing an image? - which process(es) caused data to be incorrect? - which data caused a process to fail? • Process and data analytics: – analyze variations in output vs an input parameter sweep (multiple process runs) – how often has my favourite service been executed? on what inputs? – who produced this data? – how often does this pathway turn up when the input genes range over a certain set S? [Figure: example workflow with labels lister; get pathways by genes1; merge pathways; gene_id; concat gene pathway ids; output; pathway_genes]
  • 28. Open Provenance Model • graph of causal dependencies involving data and processors • not necessarily generated by a workflow! • v1.1 out soon Goal: standardize causal dependencies to enable provenance metadata exchange [Figure: the two edge types, wasGeneratedBy(R) between an artifact A and a process P, and used(R) between a process P and an artifact A; example graph over artifacts A1, A2, A3, A4 and processes P1, P2, P3 with edges wgb(R1), wgb(R2), used(R3), used(R4), wgb(R5), wgb(R6)]
  • 29. Additional requirements on OPM • Artifact values require uniform common identifier scheme – Linked Data in OPM? • OPM accounts for structural causal relationships – additional domain-specific knowledge required – attaching semantic annotations to OPM graph nodes • OPM graphs can grow very large – reduce size by exporting only query results • Taverna approach – multiple levels of abstraction • through OPM accounts (“points of view”)
  • 31. Query results as OPM graphs - Approach implemented in the Taverna 2.1 workflow system - Internal provenance DB with ad hoc query language Just released! [Figure: execute W (run W) yields prov(W); query Q yields Q(prov(W)); export yields OPM(Q(prov(W))); further label: prov(WA)]
  • 32. Full-fledged data-mediated collaborations [Figure labels: exp. A; workflow A + input A; Research Object; result datasets A; result provenance A]
  • 34. Full-fledged data-mediated collaborations [Figure labels: exp. A; workflow A + input A; Research Object; result datasets A; result provenance A; result A → input B]
  • 35. Full-fledged data-mediated collaborations [Figure labels: exp. A; workflow A + input A; Research Object; result datasets A; result provenance A; result A → input B; exp. B; workflow B + input B; Research Object; result datasets B; result provenance B]
  • 36. Full-fledged data-mediated collaborations [Figure labels: workflow A + input A; workflow B + input B; result A → input B; a single Research Object with result datasets A, result datasets B, and result provenance A+B]
  • 37. Full-fledged data-mediated collaborations Provenance composition accounts for implicit collaboration [Figure labels: workflow A + input A; workflow B + input B; result A → input B; a single Research Object with result datasets A, result datasets B, and result provenance A+B]
  • 38. Full-fledged data-mediated collaborations Provenance composition accounts for implicit collaboration. Aligned with focus of upcoming Provenance Challenge 4: “connect my provenance to yours" into a whole OPM provenance graph. [Figure labels: workflow A + input A; workflow B + input B; result A → input B; a single Research Object with result datasets A, result datasets B, and result provenance A+B]
  • 39. Contacts The myGrid Consortium (Manchester, Southampton) http://mygrid.org.uk http://www.myexperiment.org Janus Provenance Me: pmissier@acm.org
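
The ideas on slides 28, 31, and 37 (an OPM-style graph of used / wasGeneratedBy dependencies, exporting only a query result, and composing the provenance of two experiments) can be illustrated with a small Python sketch. This is a hypothetical toy model, not the Taverna implementation or the OPM serialization: the class ProvGraph, its methods, and the node names run_A, input_A, result_A, run_B, result_B are all assumptions made for illustration.

```python
class ProvGraph:
    """Toy OPM-style graph (hypothetical, for illustration only).

    Nodes are plain strings; artifact and process names are assumed
    to be distinct. Two causal edge types are stored, each with a role R.
    """

    def __init__(self):
        self.used = set()              # (process, artifact, role)
        self.was_generated_by = set()  # (artifact, process, role)

    def add_used(self, process, artifact, role):
        self.used.add((process, artifact, role))

    def add_wgb(self, artifact, process, role):
        self.was_generated_by.add((artifact, process, role))

    def lineage(self, node):
        """Return every process and artifact that `node` causally depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            # artifact -> process that generated it
            for artifact, process, _ in self.was_generated_by:
                if artifact == current and process not in seen:
                    seen.add(process)
                    stack.append(process)
            # process -> artifacts it used
            for process, artifact, _ in self.used:
                if process == current and artifact not in seen:
                    seen.add(artifact)
                    stack.append(artifact)
        return seen

    def restrict(self, nodes):
        """Keep only edges whose endpoints are both in `nodes` (a query result)."""
        g = ProvGraph()
        g.used = {e for e in self.used if e[0] in nodes and e[1] in nodes}
        g.was_generated_by = {
            e for e in self.was_generated_by if e[0] in nodes and e[1] in nodes
        }
        return g

    def merge(self, other):
        """Compose two graphs by taking the union of their edges."""
        g = ProvGraph()
        g.used = self.used | other.used
        g.was_generated_by = self.was_generated_by | other.was_generated_by
        return g


# Experiment A: run_A used input_A and generated result_A.
exp_a = ProvGraph()
exp_a.add_used("run_A", "input_A", "in")
exp_a.add_wgb("result_A", "run_A", "out")

# Experiment B: result A becomes input B (same identifier on both sides).
exp_b = ProvGraph()
exp_b.add_used("run_B", "result_A", "in")
exp_b.add_wgb("result_B", "run_B", "out")

# On its own, B's provenance stops at result_A.
print(sorted(exp_b.lineage("result_B")))

# Composed provenance traces result_B back through experiment A.
merged = exp_a.merge(exp_b)
full = merged.lineage("result_B")
print(sorted(full))

# Export only the part of the graph that answers the lineage query.
exported = merged.restrict(full | {"result_B"})
```

The composition only works because both experiments refer to the shared artifact by the same identifier (result_A), which is the kind of uniform identifier scheme that slide 29 lists as a requirement on OPM.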