Workflows, experimental findings, and their provenance:towards semantically rich linked data and method sharing           ...
IDCC ’11 - P.Missier et al.   Prologue: DCC “REPRISE” workshop, 2009 2
IDCC ’11 - P.Missier et al.   “Virtual experimental science” (DCC’09)                                        [MLB+10] 3
Outline    1. Workflow and their provenance      – Janus: workflows + provenance + semantics for          • (+ LOD) [MLB+1...
1: Workflows and provenance    - The Janus provenance ontology for Taverna    - The W3C PROV ontology    - PROV-W5
Example workflow (Taverna) chr: 17                                       QTL → start: 28500000 end: 3000000               ...
Baseline provenance of a workflow run                              QTL →                                mmu:12575         ...
Janus: a semantic provenance model                                                                                        ...
Janus: a semantic provenance model                                                                                        ...
Annotated provenance          Annotated workflow                                                    Annotated provenance g...
From structure annotations to values annotations      [MWT+10]                                                            ...
From structure annotations to values annotations      [MWT+10]                                                            ...
From structure annotations to values annotations                                                                          ...
From structure annotations to values annotations                                                                          ...
Annotations propagation rules                        X rdf:type Port    C = {c}    X has value type c                     ...
Annotations propagation rules                        X rdf:type Port    C = {c}    X has value type c                     ...
Annotations propagation rules                        X rdf:type Port    C = {c}    X has value type c                     ...
Annotations as semantic overlay                                                v1                                         ...
Extensions to Linked Data          Annotated workflow                                            Annotated provenance grap...
Extensions to Linked Data          Annotated workflow                                            Annotated provenance grap...
I - Mapping data values to LoD URIs   In our prototype we map data values to Bio2RDF as follows:                          ...
The PROV ontology from the W3C     • Plus a growing catalogue of examples from group members:                  http://www....
PROV Ontology (PROV-O)     • “PROV-O defines the normative OWL2 Web Ontology Language encoding       of the PROV Data Mode...
PROV-O encoding: simple example     • Examples drawn from the PROV Primer document                                     PRO...
Plans in PROV-O     • PROV deliberately does not deal with program structure       – workflow, processor, port           e...
One possible rendering of the Janus example            v1                                                  w1             ...
PROV-W: Workflow pattern with annotations                                               used          Pact        wasGenBy...
The complete suite of PROV specifications21
The complete suite of PROV specifications21
The complete suite of PROV specifications21
The complete suite of PROV specifications21
2: Packaging and sharing:                data + methods + provenance     - The Research Object model from Wf4Ever       wi...
Research Objects                                                             s                                            ...
Example24
Research Objects as ORE resources     • ORE = Object Exchange and Reuse       – a small vocabulary and patterns for modell...
Annotations26
Wf4Ever project partners     Acknowledgement27                               21
DataONE: Preservation of Observational Data     Data"Life"Cycle"Tool"Support"                                  Collect"   ...
Components"     Three"components"for"a"           Member&Nodes&     flexible,"scalable,"sustainable"   •  diverse"ins=tu=on...
Data"Model"                 Package"                                    System"                                System"    ...
Data packages in DataONE     Package Content Associations Using OAI-ORE                                                   ...
Summary: Putting it all together     •   Research objects fit well with DataONE packages     •   Workflow and provenance f...
References     [MLB+10] Missier, Paolo, Bertram Ludascher, Shawn Bowers, Manish Kumar Anand, Ilkay     Altintas, Saumen De...
Upcoming SlideShare
Loading in …5
×

Workflows, experimental findings, and their provenance: towards semantically rich linked data and method sharing for collaborative science

1,176 views

Published on

Credible workshop, Sophia-Antipolis, Oct. 2012
invited talk (Johan Montagnat)

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,176
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • \n
  • REPRISE: Repository Preservation Infrastructure\n
  • \n
  • \n
  • \n
  • On a typical QTL region with ~150k base pairs, one execution of this workflow finds about 50 Ensembl genes. \nThese could correspond to about 30 genes in the UniProt database, and 60 in the NCBI Etrez genes database. \nEach gene may be involved in a number of pathways, for example the mouse genes Mapk13 (mmu:26415) and \nCdkn1a (mmu:12575) participate to 13 and 9 pathways, respectively. \n\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • An agent can be a person, a piece of software, an inanimate object, an organization, or other entities that may be ascribed responsibility.\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • exists: Data packages\nin progress: provenance-enhanced data packages through the provWG\n- generating, storing, and retrieving data and methods with provenance-specific search capability\n
  • \n
  • \n
  • Workflows, experimental findings, and their provenance: towards semantically rich linked data and method sharing for collaborative science

    1. 1. Workflows, experimental findings, and their provenance:towards semantically rich linked data and method sharing for collaborative science Paolo Missier Paolo.Missier@ncl.ac.uk Newcastle University, UK Credible workshop Sophia-Antipolis October, 2012
    2. 2. IDCC ’11 - P.Missier et al. Prologue: DCC “REPRISE” workshop, 2009 2
    3. 3. IDCC ’11 - P.Missier et al. “Virtual experimental science” (DCC’09) [MLB+10] 3
    4. 4. Outline 1. Workflow and their provenance – Janus: workflows + provenance + semantics for • (+ LOD) [MLB+12, ZSM+11] – The PROV Semantic data model for provenance (ongoing) – PROV-W: unofficial workflow extension (and semantic annotations) 2. Packaging and sharing: data + methods + provenance – Research Objects in the project – and Data Packages4
    5. 5. 1: Workflows and provenance - The Janus provenance ontology for Taverna - The W3C PROV ontology - PROV-W5
    6. 6. Example workflow (Taverna) chr: 17 QTL → start: 28500000 end: 3000000 Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway path:mmu04210 Apoptosis, path:mmu04010 MAPK, ...Janus -- IPAW, Troy, NY, June 15-17, 2010
    7. 7. Baseline provenance of a workflow run QTL → mmu:12575 Ensembl Genes v1 ... vn w Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene path:mmu04012 exec Uniprot Gene → Entrez Gene → a1 ... an b1 ... bm Kegg Gene Kegg Gene mmu:26416 merge gene IDs path:mmu04010 y11 ymn Gene → Pathway ... path:mmu04010→derivedFrom→mmu:26416 path:mmu04012→derivedFrom→mmu:12575 • The graph encodes all direct data dependency relations • Baseline query model: compute paths amongst sets of nodes • Transitive closure over data dependency relations 7Janus -- IPAW, Troy, NY, June 15-17, 2010
    8. 8. Janus: a semantic provenance model 8Janus -- IPAW, Troy, NY, June 15-17, 2010 http://homepages.cs.ncl.ac.uk/paolo.missier/doc/janus.owl
    9. 9. Janus: a semantic provenance model port v1 v2 v3 value X1 X2 X3 X1 X2 X3 port P exec P_inst Y1 Y2 Y1 Y2 processor processor exec spec w1 w2 8Janus -- IPAW, Troy, NY, June 15-17, 2010 http://homepages.cs.ncl.ac.uk/paolo.missier/doc/janus.owl
    10. 10. Annotated provenance Annotated workflow Annotated provenance graph QTL → Gene Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Kegg Gene → Pathway Find all genes within the input QTL region that are involved in a given KEGG pathway. List relevant PubMed publications for the pathways listed in the result set. 9Janus -- IPAW, Troy, NY, June 15-17, 2010
    11. 11. From structure annotations to values annotations [MWT+10] 10Janus -- IPAW, Troy, NY, June 15-17, 2010
    12. 12. From structure annotations to values annotations [MWT+10] 10Janus -- IPAW, Troy, NY, June 15-17, 2010
    13. 13. From structure annotations to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan [MWT+10] 10Janus -- IPAW, Troy, NY, June 15-17, 2010
    14. 14. From structure annotations to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan [MWT+10] protein sequence port v1 v2 v3 value X1 exec X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 10Janus -- IPAW, Troy, NY, June 15-17, 2010
    15. 15. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 11Janus -- IPAW, Troy, NY, June 15-17, 2010
    16. 16. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor ? interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 11Janus -- IPAW, Troy, NY, June 15-17, 2010
    17. 17. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 protein processor sequence interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 11Janus -- IPAW, Troy, NY, June 15-17, 2010
    18. 18. Annotations as semantic overlay v1 w1 X3 Y2 Provenance graph has_port_value has_port_value X2 P fragment Y1 X1 vn wm has-input-type Pathway has-output-type search service instance-of instance-of instance-of Gene v1 w1 Pathway has-source X3 has-source has_port_value Y2 has_port_value X2 P instance-of instance-of Y1 Kegg wm X1 vn Kegg has-source has-source 12Janus -- IPAW, Troy, NY, June 15-17, 2010
    19. 19. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway 13Janus -- IPAW, Troy, NY, June 15-17, 2010
    20. 20. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway - Publish - I - Map IDs - II - query 13Janus -- IPAW, Troy, NY, June 15-17, 2010
    21. 21. I - Mapping data values to LoD URIs In our prototype we map data values to Bio2RDF as follows: Entrez Genes Uniprot Genes KEGG Genes KEGG Pathways Linked Data Query example: List relevant PubMed publications for the pathways listed in the workflow result set 14Janus -- IPAW, Troy, NY, June 15-17, 2010
    22. 22. The PROV ontology from the W3C • Plus a growing catalogue of examples from group members: http://www.w3.org/2011/prov/wiki/PROV_examples15
    23. 23. PROV Ontology (PROV-O) • “PROV-O defines the normative OWL2 Web Ontology Language encoding of the PROV Data Model” [1] http://dvcs.w3.org/hg/prov/raw-file/default/ontology/ProvenanceOntology.owl state as of Oct., 2012: Last Call -- now closed to public comments Core elements: [1] Current version: http://www.w3.org/TR/prov-o/16
    24. 24. PROV-O encoding: simple example • Examples drawn from the PROV Primer document PROV-N: used(ex:compose, ex:dataSet1, -) used(ex:compose, ex:regionList, -) wasGeneratedBy(ex:composition, ex:compose, -) used(ex:illustrate, ex:composition, -) PROV-O (Turtle): wasGeneratedBy(ex:chart1, ex:illustrate, -) ex:compose a prov:Activity; prov:used ex:dataSet1 ; prov:used ex:regionList . ex:composition a prov:Entity; prov:wasGeneratedBy ex:compose . ex:illustrate a prov:Activity; prov:used ex:composition . ex:chart1 a prov:Entity; prov:wasGeneratedBy ex:illustrate .17 [2] Current editor’s draft: http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html
    25. 25. Plans in PROV-O • PROV deliberately does not deal with program structure – workflow, processor, port ex:correct a prov:Activity; prov:used ex:dataSet1. ex:edith a prov:Agent, prov:Person . ex:instructions a prov:Plan . ex:correct prov:qualifiedAssociation [ a Association ; prov:agent ex:edith ; prov:hadPlan ex:instructions ]. ex:dataSet2 prov:wasGeneratedBy ex:correct . ex:dataSet2 prov:wasRevisionOf ex:dataSet1 .18
    26. 26. One possible rendering of the Janus example v1 w1 X3 Y2 has_port_value has_port_value X2 P Y1 wm X1 vn used Pact wasGenBy V1 W1 wf:port = "X1" wf:port = "Y1" wasAssociatedWith hadPlan prov:type = prov:plan P-exec P19
    27. 27. PROV-W: Workflow pattern with annotations used Pact wasGenBy V1 W1 wf:port = "X1" wf:port = "Y1" :P a prov:Plan, prov:Entity; a PathwaySearchService. wasAssociatedWith prov:type = prov:plan :P_exec a prov:Agent. :P_act a prov:Activity; prov:qualifiedAssociation [ a prov:Association; P-exec P prov:agent :P_exec; prov:hadPlan :P; ]; prov:qualifiedUsage [ a prov:Usage ; prov:Entity :V1; has-input-type Pathway search wf:port “X1” ]. service :V1 a prov:Entity; instance-of a :Gene; :hasSource :Kegg. instance-of :W1 a prov:Entity; Gene v1 prov:qualifiedGeneration [ X1 X2 X3 has-source Y2 a prov:Generation; has_port_value P instance-of prov:activity :P_act; Y1 Kegg wf:port “Y1” vn ]; has-source a :Pathway; :hasSource :Kegg.20
    28. 28. The complete suite of PROV specifications21
    29. 29. The complete suite of PROV specifications21
    30. 30. The complete suite of PROV specifications21
    31. 31. The complete suite of PROV specifications21
    32. 32. 2: Packaging and sharing: data + methods + provenance - The Research Object model from Wf4Ever with: Khalid Belhajjame, University of Manchester - The DataONE data preservation architecture22
    33. 33. Research Objects s RO model specification: ow Scientists or k fl http://wf4ever.github.com/ro/ W Hypothesis Experiments RO primer: Research http://wf4ever.github.com/ro- Object primer/ Electronic Results paper Provenance Datasets23
    34. 34. Example24
    35. 35. Research Objects as ORE resources • ORE = Object Exchange and Reuse – a small vocabulary and patterns for modelling generic “aggregation”25
    36. 36. Annotations26
    37. 37. Wf4Ever project partners Acknowledgement27 21
    38. 38. DataONE: Preservation of Observational Data Data"Life"Cycle"Tool"Support" Collect" Analyze" Assure" DMP-Tool! Integrate" Describe! Kepler Discover" Deposit" Preserve" 3"28
    39. 39. Components" Three"components"for"a" Member&Nodes& flexible,"scalable,"sustainable" •  diverse"ins=tu=ons" Coordina/ng&Nodes& network" •  serve"local"community" •  retain"complete"metadata" Inves/gator&Toolkit& •  provide"resources"for" catalog"" managing"their"data" •  indexing"for"search" •  retain"copies"of"data" •  network@wide"services" •  ensure"content" availability"(preserva=on)""" •  replica=on"services"29 6"
    40. 40. Data"Model" Package" System" System" Metadata" Metadata" Science" Data" Metadata" Any"data"object" XML"documents:" ISO19115,"EML,"FGDC,"…" Granule:"" •  Manageable"by"DataONE" Resource" •  Has"unique"idenNfier" Map" System" •  Content"does"not"change" Metadata" OAIJORE"RDF"30 8"
    41. 41. Data packages in DataONE Package Content Associations Using OAI-ORE Add your semantic annotations here31 http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html
    42. 42. Summary: Putting it all together • Research objects fit well with DataONE packages • Workflow and provenance fit well with Research Objects • PROV for provenance • PROV-W for workflow and provenance • Semantic annotations fit well with PROV-W32
    43. 43. References [MLB+10] Missier, Paolo, Bertram Ludascher, Shawn Bowers, Manish Kumar Anand, Ilkay Altintas, Saumen Dey, Anandarup Sarkar, Biva Shrestha, and Carole Goble. “Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science.” In Proc.s 5th Workshop on Workflows in Support of Large-Scale Science (WORKS), 2010. [MLB+12] Missier, Paolo, Bertram Ludascher, Shawn Bowers, Ilkay Altintas, Saumen Dey, and Michael Agun. “Golden Trail: Retrieving the Data History That Matters from a Comprehensive Provenance Repository.” International Journal of Digital Curation 7, no. 1 (2012). http://www.dcc.ac.uk/events/idcc11. [MSZ+10] Missier, Paolo, Satya S Sahoo, Jun Zhao, Amit Sheth, and Carole Goble. “Janus: From Workflows to Semantic Provenance and Linked Open Data.” In Procs. IPAW 2010. Troy, NY, 2010. http://www.springerlink.com/content/am3551t4q4614r47/. [ZSM+11] Zhao, Jun, Satya S Sahoo, Paolo Missier, Amit Sheth, and Carole Goble. “Extending Semantic Provenance into the Web of Data.” IEEE Internet Computing 15, no. 1 (2011): 40–48. http://doi.ieeecomputersociety.org/10.1109/MIC.2011.7. [MWT+10] Functional Units: Abstractions for Web Service Annotations. Missier, P.; Wolstencroft, K.; Tanoh, F.; Li, P.; Bechhofer, S.; Belhajjame, K.; and Goble, C. 2010. In Procs. IEEE 2010 Fourth International Workshop on Scientific Workflows (SWF 2010), Miami, FL.33

    ×