Janus
             Provenance

                                                    Janus:
                     from Workflo...
Key ideas
• Janus:
       – a semantic provenance model with domain-specific
         extensions
       – designed around ...
Example workflow (Taverna)


 chr: 17                                       QTL →
 start: 28500000
 end: 3000000          ...
Baseline provenance of a workflow run
                              QTL →                                mmu:12575
       ...
Reference user questions

                           QTL →
                        Ensembl Genes                       Q0:...
Query types and model capabilities

                       Query formulation effort       Annotation requirements        Q...
Query types and model capabilities

                       Query formulation effort       Annotation requirements        Q...
Janus: baseline model of provenance
• The semantic provenance model is an OWL ontology
       – defined for domain-agnosti...
Janus structure




                                                      8
Janus -- IPAW, Troy, NY, June 15-17, 2010
Janus structure
                                                                                      port
               ...
Example Janus domain-agnostic fragment

<rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls">
<janu...
Annotated provenance

          Annotated workflow                                                 Annotated provenance gr...
Desiderata: from structure to values annotations




                                                                     ...
Desiderata: from structure to values annotations




                                                                     ...
Desiderata: from structure to values annotations

                                                                        ...
Desiderata: from structure to values annotations

                                                                        ...
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                   ...
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                   ...
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                   ...
Annotations as semantic overlay

                                                v1                                       ...
Example Janus domain-aware fragment


       <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625">
    ...
Extensions to Linked Data

          Annotated workflow                                            Annotated provenance gr...
Extensions to Linked Data

          Annotated workflow                                            Annotated provenance gr...
I - Mapping data values to LoD URIs
   In our prototype we map data values to Bio2RDF as follows:
                        ...
II - extended queries in the LoD setting
    Strategy:
    - use the SQUIN LoD query engine to query multiple “Web of Data...
Current status
        Current Taverna provenance architecture:
          Lab prototype                                   ...
Summary, and moving forward
• Janus: a semantic model for workflow provenance
       – OWL ontology, extension of Provenir...
Upcoming SlideShare
Loading in...5
×

Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Linked Open Data

821

Published on

citation:

Missier, P., Sahoo, S. S., Zhao, J., Sheth, A., & Goble, C. (2010). Janus: from Workflows to Semantic Provenance and Linked Open Data. Procs. IPAW 2010. Troy, NY.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
821
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Linked Open Data

  1. 1. Janus Provenance Janus: from Workflows to Semantic Provenance and Linked Open Data Paolo Missier Jun Zhao Satya S. Sahoo Carole Goble Amit Sheth University of Manchester, UK University of Oxford, UK Wright State University, USA IPAW’ 10 Troy, NY June 15-16, 2010
  2. 2. Key ideas • Janus: – a semantic provenance model with domain-specific extensions – designed around the Taverna workflow model • From domain-agnostic provenance graphs • To domain-aware graphs through explicit annotations • From local provenance graphs and queries scoped to the graph • To – Graphs published as Linked Data – Queries extended into the Web of Data 2 Janus -- IPAW, Troy, NY, June 15-17, 2010
  3. 3. Example workflow (Taverna) chr: 17 QTL → start: 28500000 end: 3000000 Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway path:mmu04210 Apoptosis, path:mmu04010 MAPK, ... Janus -- IPAW, Troy, NY, June 15-17, 2010
  4. 4. Baseline provenance of a workflow run QTL → mmu:12575 Ensembl Genes v1 ... vn w Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene path:mmu04012 exec Uniprot Gene → Entrez Gene → a1 ... an b1 ... bm Kegg Gene Kegg Gene mmu:26416 merge gene IDs path:mmu04010 y11 ymn Gene → Pathway ... path:mmu04010→derives_from→mmu:26416 path:mmu04012→derives_from→mmu:12575 • The graph encodes all direct data dependency relations • Baseline query model: compute paths amongst sets of nodes • Transitive closure over data dependency relations 4 Janus -- IPAW, Troy, NY, June 15-17, 2010
  5. 5. Reference user questions QTL → Ensembl Genes Q0: Find all intermediate and initial input values that contribute to the computation of a certain output value. Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Q1. Find all those genes within the input QTL Uniprot Gene → Entrez Gene → region that are involved in a given KEGG Kegg Gene Kegg Gene pathway. merge gene IDs Q2: Find all Uniprot-sourced genes Gene → Pathway Q3: Find all Entrez genes that encode proteins involved in ATP binding (go: 0005524). Q4: List relevant PubMed publications for the pathways listed in the result set. 5 Janus -- IPAW, Troy, NY, June 15-17, 2010
  6. 6. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 6 Janus -- IPAW, Troy, NY, June 15-17, 2010
  7. 7. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 6 Janus -- IPAW, Troy, NY, June 15-17, 2010
  8. 8. Janus: baseline model of provenance • The semantic provenance model is an OWL ontology – defined for domain-agnostic provenance graphs – naturally extensible to domain concepts • extends the Provenir upper ontology [*] – Itself an extension of the Basic Formal Ontology (BFO) • abstract concepts include data, process, and agent – Provenir adds 11 types of relationships: • partonomy relations • temporal information • precedence • causal relationships • ... [*] S. Sahoo and A. Sheth. Provenir ontology: Towards a Framework for eScience Provenance Management, Knoesis Center Tech Report, 2009. 7 Janus -- IPAW, Troy, NY, June 15-17, 2010
  9. 9. Janus structure 8 Janus -- IPAW, Troy, NY, June 15-17, 2010
  10. 10. Janus structure port v1 v2 v3 value X1 X2 X3 X1 X2 X3 port P exec P_inst Y1 Y2 Y1 Y2 processor processor exec spec w1 w2 8 Janus -- IPAW, Troy, NY, June 15-17, 2010
  11. 11. Example Janus domain-agnostic fragment <rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls"> <janus:has_execution rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls"/> <knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/output"/> <knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/input"/> <obo:part_of rdf:resource="http://purl.org/net/taverna/janus/e589d90b-01f2-4de6-..."/> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#processor_spec"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls/input"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test1625"/> <janus:links_from rdf:resource="http://purl.org/net/taverna/janus/merge_entrez_genes/ concatenated"/> <janus:is_processor_input rdf:datatype="http://www.w3.org/2001/ XMLSchema#boolean">true</janus:is_processor_input> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625"> <janus:has_iteration rdf:datatype="http://www.w3.org/2001/XMLSchema#string">[]</ janus:has_iteration> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> </rdf:Description> 9 Janus -- IPAW, Troy, NY, June 15-17, 2010
  12. 12. Annotated provenance Annotated workflow Annotated provenance graph QTL → Gene Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Kegg Gene → Pathway Q1 Q2 10 Janus -- IPAW, Troy, NY, June 15-17, 2010
  13. 13. Desiderata: from structure to values annotations 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  14. 14. Desiderata: from structure to values annotations 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  15. 15. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  16. 16. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan protein sequence port v1 v2 v3 value X1 exec X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  17. 17. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  18. 18. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor ? interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  19. 19. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 protein processor sequence interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  20. 20. Annotations as semantic overlay v1 w1 X3 Y2 Provenance graph has_port_value has_port_value X2 P fragment Y1 X1 vn wm has-input-type Pathway has-output-type search service instance-of instance-of instance-of Gene v1 w1 Pathway has-source X3 has-source has_port_value Y2 has_port_value X2 P instance-of instance-of Y1 Kegg wm X1 vn Kegg has-source has-source 13 Janus -- IPAW, Troy, NY, June 15-17, 2010
  21. 21. Example Janus domain-aware fragment <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625"> <janus:has_iteration>[]</janus:has_iteration> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#KEGG"/> </rdf:Description> this is rule- defined, too 14 Janus -- IPAW, Troy, NY, June 15-17, 2010
  22. 22. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway 15 Janus -- IPAW, Troy, NY, June 15-17, 2010
  23. 23. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway - Publish - I - Map IDs - II - query 15 Janus -- IPAW, Troy, NY, June 15-17, 2010
  24. 24. I - Mapping data values to LoD URIs In our prototype we map data values to Bio2RDF as follows: Entrez Genes Uniprot Genes KEGG Genes KEGG Pathways <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdfs:comment>11835</rdfs:comment> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/> 16 Janus -- IPAW, Troy, NY, June 15-17, 2010
  25. 25. II - extended queries in the LoD setting Strategy: - use the SQUIN LoD query engine to query multiple “Web of Data” sources - only Bio2RDF in our case - combine graph patterns on local provenance with conditions on remote LoD graphs Q5: Find all Entrez genes that encode proteins involved in ATP binding (GO:0005524). PREFIX uniprot: <http://purl.uniprot.org/core/> PREFIX : <http://www.taverna.org.uk/janus#> SELECT distinct ?entrezgene WHERE { ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> . Bio2RDF ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein . ?gene rdfs:seeAlso ?entrezgene ?gene rdf:type :port_gene ?gene :has_source :entrez_gene . } local provenance graph 17 Janus -- IPAW, Troy, NY, June 15-17, 2010
  26. 26. Current status Current Taverna provenance architecture: Lab prototype Production – “Export as...” Janus RDF – “native” (relational) graphs – currently only queried using – simple, efficient query language on SPARQL native provenance e – manually published – manually annotated Taverna Taverna <<Events stream>> Provenance runtime events capture query <scope ... /> Lineage <select .../> relational query <focus .../> processor DB query response OPM graph complete Janus RDF graph exporter Provenance API (Java) 18 Janus -- IPAW, Troy, NY, June 15-17, 2010
  27. 27. Summary, and moving forward • Janus: a semantic model for workflow provenance – OWL ontology, extension of Provenir – should include attribution + system level provenance – alignment with OPM? • Domain-aware graphs through annotations: – automatically propagated from workflow annotations when possible – but in practice no real workflows are annotated • LoD integration: – powerful provenance publishing and query broadening – mapping rules currently limited – no completeness guarantee -- all joins are outer joins! 19 Janus -- IPAW, Troy, NY, June 15-17, 2010
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×