Nesc invited presentation: Semantic Provenance and Linked Open Data

946 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
946
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • \n
  • \n
  • \n
  • On a typical QTL region with ~150k base pairs, one execution of this workflow finds about 50 Ensembl genes. \nThese could correspond to about 30 genes in the UniProt database, and 60 in the NCBI Etrez genes database. \nEach gene may be involved in a number of pathways, for example the mouse genes Mapk13 (mmu:26415) and \nCdkn1a (mmu:12575) participate to 13 and 9 pathways, respectively. \n\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • comments\nhard for users to formulate a suitable provenance query on scenario I\ncan be done visually but still good understanding of the process and its relation to data is needed\n
  • \n
  • annotations done manually for this implementation but could have been inherited from service annotations -- we show this in the next slide\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • Nesc invited presentation: Semantic Provenance and Linked Open Data

    1. 1. “looking back to gaze forwards” Janus Provenance LOP: Linked Open Provenance (was: from Workfl ows to Semantic Provenance and Linked Open Data) Paolo Missier Jun Zhao Satya S. Sahoo Newcastle University Amit Sheth(University of Manchester) University of Oxford, UK Wright State University, USA Nesc workshop: Understanding Provenance and Linked Open Data Edinburgh, March 30th, 2011
    2. 2. Baseline provenance of a workflow run QTL → mmu:12575 Ensembl Genes v1 ... vn w Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene path:mmu04012 exec Uniprot Gene → Entrez Gene → a1 ... an b1 ... bm Kegg Gene Kegg Gene mmu:26416 merge gene IDs path:mmu04010 y11 ymn Gene → Pathway ... path:mmu04010→derives_from→mmu:26416 path:mmu04012→derives_from→mmu:12575 • The graph encodes all direct data dependency relations • Baseline query model: compute paths amongst sets of nodes • Transitive closure over data dependency relations2 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    3. 3. Key ideas • Janus: – a semantic provenance model with domain-specific extensions – designed around the Taverna workflow model • From domain-agnostic provenance graphs • To domain-aware graphs through explicit annotations • From local provenance graphs and queries scoped to the graph • To – Graphs published as Linked Data – Queries extended into the Web of Data3 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    4. 4. Reference user questions QTL → Ensembl Genes Q0: Find all intermediate and initial input values that contribute to the computation of a certain output value. Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Q1. Find all those genes within the input QTL Uniprot Gene → Entrez Gene → region that are involved in a given KEGG Kegg Gene Kegg Gene pathway. merge gene IDs Q2: Find all Uniprot-sourced genes Gene → Pathway Q3: Find all Entrez genes that encode proteins involved in ATP binding (go: 0005524). Q4: List relevant PubMed publications for the pathways listed in the result set.4 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    5. 5. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources5 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    6. 6. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources5 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    7. 7. Janus structure6 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    8. 8. Janus structure port v1 v2 v3 value X1 X2 X3 X1 X2 X3 port P exec P_inst Y1 Y2 Y1 Y2 processor processor exec spec w1 w26 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    9. 9. Annotated provenance Annotated workflow Annotated provenance graph QTL → Gene Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Kegg Gene → Pathway Q1 Q27 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    10. 10. Desiderata: from structure to values annotations8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    11. 11. Desiderata: from structure to values annotations8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    12. 12. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan8 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    13. 13. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan protein sequence port v1 v2 v3 value X1 exec X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match8 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    14. 14. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan9 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    15. 15. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor ? interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match9 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    16. 16. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 protein processor sequence interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match9 report P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    17. 17. Annotations as semantic overlay v1 w1 X3 Y2 Provenance graph has_port_value has_port_value X2 P fragment Y1 X1 vn wm has-input-type Pathway has-output-type search service instance-of instance-of instance-of Gene v1 w1 Pathway has-source X3 has-source has_port_value Y2 has_port_value X2 P instance-of instance-of Y1 Kegg wm X1 vn Kegg has-source has-source10 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    18. 18. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway11 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    19. 19. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway - Publish - I - Map IDs - II - query11 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    20. 20. I - Mapping data values to LoD URIs In our prototype we map data values to Bio2RDF as follows: Entrez Genes Uniprot Genes KEGG Genes KEGG Pathways <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdfs:comment>11835</rdfs:comment> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/>12 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    21. 21. II - extended queries in the LoD setting Strategy: - use the SQUIN LoD query engine to query multiple “Web of Data” sources - only Bio2RDF in our case - combine graph patterns on local provenance with conditions on remote LoD graphs Q5: Find all Entrez genes that encode proteins involved in ATP binding (GO:0005524). PREFIX uniprot: <http://purl.uniprot.org/core/> PREFIX : <http://www.taverna.org.uk/janus#> SELECT distinct ?entrezgene WHERE { ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> . Bio2RDF ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein . ?gene rdfs:seeAlso ?entrezgene ?gene rdf:type :port_gene ?gene :has_source :entrez_gene . } local provenance graph13 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    22. 22. Current status Current Taverna provenance architecture: Lab prototype Production – “Export as...” Janus RDF – “native” (relational) graphs – currently only queried using – simple, efficient query language on SPARQL native provenancee – manually published – manually annotated Taverna Taverna <<Events stream>> Provenance runtime events capture query <scope ... /> Lineage <select .../> relational query <focus .../> processor DB query response OPM graph complete Janus RDF graph exporter Provenance API (Java) 14 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011
    23. 23. Summary and References • Janus: a semantic model for workflow provenance – OWL ontology, extension of Provenir – should include attribution + system level provenance – alignment with OPM? • Domain-aware graphs through annotations: – automatically propagated from workflow annotations when possible – but in practice no real workflows are annotated • LoD integration: – powerful provenance publishing and query broadening – mapping rules currently limited – no completeness guarantee -- all joins are outer joins! P. Missier, S.S. Sahoo, J. Zhao, A. Sheth, and C. Goble, "Janus: from Workflows to Semantic Provenance and Linked Open Data," Procs. IPAW 2010, Troy, NY: 2010 S.S. Sahoo, J. Zhao, and P. Missier, "Extending Semantic provenance into the Web of Data," Internet Computing, special issue on Provenance in Web Applications, vol. to appear, 2011.15 P.Missier - Provenance and Linked Data workshop Edinburgh March 2011

    ×