SADI CSHALS 2013

The introduction to the SADI tutorial at CSHALS 2013, Boston, Feb 27, 2013.


  1. 1. Semantic Automated Discovery and Integration (SADI) Services Tutorial. Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain; Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
  2. 2. Part I: MOTIVATION
  3. 3. A lot of important information cannot be represented on the Semantic Web. For example, all of the data that results from analytical algorithms and statistical analyses. (I’m purposely excluding databases from the list of examples, for reasons I will discuss in a moment.)
  4. 4. Varying estimates put the size of the Deep Web at between 500 and 800 times larger than the surface Web.
  5. 5. On the WWW, “automation” of access to Deep Web data happens through “Web Services”.
  6. 6. Traditional definitions of the Deep Web include databases that have Web FORM interfaces. HOWEVER, the Life Science Semantic Web community is encouraging the establishment of SPARQL endpoints as the way to serve that same data to the world (i.e. NOT through Web Services).
  7. 7. I am quite puzzled by this...
  8. 8. Historically, most* bio/informatics databases do not allow direct public SQL access *yes, I know there are some exceptions!
  9. 9. “We need to commit specific hardware for that [mySQL] service. We don’t use the same servers for mySQL as for the Website...” “...we resolve the situation by asking the user to stop hammering the server. This might involve temporary ban on the IP...” - ENSEMBL Helpdesk
  10. 10. So... there appear to be good reasons why most data providers do not expose their databases for public query!
  11. 11. Are SPARQL endpoints somehow “safer” or “better”?
  12. 12. One of the early adopters of RDF/SPARQL in the bioinformatics domain was UniProt
  13. 13. How are things going for them?
  14. 14. A message posted to the Bio2RDF mailing list last week from Jerven Bolleman, one of the team members behind UniProt’s push for RDF...
      Subject: SPARQL or not?  (Tue, 19 Feb 2013)
      Hi Bio2RDF maintainers,
      I keep on noticing this rather expensive query.
      CONSTRUCT {
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
        <http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
        ?s <http://purl.org/dc/elements/1.1/title> ?title .
        ?s <http://purl.org/dc/terms/title> ?dctermstitle .
        ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
        ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
        ?s ?p ?o .
      }
      WHERE {
        ?s ?p ?o FILTER contains(str(?o), "Paget")
        OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
        OPTIONAL { ?s <http://purl.org/dc/elements/1.1/title> ?title }
        OPTIONAL { ?s <http://purl.org/dc/terms/title> ?dctermstitle }
        OPTIONAL { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
        OPTIONAL { ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
      }
      OFFSET 0
      LIMIT 500
      It comes from the example queries on the bio2rdf landing page.
      Its extremely resource consuming and totally useless as it will never ever run in time.
      Can you please change this query to something useful and workable. And at least cache the results if you ever get them.
      Regards,
      Jerven
  15. 15. (The same email, with emphasis:) “I keep noticing this rather expensive query...”
  16. 16. (The same email, with emphasis:) “It comes from THE EXAMPLE QUERIES on the Bio2RDF landing page” (my emphasis added)
  17. 17. (The same email, with emphasis:) “It’s extremely resource-consuming and totally useless as it will never run in time”
  18. 18. So even people who are world leaders in RDF and SPARQL write “expensive” and “useless” queries that (already!) are making life difficult for SPARQL endpoint providers. I believe that situation will only get worse as more people begin to use the Semantic Web and as SPARQL itself becomes richer and more SQL-like.
  19. 19. In My Opinion: History tells us, and this story IMO supports, that SPARQL endpoints might not be widely adopted by source bioinformatics data providers. Historically, the majority of bioinformatics data hosts have opted for API/Service-based access to their resources.
  20. 20. In My Opinion: Moreover, I am still obsessed with interoperability! Having a unified way to discover, and access, bioinformatics resources, whether they be databases or algorithms, just seems like a Good Thing™
  21. 21. In My Opinion: So we need to find a way to make Web Services play nicely with the Semantic Web
  22. 22. Design Pattern for Web Services on the Semantic Web
  23. 23. Part II: SADI “PHILOSOPHY” AND DESIGN
  24. 24. The Semantic Web (diagram: two resources linked by an edge labeled “causally related to”)
  25. 25. The important bit: the link is explicitly labeled “causally related to”
  26. 26. causally related with http://semanticscience.org/resource/SIO_000243
      SIO_000243:
      <owl:ObjectProperty rdf:about="&resource;SIO_000243">
        <rdfs:label xml:lang="en">is causally related with</rdfs:label>
        <rdf:type rdf:resource="&owl;SymmetricProperty"/>
        <rdf:type rdf:resource="&owl;TransitiveProperty"/>
        <dc:description xml:lang="en">A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity.</dc:description>
        <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
      </owl:ObjectProperty>
  27. 27. (The same SIO_000243 definition, repeated.)
  28. 28. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS): OWL-S, SAWSDL, WSDL-S, others...
  29. 29. There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS). They describe: the input data; the output data; how the system manipulates the data; how the world changes as a result.
  30. 30. Describing input and output data is usually done through “semantic annotation” of XML Schema.
  31. 31. In the least-semantic case, the input and output data is “vanilla” XML.
  32. 32. In the “most semantic” case (WSDL), RDF is converted into XML, then back to RDF again.
  33. 33. The rigidity of XML Schema is the antithesis of the Semantic Web!
  34. 34. So... perhaps we shouldn’t be using XML Schema at all...??
  35. 35. Describing how the system manipulates the data: HARD!
  36. 36. Describing how the world changes as a result: unnecessary?
  37. 37. Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
  38. 38. Scientific Web Services are DIFFERENT! (Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.)
  39. 39. “The service interfaces within bioinformatics are relatively simple. An extensible or constrained interoperability framework is likely to suffice for current demands: a fully generic framework is currently not necessary.” (Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.)
  40. 40. Scientific Web Services are DIFFERENT. They’re simpler! Rather than waiting for a solution to the more general problem (which may be years away... or more!) can we solve the Semantic Web Service problem within the scientific domain while still being fully standards-compliant?
  41. 41. Other “philosophical” considerations
  42. 42. Vis-à-vis being Semantic Webby, what is missing from this list? Describe input data; describe output data; describe how the system manipulates the data; describe how the world changes as a result.
  43. 43. causally related with http://semanticscience.org/resource/SIO_000243
  44. 44. causally related with http://semanticscience.org/resource/SIO_000243 The Semantic Web works because of relationships!
  45. 45. causally related with http://semanticscience.org/resource/SIO_000243 The Semantic Web works because of relationships! In 2008 I proposed that, in the Semantic Web world, algorithms should be viewed as “exposing” relationships between the input and output data
  46. 46. Web Service (diagram): an input sequence “AACTCTTCGTAGTG...” is sent to a BLAST Web Service.
  47. 47. SADI (diagram): the input node has_seq_string “AACTCTTCGTAGTG...”, and the service output links that same node by a “has homology to” relationship to the matched gene (e.g. Terminal Flower, species A. thal.). SADI requires you to explicitly declare, as part of your analytical output, the biological relationship that your algorithm “exposed”.
  48. 48. Another “philosophical” decision was to abandon XML Schema. In a world that is moving towards RDF representations of all data, it makes no sense to convert semantically rich RDF into semantic-free Schema-based XML and then back into RDF again.
  49. 49. The final philosophical decision was to abandon SOAP. The bioinformatics community seems to be very receptive to pure-HTTP interfaces (e.g. the popularity of REST-like APIs). So SADI uses simple HTTP POST of just the RDF input data (no message scaffold whatsoever).
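A minimal sketch of what calling a SADI service looks like from the client side: plain HTTP POST of RDF in, RDF back. The service URL, the ontology terms, and the use of Turtle as the exchange format are illustrative assumptions for this sketch, not a specific registered service (many SADI services exchange RDF/XML).

    import requests

    SERVICE_URL = "http://example.org/sadi/calculateBMI"   # hypothetical SADI service

    # The request body is just an RDF graph describing the thing to be analyzed;
    # the properties it carries are whatever the service's input OWL class demands.
    input_rdf = """
    @prefix ex:      <http://example.org/ontology#> .
    @prefix patient: <http://example.org/patients/> .

    patient:24601  ex:hasHeight  1.8 ;
                   ex:hasWeight  84 .
    """

    # No SOAP envelope, no message scaffold: the RDF itself is the request.
    response = requests.post(
        SERVICE_URL,
        data=input_rdf.encode("utf-8"),
        headers={"Content-Type": "text/turtle", "Accept": "text/turtle"},
    )
    response.raise_for_status()

    # The reply is RDF in which new predicates hang off the same input URI
    # (e.g. patient:24601 ex:hasBMI 25.9).
    print(response.text)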
  50. 50. Part III: SADI SERVICE DISCOVERY AND INVOCATION
  51. 51. In slightly more detail...
  52. 52. A table of patient data:
      ID         Name          Height  Weight  Age
      24601      Jean Valjean  1.8m    84kg    45
      7474505B   Jake Blues    1.73m   101kg   31
      6          —             1.88m   75kg    39
      ...        ...           ...     ...     ...
  53. 53. (The same patient table.)
  54. 54. OWL-DL Classes (over the same patient table).
  55. 55. Property restrictions in OWL Class definition (over the same patient table).
  56. 56. (The same patient table.)
  57. 57. A reasoner determines that Patient #24601 is an OWL Individual of the Input service Class.
  58. 58. NOTE THE URI OF THE INPUT INDIVIDUAL: Patient:24601
  59. 59. The same table, now with a BMI column: 24601, Jean Valjean, 1.8m, 84kg, 45, BMI 25.9.
  60. 60. NOTE THE URI OF THE OUTPUT INDIVIDUAL: Patient:24601
  61. 61. The URI of the input is linked by a meaningful predicate to the output (either literal output or another URI)
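To illustrate that shape, here is a small sketch (the URIs, predicate names, and BMI value are invented for the example, not taken from a real service response): the service's output statements attach to the very same URI that was submitted as input.

    from rdflib import Graph, URIRef

    # What we sent: the input individual and the properties the service needed.
    input_ttl = """
    @prefix ex:      <http://example.org/ontology#> .
    @prefix patient: <http://example.org/patients/> .
    patient:24601 ex:hasHeight 1.8 ; ex:hasWeight 84 .
    """

    # What a SADI-style service sends back: new statements about the *same* URI,
    # linked by a meaningful predicate (here, a hypothetical ex:hasBMI).
    output_ttl = """
    @prefix ex:      <http://example.org/ontology#> .
    @prefix patient: <http://example.org/patients/> .
    patient:24601 ex:hasBMI 25.9 .
    """

    g = Graph()
    g.parse(data=input_ttl, format="turtle")
    g.parse(data=output_ttl, format="turtle")   # merging input + output = Linked Data

    patient = URIRef("http://example.org/patients/24601")
    for predicate, obj in g.predicate_objects(patient):
        print(predicate, obj)   # hasHeight, hasWeight, and the newly exposed hasBMI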
  62. 62. Therefore, by connecting SADI services together in a workflow, you end up with an unbroken chain of Linked Data
  63. 63. Part IV: SADI TO THE EXTREME: “WEB SCIENCE 2.0”
  64. 64. A proof-of-concept query engine & registry Objective: answer biologists’ questions
  65. 65. The SHARE registry indexes all of the input/output/relationship triples that can be generated by all known services. This is how SHARE discovers services.
  66. 66. We wanted to duplicate a real, peer-reviewed bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like
  67. 67. ...the machine had to make every other decision on its own
  68. 68. This is the study we chose:
  69. 69. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
  70. 70. Original Study, Simplified: Using what is known about interactions in fly & yeast, predict new interactions with your protein of interest
  71. 71. “Pseudo-code” Abstracted Workflow:
      Given a protein P in Species X
        Find proteins similar to P in Species Y
        Retrieve interactors in Species Y
        Sequence-compare Y-interactors with Species X genome
          → (1) Keep only those with a homologue in X
        Find proteins similar to P in Species Z
        Retrieve interactors in Species Z
        Sequence-compare Z-interactors with (1)
          → Putative interactors in Species X
  72. 72. Modeling the science... OWL
  73. 73. Modeling the science... ProbableInteractor: is homologous to (a Potential Interactor from ModelOrganism1) and (a Potential Interactor from ModelOrganism2). Probable Interactor is defined in OWL as a subclass: something that appears as a potential interactor in both comparator model organisms.
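As a rough illustration, such a class definition might be written in OWL (Turtle syntax) as below. This is a hand-made approximation for the tutorial narrative, not the actual InteractingProteins.owl: the property and class names are invented, and owl:equivalentClass is used here so that membership can be inferred rather than asserted.

    from rdflib import Graph

    probable_interactor_ttl = """
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix i:   <http://example.org/interactions#> .

    # "Something that is homologous to a potential interactor from model
    #  organism 1 AND to a potential interactor from model organism 2."
    i:ProbableInteractor
        owl:equivalentClass [
            a owl:Class ;
            owl:intersectionOf (
                [ a owl:Restriction ;
                  owl:onProperty i:isHomologousTo ;
                  owl:someValuesFrom i:PotentialInteractorFromModelOrganism1 ]
                [ a owl:Restriction ;
                  owl:onProperty i:isHomologousTo ;
                  owl:someValuesFrom i:PotentialInteractorFromModelOrganism2 ]
            )
        ] .
    """

    # Nothing is asserted about any particular protein here: the class is *defined*
    # by property restrictions, and a reasoner (or SHARE) decides which individuals
    # satisfy it once the services have generated the relevant triples.
    g = Graph()
    g.parse(data=probable_interactor_ttl, format="turtle")
    print(len(g), "triples in the class definition")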
  74. 74. Running the Web Science Experiment: in a local data file, provide the protein we are interested in and the two species we wish to use in our comparison:
      taxon:9606     a i:OrganismOfInterest .  # human
      uniprot:Q9UK53 a i:ProteinOfInterest .   # ING1
      taxon:4932     a i:ModelOrganism1 .      # yeast
      taxon:7227     a i:ModelOrganism2 .      # fly
  75. 75. The tricky bit is... In the abstract, the search for homology is “generic”: ANY protein, ANY model system. But when the machine does the experiment, it will need to use (at least) two organism-specific resources, because the answer requires information from the two declared species (taxon:4932, yeast; taxon:7227, fly).
  76. 76. This is the question we ask (the query language here is SPARQL):
      PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
      SELECT ?protein
      FROM <file:/local/workflow.input.n3>
      WHERE { ?protein a i:ProbableInteractor . }
      (The PREFIX is the URL of our OWL model (ontology) defining Probable Interactors.)
  77. 77. Each relationship (property restriction) in the OWL Class is then matched with a SADI Service. The matched SADI Service can generate data that fulfils that property restriction (i.e. produces triples with that S/P/O pattern).
  78. 78. SHARE chains these SADI services into an analytical workflow... ...the outputs from that workflow are instances (OWL Individuals) of Probable Interactors.
  79. 79. SHARE derived (and executed) the following workflow automatically. These are different SADI Web Services... ...selected at run-time based on the same model.
  80. 80. Keys to Success:
      1: Use standards
      2: Focus on predicates, not classes
      3: Use these predicates to define, rather than assert, classes
      4: Make sure all URIs resolve, and resolve to something useful
      5: Never leave the RDF world... (abandon vanilla XML, even for Web Services!)
      6: Use reasoners... Everywhere... Always!
  81. 81. Part V: THE TOOLS AVAILABLE
  82. 82. Part V-A: SERVICE PROVISION
  83. 83. Libraries: • Perl • Java • Python. Plug-in to Protégé: • Perl service scaffolds • Java service scaffolds
  84. 84. Part V-B: CLIENTS
  85. 85. SHARE • you’ve already seen how SHARE works...
  86. 86. Taverna • Contextual service discovery • Automatic RDF serialization and deserialization between SADI and non-SADI services • Note that Taverna is not as rich a client as SHARE. The reason is that SHARE will aggregate and re-reason after every service invocation. There is no (automatic) data aggregation in Taverna.
  87. 87. Using SADI services – building a workflow. The next step in the workflow is to find a SADI service that takes the genes from getKEGGGenesByPathway and returns the proteins that those genes code for.
  88. 88. Using SADI services – building a workflow. Right-click on the service output port and click Find services that consume KEGG_Record…
  89. 89. Using SADI services – building a workflow. Select getUniprotByKeggGene from the list of SADI services and click Connect.
  90. 90. Using SADI services – building a workflow. The getUniprotByKeggGene service is added to the workflow and automatically connected to the output from getKEGGGenesByPathway.
  91. 91. Using SADI services – building a workflow. Add a new workflow output called protein and connect the output from the getUniprotByKeggGene service to it.
  92. 92. Using SADI services – building a workflow. The next step in the workflow is to find a SADI service that takes the proteins and returns sequences of those proteins. Right-click on the encodes output port and click Find services that consume UniProt_Record…
  93. 93. Using SADI services – building a workflow. The UniProt info service attaches the property hasSequence, so select this service and click Connect.
  94. 94. Using SADI services – building a workflow. The UniProt info service is added to the workflow and automatically connected to the output from getUniprotByKeggGene.
  95. 95. Using SADI services – building a workflow. Add a new workflow output called sequence and connect the hasSequence output from the UniProt info service to it.
  96. 96. Using SADI services – building a workflow. The KEGG pathway we’re interested in is “hsa00232”, so we’ll add it as a constant value. Right-click on the KEGG_PATHWAY_Record input port and click Constant value.
  97. 97. Using SADI services – building a workflowEnter the value hsa00232 and click OK.
  98. 98. Using SADI services – building a workflowThe workflow is now complete and ready to run.
  99. 99. IO Informatics Knowledge Explorer plug-in • “Bootstrapping” of semantics using known URI schemes (identifiers.org, LSRN, Bio2RDF, etc.) • Contextual service discovery • Automatic packaging of appropriate data from your data-store and automated service invocation using that data. • This uses some not-widely-known services and metadata that is in the SHARE registry!!
  100. 100. The SADI plug-in to the IO Informatics’ Knowledge Explorer... a quick explanation of how we “boot-strap” semantics...
  101. 101. The Knowledge Explorer Personal Edition, and the SADI plug-in, are freely available.
  102. 102. Sentient Knowledge Explorer is a retrieval, integration, visualization, query, and exploration environment for semantically rich data
  103. 103. Most imported data-sets will already have properties (e.g. “encodes”) …and the data will already be typed (e.g. “Gene” or “Protein”)…so finding SADI Services to consume that data is ~trivial
  104. 104. Now what...?? No properties... No rdf:type... How do I find a service using that node? What *is* that node anyway??
  105. 105. In the case of LSRN URIs, they resolve to:
      <lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO">
        <dc:identifier>CHO</dc:identifier>
        <sio:SIO_000671>  <!-- has identifier -->
          <lsrn:DragonDB_Locus_Identifier>
            <sio:SIO_000300>CHO</sio:SIO_000300>  <!-- has value -->
          </lsrn:DragonDB_Locus_Identifier>
        </sio:SIO_000671>
      </lsrn:DragonDB_Locus_Record>
  106. 106. (The same LSRN record.) The Semantic Science Integrated Ontology (Dumontier) has a model for how to describe database records, including explicitly making the record identifier an attribute of that record; in our LSRN metadata, we also explicitly rdf:type both records and identifiers.
  107. 107. Now we have enough information to start exploring global data...
  108. 108. Menu option provided by the plugin
  109. 109. Discovered the (only) service that consumes these kinds of records
  110. 110. Output is added to the graph (with some extra logic to make visualization of complex data structures a bit easier)
  111. 111. Lather, rinse, repeat...
  112. 112. ...and of course, these links are “live”
  113. 113. What about URIs other than LSRN?
  114. 114. HTTP POST the URI to the SHARE Resolver Service. It will (try to) return you SIO-compliant RDF metadata about that URI (this is a typical SADI service). The resolver currently recognizes a few different shared-URI schemes (e.g. Bio2RDF, Identifiers.org) and can be updated with new patterns.
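A sketch of that step, with loud caveats: the resolver endpoint below is a placeholder, and the exact request format (the bare URI posted as the request body) is an assumption made for illustration rather than the documented interface of the SHARE Resolver.

    import requests

    RESOLVER_URL = "http://example.org/share/resolver"     # placeholder endpoint
    uri_to_resolve = "http://lsrn.org/DragonDB_Locus:CHO"

    # Assumption: the resolver accepts the URI in the POST body and answers with
    # SIO-compliant RDF about it (record, identifier, and rdf:type information).
    response = requests.post(
        RESOLVER_URL,
        data=uri_to_resolve.encode("utf-8"),
        headers={"Accept": "application/rdf+xml"},
    )
    print(response.text)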
  115. 115. Next problem: Knowledge Explorer, and therefore the plug-in, are written in C#. All of our interfaces are described in OWL. C# reasoners are extremely limited at this time.
  116. 116. This problem manifests itself in two ways:
      1. An individual on the KE canvas has all the properties required by a Service in the registry, but is not rdf:typed as that Service’s input type: how do you discover that Service so that you can add it to the menu?
      2. For a selected Service from the menu, how does the plug-in know which data-elements it needs to extract from KE to send to that service in order to fulfil its input property-restrictions?
  117. 117. If I select a canvas node, and ask SADI to find services, it will...
  118. 118. The get_sequence_for_region service required ALL of this (hidden) information
  119. 119. Nevertheless: (a) the service can be discovered based on JUST this node selection; (b) the service can be invoked based on JUST this node selection
  120. 120. Voilà! How did the plug-in discover the service, and determine which data was required to access that service, based on an OWL Class definition, without a reasoner?
  121. 121. Convert the Input OWL Class definition into an ~equivalent SPARQL query, e.g.
      SELECT ?x ?y FROM knowledge_explorer_database WHERE { ?x foaf:name ?y }
      Service description:
        INPUT OWL Class – NamedIndividual: things with a “name” property from the “foaf” ontology
        OUTPUT OWL Class – GreetedIndividual: things with a “greeting” property from the “hello” ontology
      The Registry stores the query together with the service description, with an index; the service provides a “greeting” property based on a “name” property.
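A toy version of that conversion, to make the idea concrete. This is not the plug-in's actual code; it handles only the simplest case, where the input class just requires certain properties to be present, and the property list and example data below are invented.

    from rdflib import Graph, Namespace

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    def restrictions_to_sparql(required_properties):
        """Build a SPARQL SELECT whose WHERE clause demands one value for each
        property the input OWL class requires (simplest possible mapping)."""
        patterns, variables = [], ["?x"]
        for n, prop in enumerate(required_properties):
            var = f"?v{n}"
            variables.append(var)
            patterns.append(f"  ?x <{prop}> {var} .")
        return "SELECT {} WHERE {{\n{}\n}}".format(" ".join(variables), "\n".join(patterns))

    # "NamedIndividual: things with a 'name' property from the foaf ontology"
    query = restrictions_to_sparql([FOAF.name])
    print(query)

    # Running the query against the local knowledge base extracts exactly the data
    # the service needs, with no OWL reasoner involved.
    kb = Graph()
    kb.parse(
        data='@prefix foaf: <http://xmlns.com/foaf/0.1/> . '
             '<http://example.org/p1> foaf:name "Jean Valjean" .',
        format="turtle",
    )
    for row in kb.query(query):
        print(row.x, row.v0)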
  122. 122. Just to ensure that I don’t over-trivialize this point, the REAL SPARQL query that extracts the input for this service is...
  123. 123. CONSTRUCT {
        ?input a <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#BiopolymerRegion> .
        ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
        ?position a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#RangedSequencePosition> .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
        ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
        ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
        ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
        ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
        ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
        ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .
        ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .
        ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
        ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
        ?strandFeature a ?strandFeatureType .
        ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
        ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
        ?strand a ?strandType .
      }
      WHERE {
        ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
        ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
        ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
        ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
        ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
        {
          ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
          ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .
          ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .
        } UNION {
          ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
          ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
          {
            ?strandFeature a <http://sadiframework.org/ontologies/GMOD/Feature.owl#Feature> .
          } UNION {
            ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
            ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
          } .
          {
            ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#PlusStrand> .
            ?strand a ?strandType .
          } UNION {
            ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#MinusStrand> .
            ?strand a ?strandType .
          } .
        } .
      }
  124. 124. Summary: While the Knowledge Explorer plug-in has similar functionality to other tools we have built for SADI, it takes advantage of some features of the SADI Registry, and SADI in general, that are not widely known. We hope that the availability of these features encourages development of SADI tooling in other languages that have limited access to reasoning.
  125. 125. Luke McCarthy, Lead Developer, SADI project; Benjamin VanderValk, Developer, SADI project
