SADI CSHALS 2013
The introduction to the SADI tutorial at CSHALS 2013, Boston, February 27, 2013.

    SADI CSHALS 2013: Presentation Transcript

    • Semantic Automated Discovery and Integration (SADI) Services Tutorial. Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain; Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
    • Part I: MOTIVATION
    • A lot of important information cannot be represented on the Semantic Web: for example, all of the data that results from analytical algorithms and statistical analyses. (I’m purposely excluding databases from the list of examples for reasons I will discuss in a moment.)
    • Varying estimates put the size of the Deep Web between 500 and 800 times larger than the surface Web.
    • On the WWW, “automation” of access to Deep Web data happens through “Web Services”.
    • Traditional definitions of the Deep Web include databases that have Web FORM interfaces. HOWEVER, the Life Science Semantic Web community is encouraging the establishment of SPARQL endpoints as the way to serve that same data to the world (i.e. NOT through Web Services).
    • I am quite puzzled by this...
    • Historically, most* bio/informatics databases do not allow direct public SQL access *yes, I know there are some exceptions!
    • “We need to commit specific hardware for that [mySQL] service. We don’t use the same servers for mySQL as for the Website...” “...we resolve the situation by asking the user to stop hammering the server. This might involve temporary ban on the IP...” - ENSEMBL Helpdesk
    • So... There appear to be good reasons why most data providers do not expose their databases for public query!
    • Are SPARQL endpoints somehow “safer” or “better”?
    • One of the early adopters of RDF/SPARQL in the bioinformatics domain was UniProt
    • How are things going for them?
    • A message posted to the Bio2RDF mailing list last week from Jerven Bolleman, one of the team members behind UniProt’s push for RDF (Subject: “SPARQL or not?”, Tue, 19 Feb 2013):

      Hi Bio2RDF maintainers,
      I keep on noticing this rather expensive query.

      CONSTRUCT {
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/bio2rdf_resource:SearchResults> .
        <http://bio2rdf.org/search/Paget> <http://bio2rdf.org/bio2rdf_resource:hasSearchResult> ?s .
        <http://bio2rdf.org/search/Paget> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?s .
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
        ?s <http://purl.org/dc/elements/1.1/title> ?title .
        ?s <http://purl.org/dc/terms/title> ?dctermstitle .
        ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .
        ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel .
        ?s ?p ?o .
      }
      WHERE {
        ?s ?p ?o FILTER contains(str(?o), "Paget")
        OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
        OPTIONAL { ?s <http://purl.org/dc/elements/1.1/title> ?title }
        OPTIONAL { ?s <http://purl.org/dc/terms/title> ?dctermstitle }
        OPTIONAL { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
        OPTIONAL { ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?skoslabel }
      }
      OFFSET 0
      LIMIT 500

      It comes from the example queries on the bio2rdf landing page. It’s extremely resource consuming and totally useless as it will never ever run in time. Can you please change this query to something useful and workable. And at least cache the results if you ever get them.
      Regards, Jerven

      (My emphasis: “It comes from the example queries on the Bio2RDF landing page” and “It’s extremely resource-consuming and totally useless as it will never run in time.”)
    • So even people who are world-leaders in RDF and SPARQL write “expensive” and “useless” queries that (already!) are making life difficult for SPARQL endpoint providers. I believe that situation will only get worse as more people begin to use the Semantic Web and as SPARQL itself becomes richer and more SQL-like.
    • In My Opinion: History tells us, and this story IMO supports, that SPARQL endpoints might not be widely adopted by source bioinformatics data providers. Historically, the majority of bioinformatics data hosts have opted for API/Service-based access to their resources.
    • In My Opinion: Moreover, I am still obsessed with interoperability! Having a unified way to discover, and access, bioinformatics resources, whether they be databases or algorithms, just seems like a Good Thing™.
    • In My Opinion: So we need to find a way to make Web Services play nicely with the Semantic Web.
    • Design Pattern for Web Services on the Semantic Web
    • Part II: SADI “PHILOSOPHY” AND DESIGN
    • The Semantic Web (diagram): two resources connected by a link labeled “causally related to”.
    • The important bit: the link is explicitly labeled “causally related to”.
    • causally related with (http://semanticscience.org/resource/SIO_000243). SIO_000243:
        <owl:ObjectProperty rdf:about="&resource;SIO_000243">
          <rdfs:label xml:lang="en">is causally related with</rdfs:label>
          <rdf:type rdf:resource="&owl;SymmetricProperty"/>
          <rdf:type rdf:resource="&owl;TransitiveProperty"/>
          <dc:description xml:lang="en">A transitive, symmetric, temporal relation in which one entity is causally related with another non-identical entity.</dc:description>
          <rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
        </owl:ObjectProperty>
    • There are many suggestions for how to bring the Deep Web into the Semantic Web using Semantic Web Services (SWS): OWL-S, SAWSDL, WSDL-S, others...
    • These SWS frameworks try to: describe input data; describe output data; describe how the system manipulates the data; describe how the world changes as a result.
    • Input and output data are usually described through “semantic annotation” of XML Schema.
    • In the least-semantic case, the input and output data is “vanilla” XML.
    • In the “most semantic” case (WSDL), RDF is converted into XML, then back to RDF again.
    • The rigidity of XML Schema is the antithesis of the Semantic Web!
    • So... Perhaps we shouldn’t be using XML Schema at all...??
    • Describing how the system manipulates the data is HARD!
    • Describing how the world changes as a result is... un-necessary?
    • Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
    • Scientific Web Services are DIFFERENT! Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
    • “The service interfaces within bioinformatics are relatively simple. An extensible or constrained interoperability framework is likely to suffice for current demands: a fully generic framework is currently not necessary.” Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
    • Scientific Web Services are DIFFERENT: they’re simpler! Rather than waiting for a solution to the more general problem (which may be years away... or more!), can we solve the Semantic Web Service problem within the scientific domain while still being fully standards-compliant?
    • Other “philosophical” considerations
    • Vis-à-vis being Semantic Webby, what is missing from this list? Describe input data. Describe output data. Describe how the system manipulates the data. Describe how the world changes as a result.
    • causally related with http://semanticscience.org/resource/SIO_000243
    • causally related with http://semanticscience.org/resource/SIO_000243: The Semantic Web works because of relationships!
    • causally related with http://semanticscience.org/resource/SIO_000243: The Semantic Web works because of relationships! In 2008 I proposed that, in the Semantic Web world, algorithms should be viewed as “exposing” relationships between the input and output data.
    • A traditional Web Service (diagram): a raw sequence string “AACTCTTCGTAGTG...” is simply passed to BLAST.
    • SADI (diagram): the input is a sequence node carrying “AACTCTTCGTAGTG...” via a has_seq_string property; the BLAST service links that input, through a “has homology to” relationship, to the homologous sequences it finds (e.g. the Terminal Flower gene in A. thaliana). SADI requires you to explicitly declare, as part of your analytical output, the biological relationship that your algorithm “exposed”.
    • Another “philosophical” decision was to abandon XML Schema. In a world that is moving towards RDF representations of all data, it makes no sense to convert semantically rich RDF into semantic-free Schema-based XML, then back into RDF again.
    • The final philosophical decision was to abandon SOAP. The bioinformatics community seems to be very receptive to pure-HTTP interfaces (e.g. the popularity of REST-like APIs). So SADI uses simple HTTP POST of just the RDF input data (no message scaffold whatsoever).
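    A minimal sketch of that invocation pattern in Python, using only requests and rdflib; the service URL and the example ontology/URIs are hypothetical, the content types a given service accepts may vary, and the real SADI client libraries (Perl/Java/Python) wrap all of this for you:

      import requests
      from rdflib import Graph

      SERVICE_URL = "http://example.org/sadi/blast"   # hypothetical SADI service

      # The input is plain RDF describing the input node(s); there is no SOAP
      # envelope or other message scaffold.
      input_rdf = """
      @prefix ex:  <http://example.org/ontology#> .
      @prefix seq: <http://example.org/sequences/> .
      seq:mySequence ex:has_seq_string "AACTCTTCGTAGTG" .
      """

      response = requests.post(
          SERVICE_URL,
          data=input_rdf,
          headers={"Content-Type": "text/turtle", "Accept": "application/rdf+xml"},
      )

      # The response is again plain RDF: the same input URI, now linked by a
      # meaningful predicate (e.g. "has homology to") to the service's results.
      output_graph = Graph().parse(data=response.text, format="xml")
      print(output_graph.serialize(format="turtle"))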
    • Part III: SADI SERVICE DISCOVERY AND INVOCATION
    • In slightly more detail...
    • Consider a table of patient records:

        ID        Name          Height  Weight  Age
        24601     Jean Valjean  1.8m    84kg    45
        7474505B  Jake Blues    1.73m   101kg   31
        6         —             1.88m   75kg    39
        ...       ...           ...     ...     ...

    • The kinds of data in this table correspond to OWL-DL Classes.
    • Those classes appear as property restrictions in the OWL Class definition that describes the service’s input.
    • A reasoner determines that Patient #24601 is an OWL Individual of the Input service Class.
    • NOTE THE URI OF THE INPUT INDIVIDUAL: Patient:24601.
    • The service output adds a BMI value for that patient:

        ID        Name          Height  Weight  Age  BMI
        24601     Jean Valjean  1.8m    84kg    45   25.9
        7474505B  Jake Blues    1.73m   101kg   31
        6         —             1.88m   75kg    39
        ...       ...           ...     ...     ...

    • NOTE THE URI OF THE OUTPUT INDIVIDUAL: Patient:24601 (the same URI as the input).
    • The URI of the input is linked by a meaningful predicate to the output (either literal output or another URI).
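    A service-side sketch of this pattern (not the official SADI library code): the output triples hang off the same URI that arrived as input, connected by a meaningful predicate. The ontology namespace and property names below are illustrative only.

      from rdflib import Graph, Literal, Namespace
      from rdflib.namespace import XSD

      EX = Namespace("http://example.org/ontology#")   # hypothetical ontology

      def calculate_bmi(input_graph: Graph) -> Graph:
          """For every input individual with a height and weight, attach a BMI."""
          output = Graph()
          for patient in input_graph.subjects(EX.height):
              height = float(input_graph.value(patient, EX.height))
              weight = float(input_graph.value(patient, EX.weight))
              bmi = round(weight / (height ** 2), 1)
              # Same subject URI as the input, new (meaningful) predicate:
              output.add((patient, EX.has_BMI, Literal(bmi, datatype=XSD.float)))
          return output

      # Patient:24601 (1.8 m, 84 kg) comes back with ex:has_BMI 25.9
      input_graph = Graph().parse(data="""
          @prefix ex:      <http://example.org/ontology#> .
          @prefix patient: <http://example.org/patients/> .
          patient:24601 ex:height 1.8 ; ex:weight 84 .
      """, format="turtle")

      print(calculate_bmi(input_graph).serialize(format="turtle"))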
    • Therefore, by connecting SADI services together in a workflow, you end up with an unbroken chain of Linked Data.
    • Part IV: SADI TO THE EXTREME: “WEB SCIENCE 2.0”
    • A proof-of-concept query engine & registry. Objective: answer biologists’ questions.
    • The SHARE registry indexes all of the input/output/relationship triples that can be generated by all known services. This is how SHARE discovers services.
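    A toy sketch of that idea (not the real SHARE registry schema): index each service by the predicate(s) it can attach to its input, then discover services by asking “who can generate triples with this predicate?”. The service URLs and predicates below are hypothetical.

      from collections import defaultdict

      predicate_index = defaultdict(list)

      def register(service_url: str, attaches_predicate: str) -> None:
          """Record that a service can generate triples using this predicate."""
          predicate_index[attaches_predicate].append(service_url)

      def discover(predicate: str) -> list:
          """Return the services able to fulfil a given property restriction."""
          return list(predicate_index.get(predicate, []))

      register("http://example.org/sadi/calculateBMI",
               "http://example.org/ontology#has_BMI")
      register("http://example.org/sadi/blast",
               "http://example.org/ontology#has_homology_to")

      print(discover("http://example.org/ontology#has_BMI"))
      # ['http://example.org/sadi/calculateBMI']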
    • We wanted to duplicate a real, peer-reviewed, bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like.
    • ...the machine had to make every other decision on its own.
    • This is the study we chose:
    • Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
    • Original Study, Simplified: Using what is known about interactions in fly & yeast, predict new interactions with your protein of interest.
    • “Pseudo-code” Abstracted Workflow:
        Given a protein P in Species X
          Find proteins similar to P in Species Y
          Retrieve interactors in Species Y
          Sequence-compare Y-interactors with Species X genome
            (1) Keep only those with a homologue in X
          Find proteins similar to P in Species Z
          Retrieve interactors in Species Z
          Sequence-compare Z-interactors with (1)
            → Putative interactors in Species X
    • Modeling the science... OWL
    • Modeling the science... ProbableInteractor: is homologous to (a Potential Interactor from ModelOrganism1…) and (a Potential Interactor from ModelOrganism2…). Probable Interactor is defined in OWL as a subClass: something that appears as a potential interactor in both comparator model organisms.
    • Running the Web Science Experiment: In a local data file, provide the protein we are interested in and the two species we wish to use in our comparison:
        taxon:9606     a i:OrganismOfInterest .  # human
        uniprot:Q9UK53 a i:ProteinOfInterest .   # ING1
        taxon:4932     a i:ModelOrganism1 .      # yeast
        taxon:7227     a i:ModelOrganism2 .      # fly
    • The tricky bit is... In the abstract, the search for homology is “generic”: ANY protein, ANY model system. But when the machine does the experiment, it will need to use (at least) two organism-specific resources, because the answer requires information from the two declared species (taxon:4932, yeast; taxon:7227, fly).
    • This is the question we ask (the query language here is SPARQL):
        PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
        SELECT ?protein
        FROM <file:/local/workflow.input.n3>
        WHERE { ?protein a i:ProbableInteractor . }
      The prefix i: is the URL of our OWL model (ontology) defining Probable Interactors.
    • Each relationship (property-restriction) in the OWL Class is then matched with a SADI Service. The matched SADI Service can generate data that fulfils that property restriction (i.e. produces triples with that S/P/O pattern).
    • SHARE chains these SADI services into an analytical workflow... ...the outputs from that workflow are Instances (OWL Individuals) of Probable Interactors.
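    A manual sketch of what SHARE automates here: feed the merged RDF from one SADI service into the next, so every output stays linked to the original input URIs. The service URLs below are hypothetical, and in the real experiment the workflow was derived and executed by SHARE itself.

      import requests
      from rdflib import Graph

      def invoke(service_url: str, graph: Graph) -> Graph:
          """POST a graph to a SADI service and merge its output back in."""
          response = requests.post(
              service_url,
              data=graph.serialize(format="xml"),
              headers={"Content-Type": "application/rdf+xml",
                       "Accept": "application/rdf+xml"},
          )
          return graph + Graph().parse(data=response.text, format="xml")

      data = Graph().parse("workflow.input.n3", format="n3")    # the local input file
      for service in ("http://example.org/sadi/getHomologues",  # hypothetical
                      "http://example.org/sadi/getInteractors"):
          data = invoke(service, data)

      # 'data' now holds an unbroken chain of Linked Data from input to results.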
    • SHARE derived (and executed) the following workflow automatically These are different SADI Web Services... ...selected at run-time based on the same model
    • Keys to Success:
        1: Use standards
        2: Focus on predicates, not classes
        3: Use these predicates to define, rather than assert, classes
        4: Make sure all URIs resolve, and resolve to something useful
        5: Never leave the RDF world... (abandon vanilla XML, even for Web Services!)
        6: Use reasoners... Everywhere... Always!
    • Part V: THE TOOLS AVAILABLE
    • Part V-A: SERVICE PROVISION
    • Libraries: Perl, Java, Python. Plug-in to Protégé: Perl service scaffolds, Java service scaffolds.
    • Part V-B: CLIENTS
    • SHARE: you’ve already seen how SHARE works...
    • Taverna: contextual service discovery; automatic RDF serialization and deserialization between SADI and non-SADI services. Note that Taverna is not as rich a client as SHARE: SHARE will aggregate and re-reason after every service invocation, whereas there is no (automatic) data aggregation in Taverna.
    • Using SADI services – building a workflow: The next step in the workflow is to find a SADI service that takes the genes from getKEGGGenesByPathway and returns the proteins that those genes code for.
    • Right-click on the service output port and click Find services that consume KEGG_Record…
    • Select getUniprotByKeggGene from the list of SADI services and click Connect.
    • The getUniprotByKeggGene service is added to the workflow and automatically connected to the output from getKEGGGenesByPathway.
    • Add a new workflow output called protein and connect the output from the getUniprotByKeggGene service to it.
    • The next step in the workflow is to find a SADI service that takes the proteins and returns the sequences of those proteins. Right-click on the encodes output port and click Find services that consume UniProt_Record…
    • The UniProt info service attaches the property hasSequence, so select this service and click Connect.
    • The UniProt info service is added to the workflow and automatically connected to the output from getUniprotByKeggGene.
    • Add a new workflow output called sequence and connect the hasSequence output from the UniProt info service to it.
    • The KEGG pathway we’re interested in is "hsa00232", so we’ll add it as a constant value. Right-click on the KEGG_PATHWAY_Record input port and click Constant value.
    • Enter the value hsa00232 and click OK.
    • The workflow is now complete and ready to run.
    • IO Informatics Knowledge Explorer plug-in:
        • “Bootstrapping” of semantics using known URI schema (identifiers.org, LSRN, Bio2RDF, etc.)
        • Contextual service discovery
        • Automatic packaging of appropriate data from your data-store and automated service invocation using that data
        • This uses some not-widely-known services and metadata that is in the SHARE registry!!
    • The SADI plug-in to the IO Informatics’ Knowledge Explorer... a quick explanation of how we “boot-strap” semantics...
    • The Knowledge Explorer Personal Edition, and the SADI plug-in, are freely available.
    • Sentient Knowledge Explorer is a retrieval, integration, visualization, query, and exploration environment for semantically rich data.
    • Most imported data-sets will already have properties (e.g. “encodes”) ...and the data will already be typed (e.g. “Gene” or “Protein”) ...so finding SADI Services to consume that data is ~trivial.
    • Now what...?? No properties... No rdf:type... How do I find a service using that node? What *is* that node anyway??
    • In the case of LSRN URIs, they resolve to:
        <lsrn:DragonDB_Locus_Record rdf:about="http://lsrn.org/DragonDB_Locus:CHO">
          <dc:identifier>CHO</dc:identifier>
          <sio:SIO_000671> <!-- has identifier -->
            <lsrn:DragonDB_Locus_Identifier>
              <sio:SIO_000300>CHO</sio:SIO_000300> <!-- has value -->
            </lsrn:DragonDB_Locus_Identifier>
          </sio:SIO_000671>
        </lsrn:DragonDB_Locus_Record>
      The Semantic Science Integrated Ontology (Dumontier) has a model for how to describe database records, including explicitly making the record identifier an attribute of that record; in our LSRN metadata, we also explicitly rdf:type both records and identifiers.
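    A minimal sketch of this bootstrapping step from Python: dereference the record URI and read back the SIO-style identifier metadata with rdflib. The SIO property URIs (SIO_000671 “has identifier”, SIO_000300 “has value”) come from the slide above; the Accept header and the exact shape of the response are assumptions about the resolver.

      import requests
      from rdflib import Graph, URIRef
      from rdflib.namespace import RDF

      SIO_HAS_IDENTIFIER = URIRef("http://semanticscience.org/resource/SIO_000671")
      SIO_HAS_VALUE      = URIRef("http://semanticscience.org/resource/SIO_000300")

      record_uri = URIRef("http://lsrn.org/DragonDB_Locus:CHO")

      response = requests.get(record_uri, headers={"Accept": "application/rdf+xml"})
      g = Graph().parse(data=response.text, format="xml")

      # The record and its identifier are both explicitly rdf:typed, which is
      # what lets the plug-in pick SADI services that consume this record type.
      print("record type:", g.value(record_uri, RDF.type))
      for identifier in g.objects(record_uri, SIO_HAS_IDENTIFIER):
          print("identifier type:", g.value(identifier, RDF.type))
          print("identifier value:", g.value(identifier, SIO_HAS_VALUE))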
    • Now we have enough information to start exploring global data...
    • Menu option provided by the plugin
    • Discovered the (only) service that consumes these kinds of records
    • Output is added to the graph (with some extra logic to make visualization of complex data structures a bit easier)
    • Lather, rinse, repeat...
    • ...and of course, these links are “live”
    • What about URIs other than LSRN?
    • HTTP POST the URI to the SHARE Resolver Service. It will (try to) return you SIO-compliant RDF metadata about that URI (this is a typical SADI service). The resolver currently recognizes a few different shared URI schemes (e.g. Bio2RDF, Identifiers.org) and can be updated with new patterns.
    • Next problem: Knowledge Explorer, and therefore the plug-in, are written in C#. All of our interfaces are described in OWL, but C# reasoners are extremely limited at this time.
    • This problem manifests itself in two ways:
        1. An individual on the KE canvas has all the properties required by a Service in the registry, but is not rdf:typed as that Service’s input type: how do you discover that Service so that you can add it to the menu?
        2. For a selected Service from the menu, how does the plug-in know which data elements it needs to extract from KE to send to that service in order to fulfil its input property-restrictions?
    • If I select a canvas node, and ask SADI to find services, it will...
    • The get_sequence_for_region service required ALL of this (hidden) information
    • Nevertheless: (a) the service can be discovered based on JUST this node selection; (b) the service can be invoked based on JUST this node selection
    • Voila! How did the plug-in discover the service, and determine which data was required to access that service, based on an OWL Class definition, without a reasoner?
    • Convert the Input OWL Class definition into an ~equivalent SPARQL query, e.g.:
        SELECT ?x ?y FROM knowledge_explorer_database WHERE { ?x foaf:name ?y }
      The Registry stores this query together with the Service Description as an index.
      Service Description:
        INPUT OWL Class: NamedIndividual: things with a “name” property from the “foaf” ontology
        OUTPUT OWL Class: GreetedIndividual: things with a “greeting” property from the “hello” ontology
      i.e. the service provides a “greeting” property based on a “name” property.
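    A toy sketch of that conversion (not the plug-in’s real code generator): turn the “must have this property” restrictions from the input OWL class into a SPARQL query that pulls matching individuals out of a plain triple store, so no OWL reasoner is needed on the client side. The class URI below is hypothetical; foaf:name is the property from the example above.

      def restrictions_to_sparql(class_uri: str, properties: list) -> str:
          """Build a CONSTRUCT query for individuals carrying all the properties."""
          patterns = [f"?input <{p}> ?v{i} ." for i, p in enumerate(properties)]
          where = "\n  ".join(patterns)
          return (
              "CONSTRUCT {\n"
              f"  ?input a <{class_uri}> .\n  {where}\n"
              "} WHERE {\n"
              f"  {where}\n"
              "}"
          )

      # The "hello world" input class from the slide: things with a foaf:name.
      print(restrictions_to_sparql(
          "http://example.org/hello.owl#NamedIndividual",     # hypothetical URI
          ["http://xmlns.com/foaf/0.1/name"],
      ))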
    • Just to ensure that I don’t over-trivialize this point, the REAL SPARQL query that extracts the input for this service is...
    • CONSTRUCT {
        ?input a <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#BiopolymerRegion> .
        ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
        ?position a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#RangedSequencePosition> .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
        ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
        ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
        ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
        ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
        ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
        ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .
        ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .
        ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
        ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
        ?strandFeature a ?strandFeatureType .
        ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
        ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
        ?strand a ?strandType .
      }
      WHERE {
        ?input <http://sadiframework.org/ontologies/GMOD/BiopolymerRegion.owl#position> ?position .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?start .
        ?start a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#StartPosition> .
        ?start <http://semanticscience.org/resource/SIO_000300> ?startValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#coordinate> ?end .
        ?end a <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#EndPosition> .
        ?end <http://semanticscience.org/resource/SIO_000300> ?endValue .
        ?position <http://sadiframework.org/ontologies/GMOD/RangedSequencePosition.owl#in_relation_to> ?sequence .
        {
          ?sequence <http://semanticscience.org/resource/SIO_000210> ?feature .
          ?feature <http://semanticscience.org/resource/SIO_000008> ?identifier .
          ?identifier <http://semanticscience.org/resource/SIO_000300> ?featureID .
        } UNION {
          ?sequence <http://semanticscience.org/resource/SIO_000210> ?strand .
          ?strand <http://semanticscience.org/resource/SIO_000093> ?strandFeature .
          {
            ?strandFeature a <http://sadiframework.org/ontologies/GMOD/Feature.owl#Feature> .
          } UNION {
            ?strandFeature <http://semanticscience.org/resource/SIO_000008> ?strandFeatureIdentifier .
            ?strandFeatureIdentifier <http://semanticscience.org/resource/SIO_000300> ?strandFeatureID .
          } .
          {
            ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#PlusStrand> .
            ?strand a ?strandType .
          } UNION {
            ?strand a <http://sadiframework.org/ontologies/GMOD/Strand.owl#MinusStrand> .
            ?strand a ?strandType .
          } .
        } .
      }
    • Summary: While the Knowledge Explorer plug-in has similar functionality to other tools we have built for SADI, it takes advantage of some features of the SADI Registry, and SADI in general, that are not widely known. We hope that the availability of these features encourages development of SADI tooling in other languages that have limited access to reasoning.
    • Luke McCarthy, Lead Developer, SADI project; Benjamin VanderValk, Developer, SADI project.