“Shopping for data should be as easy as shopping for shoes!!”
     - Carole Goble
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to choreograph SADI Semantic Web Services



Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM.
Start at the end...
We wanted to replicate a real, peer-reviewed bioinformatics analysis

   simply by providing a model describing what the answer (if one exists) would look like...

...the machine had to make every other decision on its own
This is the study we chose:

  Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
Original Study - simplified and abstracted:

Using what is known about interactions in other species,
predict new interactions in your species of interest.

 Given a protein P in Species X:

      Find proteins similar to P in Species Y
      Retrieve interactors in Species Y
      Sequence-compare Y-interactors with the Species X genome
            (1) Keep only those with a homologue in X

      Find proteins similar to P in Species Z
      Retrieve interactors in Species Z
      Sequence-compare Z-interactors with (1)

 Result: putative interactors in Species X
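The steps above can be sketched as plain set operations. This is a minimal illustration, not the study's actual pipeline: the lookup tables `SIMILAR`, `INTERACTORS` and `HOMOLOGUES` and all protein names are hypothetical stand-ins for real similarity-search and interaction-database services, and we assume that "sequence-compare Z-interactors with (1)" means mapping each Z-interactor to its X homologues and keeping those already found in (1). Under that reading, the chained procedure reduces to an intersection.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy lookups standing in for real services.
SIMILAR: Dict[Tuple[str, str], Set[str]] = {      # (protein, species) -> similar proteins there
    ("P", "Y"): {"y1"},
    ("P", "Z"): {"z1"},
}
INTERACTORS: Dict[str, Set[str]] = {               # protein -> known interaction partners
    "y1": {"y2", "y3"},
    "z1": {"z2", "z3"},
}
HOMOLOGUES: Dict[Tuple[str, str], Set[str]] = {   # (protein, species) -> homologues in that species
    ("y2", "X"): {"x2"},
    ("y3", "X"): {"x3"},
    ("z2", "X"): {"x2"},
    ("z3", "X"): {"x4"},
}


def interactor_homologues(p: str, x: str, other: str) -> Set[str]:
    """Similar proteins in `other` -> their interactors -> homologues of those in `x`."""
    found: Set[str] = set()
    for similar in SIMILAR.get((p, other), set()):
        for partner in INTERACTORS.get(similar, set()):
            found |= HOMOLOGUES.get((partner, x), set())
    return found


def putative_interactors(p: str, x: str, y: str, z: str) -> Set[str]:
    step1 = interactor_homologues(p, x, y)          # (1): Y-interactors with a homologue in X
    return interactor_homologues(p, x, z) & step1   # Z-interactors compared against (1)


print(sorted(putative_interactors("P", "X", "Y", "Z")))
```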
For our prototype study, we simplified this further to a single abstract workflow,

   run twice (once for each comparator species), then intersect the results
The tricky part is...

In the abstract, the two workflows are identical,

but in reality they will be different, because they call for information from different species.
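One way to picture "identical in the abstract, different in reality" is an abstract workflow whose steps name only the property they need, concretized per species by looking services up in a registry. This is a sketch under stated assumptions: the step names, the registry contents and every service name (e.g. `YeastInteractionService`) are invented for illustration and are not real SADI services.

```python
from typing import Dict, List, Tuple

# The abstract workflow: each step names only the property it needs.
ABSTRACT_WORKFLOW: List[str] = ["similarProteinIn", "interactsWith", "homologousTo"]

# Hypothetical registry: (property, species) -> a service able to provide it.
# "homologousTo" maps proteins of that species back to the organism of interest.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("similarProteinIn", "yeast"): "BlastAgainstYeast",
    ("similarProteinIn", "fly"): "BlastAgainstFly",
    ("interactsWith", "yeast"): "YeastInteractionService",
    ("interactsWith", "fly"): "FlyInteractionService",
    ("homologousTo", "yeast"): "HomologyToHuman",
    ("homologousTo", "fly"): "HomologyToHuman",
}


def concretize(abstract: List[str], species: str) -> List[str]:
    """Replace each abstract step with a concrete service chosen for `species`."""
    plan: List[str] = []
    for prop in abstract:
        service = REGISTRY.get((prop, species))
        if service is None:
            raise LookupError(f"no service provides {prop!r} for {species!r}")
        plan.append(service)
    return plan


print(concretize(ABSTRACT_WORKFLOW, "yeast"))
print(concretize(ABSTRACT_WORKFLOW, "fly"))
```

The same abstract plan yields two different service chains; only the steps whose provider does not depend on the species end up shared.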
Modeling the result – Step 1




                                           OWL

   Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge on the Web
Modeling the result – Step 2


ProbableInteractor:
  is homologous to (protein from ModelOrganism1…)   # Potential Interactor in previous slide
       and
                   (protein from ModelOrganism2…)   # Potential Interactor in previous slide

 Probable Interactor is defined in OWL as a subclass of Potential Interactor
 that requires homologous pairs of interacting proteins to exist in both
 comparator model organisms.

                         (Effectively, an intersection.)
We then publish our OWL model of a Probable Interactor on the Web.

     In a local data file we provide the protein we are interested in,
     and the two species we wish to use in our comparison:




    taxon:9606           a     i:OrganismOfInterest . # human
    uniprot:Q9UK53       a     i:ProteinOfInterest . # ING1
    taxon:4932           a     i:ModelOrganism1 . # yeast
    taxon:7227           a     i:ModelOrganism2 . # fly




These four lines represent all of the data provided to the query I am about to show you...
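To make the role of those four lines concrete, here is a minimal Python sketch that reads them as `subject predicate object .` statements and maps each role class to the individual typed with it. It is deliberately NOT a general N3/Turtle parser (it assumes prefixed names only, no full IRIs, one statement per line); the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# The four input lines from the slide (prefix declarations omitted).
INPUT_DATA = """
taxon:9606       a  i:OrganismOfInterest .   # human
uniprot:Q9UK53   a  i:ProteinOfInterest .    # ING1
taxon:4932       a  i:ModelOrganism1 .       # yeast
taxon:7227       a  i:ModelOrganism2 .       # fly
"""


def parse_simple_statements(text: str) -> List[Tuple[str, str, str]]:
    """Parse 'subject predicate object .' lines; not a general N3 parser."""
    triples: List[Tuple[str, str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments (safe only without full IRIs)
        if not line:
            continue
        if not line.endswith("."):
            raise ValueError(f"statement must end with '.': {raw!r}")
        subject, predicate, obj = line[:-1].split()
        triples.append((subject, predicate, obj))
    return triples


def roles(triples: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map each role class (object of an 'a' statement) to the individual typed with it."""
    return {obj: subj for subj, pred, obj in triples if pred == "a"}


ROLES = roles(parse_simple_statements(INPUT_DATA))
print(ROLES["i:ProteinOfInterest"])   # uniprot:Q9UK53
```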
This is the question we ask:


PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>

SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
     ?protein  a  i:ProbableInteractor .   # the reference to our OWL model of the answer
}
Our system then derives (and executes) the following workflow automatically.

   [Figure: the derived workflow. Callout: these are different Web services! ...selected at run-time based on the same model]
There are two very cool things about what you just saw...




              The system was able to create a workflow based on an OWL model




               The workflow it created (i.e. the services chosen) differed depending on context
We got the answer “simply” by designing a model of the answer!
Start at the beginning...
Semantic Automated Discovery and Integration

  A Semantic Web-focused Web Services specification
Microsoft Research       http://sadiframework.org
Web Services                  vs.     Semantic Web

XML + XML Schema                      RDF + OWL
POST of SOAP                          GET of RDF
No (rigorous) semantics               Rich, flexible semantics

Web Services & Semantic Web:
   Fundamentally different Web technologies
A design practice for Web Service provision on the Semantic Web
100% standards-compliant
with no “invented” standards
Lightweight
(only 2 inter-related “rules”)
Rules come from
 observations:
SADI Observation #1:


Web Services in Bioinformatics create implicit biological relationships between their input and output
SADI Observation #1:

       [Figure: example bioinformatics Web Service, SeqRet]
SADI Design Practice #1

         Make the implicit explicit…


A Web Service should create “triples” linking the input data to the output data,
thus explicitly describing the semantic relationship between them
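A minimal sketch of "making the implicit explicit", assuming triples are modelled as Python tuples of prefixed-name strings. The service name SeqRet and the predicate `hasDNASequence` come from the slides; the `SEQUENCES` lookup is a hypothetical stand-in for a sequence database, and the sequence string is the slide's own elided placeholder.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for a sequence database (value is the slide's placeholder).
SEQUENCES: Dict[str, str] = {"BRCA1": "AGCTTA..."}


def seqret(input_graph: Set[Triple]) -> Set[Triple]:
    """Return triples that explicitly link each input gene to its DNA sequence."""
    output: Set[Triple] = set()
    for subject, predicate, obj in input_graph:
        if predicate == "rdf:type" and obj == "GeneID" and subject in SEQUENCES:
            # The relationship a conventional service leaves implicit is stated as a triple.
            output.add((subject, "hasDNASequence", SEQUENCES[subject]))
    return output


print(seqret({("BRCA1", "rdf:type", "GeneID")}))
```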
SADI Best Practice #1

     This is what bioinformatics Web Services
               implicitly do anyway!

           Easy to implement as a
                  best practice

...and it makes SADI Services a source of Linked Data
SADI Observation #2:
         HTTP GET and POST



            GET guarantees that
the response relates to the request URI
 in a very precise and predictable way


          POST does not…
SADI Observation #2:
            HTTP GET and POST



That’s why Web Services behave fundamentally
  differently from the Semantic Web
SADI Observation #2:
              GET and POST



       We can fix that!

 (without breaking any existing rules or standards!)
SADI Best Practice #2

The SUBJECT URI of the output graph (triples)

                 is the same as

the SUBJECT URI of the input graph (triples)



(the output is “about” the input... now explicitly!)
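A client could check this rule mechanically. In the sketch below, a graph's "root" subjects are the subjects that never appear as an object in the same graph, so an output may still contain nested nodes as long as it hangs off the input's URI. This root-based reading, the tuple representation, and the example names `seq:1`, `hasValue` and `isSequenceOf` are assumptions for illustration, not part of the SADI specification.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def root_subjects(graph: Set[Triple]) -> Set[str]:
    """Subjects that are not themselves the object of another triple in the graph."""
    objects = {o for _, _, o in graph}
    return {s for s, _, _ in graph if s not in objects}


def follows_best_practice_2(input_graph: Set[Triple], output_graph: Set[Triple]) -> bool:
    """True if the output graph is rooted only at subject URIs that root the input graph."""
    return bool(output_graph) and root_subjects(output_graph) <= root_subjects(input_graph)


request = {("BRCA1", "rdf:type", "GeneID")}
good = {("BRCA1", "hasDNASequence", "seq:1"), ("seq:1", "hasValue", "AGCTTA...")}
bad = {("seq:1", "isSequenceOf", "BRCA1")}   # new root subject: not "about" the input
print(follows_best_practice_2(request, good), follows_best_practice_2(request, bad))
```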
SADI Best Practice #2

   Input:   BRCA1  rdf:type         GeneID

                      SeqRet

   Output:  BRCA1  rdf:type         GeneID
            BRCA1  hasDNASequence   AGCTTA...
Consequence

Web Services now exhibit a very similar
     behavior to the Web itself

      POST “behaves like” GET
SADI Interfaces


Service Interfaces can be described by
           two OWL classes




  (this is 100% compatible with the SAWSDL standard)
SADI Interfaces



OWL Class #1: My Input Class
SADI Interfaces



OWL Class #2: My Output Class
SADI Service Functionality


      Consumes OWL Individuals of Class #1

        Returns OWL Individuals of Class #2

but the URI of those two individuals is the same; they are
the same individual, just now a member of a new class.

   Input:   BRCA1  rdf:type         GeneID

                      SeqRet

   Output:  BRCA1  rdf:type         GeneID
            BRCA1  hasDNASequence   AGCTTA...
In practice, of course, you don’t return the input data:

strip it, and return only the new data provided by the Service.

But since the output is still “rooted” in the input node,

    input and output are easily merged client-side

     (just concatenate the output with the input)
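Because both graphs share the same subject URI, the client-side merge is just a set union of triples. This sketch assumes graphs without blank nodes; a standard RDF merge would otherwise have to rename blank nodes apart first. The helper names `merge` and `describe` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def merge(input_graph: Set[Triple], output_graph: Set[Triple]) -> Set[Triple]:
    """Client-side merge: with shared URIs and no blank nodes, concatenation is a set union."""
    return input_graph | output_graph


def describe(graph: Set[Triple], subject: str) -> Set[Tuple[str, str]]:
    """All (predicate, object) pairs stated about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


sent = {("BRCA1", "rdf:type", "GeneID")}
received = {("BRCA1", "hasDNASequence", "AGCTTA...")}   # the service stripped the input
merged = merge(sent, received)
print(sorted(describe(merged, "BRCA1")))
```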
Service Description
     INPUT OWL Class
     NamedIndividual: things with
           a “name” property
           from “foaf” ontology

     OUTPUT OWL Class
     GreetedIndividual: things with
        a “greeting” property
        from “hello” ontology

           POST http://example.org/myservice

     Input:   person:1  foaf:name        "Guy Incognito"
              person:1  rdf:type         hello:NamedIndividual

     Output:  person:1  hello:greeting   "Hello, Guy Incognito!"
              person:1  rdf:type         hello:GreetedIndividual
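The service's core logic can be sketched in a few lines of Python, leaving out the HTTP layer (the POST to http://example.org/myservice) and RDF serialization entirely. As an assumption mirroring the class definition, an individual counts as a NamedIndividual because it has a `foaf:name`, not because of its `rdf:type` triple; the function name `hello_service` is hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def hello_service(input_graph: Set[Triple]) -> Set[Triple]:
    """Greet every NamedIndividual, i.e. every subject that has a foaf:name."""
    names: Dict[str, str] = {}
    for s, p, o in input_graph:
        if p == "foaf:name":
            names[s] = o
    output: Set[Triple] = set()
    for subject, name in names.items():
        # Output reuses the input subject URI; input triples are not echoed back.
        output.add((subject, "hello:greeting", f"Hello, {name}!"))
        output.add((subject, "rdf:type", "hello:GreetedIndividual"))
    return output


request = {
    ("person:1", "foaf:name", "Guy Incognito"),
    ("person:1", "rdf:type", "hello:NamedIndividual"),
}
for triple in sorted(hello_service(request)):
    print(triple)
```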
How do we discover services?


Input and output are about the same “thing”.


Therefore, to describe what a service does,
        simply compare (“diff”) the
     Input and Output OWL classes.



                              (This is not prescriptive! Just how we use it.)
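A minimal sketch of the "diff", under a strong simplification: each OWL class is reduced to the set of properties its restrictions require, so the description is a set difference. Real OWL class expressions can say much more (cardinalities, value restrictions); the function name `describe_service` is hypothetical.

```python
from typing import Dict, FrozenSet, List

# Simplification: an OWL class reduced to the properties its restrictions require.
NAMED_INDIVIDUAL: FrozenSet[str] = frozenset({"foaf:name"})
GREETED_INDIVIDUAL: FrozenSet[str] = frozenset({"hello:greeting"})


def describe_service(input_class: FrozenSet[str], output_class: FrozenSet[str]) -> Dict[str, List[str]]:
    """Describe a service by what it needs and by the 'diff' of output over input."""
    return {
        "consumes": sorted(input_class),
        "adds": sorted(output_class - input_class),
    }


print(describe_service(NAMED_INDIVIDUAL, GREETED_INDIVIDUAL))
```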
Service Description: diffing the INPUT class (NamedIndividual) and the OUTPUT class (GreetedIndividual)

     The service provides a “greeting” to a Named Individual based on its “name”
Service Discovery


  Index all of the properties
 added by all of the services
   under all circumstances
Real-world Example




Input Data:           BRCA1 rdf:type    Gene ID

Output Data:          BRCA1    hasDNASequence     AGCTTAGCCA…

Registry Index: Service provides the “hasDNASequence” property to Gene IDs
Service Discovery



Simply search for the property of interest
       based on the data in hand
e.g. the question:


“What is the DNA sequence of BRCA1?”


Discover a SADI Web Service that generates the
  DNA sequence property for gene identifiers
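The registry index and the lookup can be sketched together. This is a simplification: a real SADI registry decides whether data is valid input by OWL reasoning over class membership, whereas here the input class is matched by exact name. The service table is hypothetical (`GOAnnotator` in particular is an invented name).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Hypothetical service descriptions: name -> (input class, properties the service adds).
SERVICES: Dict[str, Tuple[str, Set[str]]] = {
    "SeqRet": ("GeneID", {"hasDNASequence"}),
    "HelloService": ("hello:NamedIndividual", {"hello:greeting"}),
    "GOAnnotator": ("ProteinSequence", {"hasGOTerm"}),
}

Index = DefaultDict[Tuple[str, str], Set[str]]


def build_index(services: Dict[str, Tuple[str, Set[str]]]) -> Index:
    """Index every property each service adds, keyed by (input class, property)."""
    index: Index = defaultdict(set)
    for name, (input_class, added) in services.items():
        for prop in added:
            index[(input_class, prop)].add(name)
    return index


def discover(index: Index, data_class: str, wanted_property: str) -> List[str]:
    """Which services can add `wanted_property` to data of class `data_class`?"""
    return sorted(index.get((data_class, wanted_property), set()))


INDEX = build_index(SERVICES)
# "What is the DNA sequence of BRCA1?"  BRCA1 is a GeneID; we want hasDNASequence.
print(discover(INDEX, "GeneID", "hasDNASequence"))
```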
DEMO
     Knowledge Explorer Plug-in
For more information about the Knowledge Explorer, visit:
                 http://io-informatics.com
SADI has just filled in the “Encodes” property for the three genes
       from the output of the discovered Web Service(s)
Discover services that provide the hasGOTerm
   property for the Protein Sequence datatype
This kind of “Web Service Surfing” is very
            intuitive for the Biologist!

No need to describe the algorithm or the database:
 just describe the properties that will be added
Semantic Health And Research Environment


SHARE answers arbitrary SPARQL queries
 by finding and executing SADI Services
Example #1

        What is the phenotype of every allele of the
          Antirrhinum majus DEFICIENS gene?




SELECT ?allele    ?image     ?desc

WHERE {
      locus:DEF            genetics:hasVariant      ?allele .
        ?allele            info:visualizedByImage   ?image .
         ?image            info:hasDescription      ?desc
}


                       Note that there is no “FROM” clause!
                       We don’t tell it where it should get the information;
                       the machine has to figure that out by itself...
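To show how a query with no FROM clause can still be answered, here is a toy nested-loop evaluator in which each triple pattern's predicate selects the service to call. It is only a sketch: SHARE additionally reasons over OWL, queries the registry, and orders the patterns itself, whereas here we assume each predicate maps to exactly one service, that patterns are already ordered so each subject is bound before use, and that all allele, image and description values are invented placeholders.

```python
from typing import Callable, Dict, List, Tuple

Binding = Dict[str, str]


# Hypothetical services, each producing one predicate's values for a given subject.
def variants(locus: str) -> List[str]:
    return {"locus:DEF": ["allele:def-1", "allele:def-2"]}.get(locus, [])


def images(allele: str) -> List[str]:
    return {"allele:def-1": ["img:1"]}.get(allele, [])


def descriptions(image: str) -> List[str]:
    return {"img:1": ["description of img:1"]}.get(image, [])


# Stand-in for discovery: predicate -> service that provides it.
REGISTRY: Dict[str, Callable[[str], List[str]]] = {
    "genetics:hasVariant": variants,
    "info:visualizedByImage": images,
    "info:hasDescription": descriptions,
}


def resolve(term: str, binding: Binding) -> str:
    """Substitute a bound variable; constants pass through unchanged."""
    return binding[term] if term.startswith("?") else term


def evaluate(patterns: List[Tuple[str, str, str]]) -> List[Binding]:
    """Nested-loop evaluation: each pattern's predicate selects the service to call."""
    solutions: List[Binding] = [{}]
    for s, p, o in patterns:
        service = REGISTRY[p]
        extended: List[Binding] = []
        for binding in solutions:
            for value in service(resolve(s, binding)):   # subject must already be bound
                if o.startswith("?"):
                    if binding.get(o, value) == value:
                        extended.append({**binding, o: value})
                elif value == o:
                    extended.append(binding)
        solutions = extended
    return solutions


QUERY = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]
print(evaluate(QUERY))
```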
Enter that query into SHARE
Click “Submit”...
...and in a few seconds you get your answer.

        Based on the predicates in your query, SHARE used SADI
to automatically discover the resources required to answer your question.
Because it is the Semantic Web,
the query results are live hyperlinks
to the respective databases or images
Importantly


We posed and answered a
 complex database query

 WITHOUT A DATABASE
(in fact, the data didn’t even have to exist... as I’ll now show you)
Example #2

    Show me the latest Blood Urea Nitrogen and Creatinine levels
      of patients who appear to be rejecting their transplants


SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
        ?patient rdf:type patient:LikelyRejecter .
        ?patient l:latestBUN ?bun .
        ?patient l:latestCreatinine ?creat .
}
Likely Rejecter:

A patient who has creatinine levels
   that are increasing over time
                            - Wilkinson “MD”
Likely Rejecter:
Our database contains various
blood chemistry measurements
    at various time-points
Likely Rejecter:

 …but there is no “likely rejecter”
column or table in our database…
SHARE determines, by itself, the need to do a
Linear Regression analysis over Creatinine blood chemistry measurements
SHARE determines, by itself, how and where that analysis can be done, and does it
The SHARE system utilizes Semantics (via SADI) to discover and access
   analytical services on the Web that do linear regression analysis
VOILÀ!
How does SHARE work?
Ontologies
Ontology Spectrum

   [Figure: the Ontology Spectrum, from least to most expressive. Upper row: Catalog/ID; Thesauri (“narrower term” relation); Formal is-a; Frames (Properties); Selected Logical Constraints (disjointness, inverse, …). Lower row: Terms/glossary; Informal is-a; Formal instance; Value Restrictions; General Logical constraints.]

Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty;
– updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
Ontology Spectrum
   [Figure: the same spectrum, annotated “BETTER!!” above the Frames end]
Basic SADI/SHARE functionality
Ontology Spectrum
   [Figure: the same spectrum, annotated “WHY?”; toward the Frames end: “Because it fulfils XYZ”; toward the Catalog/Terms end: “Because I say so!”]

                              Why is this data a member of this OWL Class?
                              (and therefore valid input to the service)
Discovery systems - flexible

   [Figure: the same Ontology Spectrum]

   Categorization Systems – like library shelves, inflexible
At the upper end of the Ontology Spectrum,

         if the data has the right properties,

it can be discovered to be a valid input to a Service,

regardless of how it was originally classified or which

      ontology was used for that classification
...and in the context of a SHARE query,

 those individual properties may have been aggregated

              from many different places.


The data becomes a valid input as its properties aggregate
In exactly the same way that the OWL property
restrictions of a SADI Input Class tell SHARE what
        properties a service requires as input,


 the property restrictions of an OWL Class in the
  SPARQL query tell SHARE what properties it
needs to retrieve to create members of that class
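A minimal sketch of properties aggregating until an individual becomes a valid input. As simplifying assumptions, an OWL class is reduced to the set of properties its restrictions require, and each source is a dictionary of property values about the same individual; the helper names and the two "sources" are hypothetical.

```python
from typing import Dict, Set

Properties = Dict[str, Set[str]]

# Simplification: an OWL class reduced to the properties its restrictions require.
NAMED_INDIVIDUAL_REQUIRES: Set[str] = {"foaf:name"}


def aggregate(*sources: Properties) -> Properties:
    """Merge what several sources say about the same individual."""
    merged: Properties = {}
    for source in sources:
        for prop, values in source.items():
            merged.setdefault(prop, set()).update(values)
    return merged


def missing(props: Properties, required: Set[str]) -> Set[str]:
    """Properties that would still need to be retrieved (e.g. by calling more services)."""
    return required - set(props)


def is_valid_input(props: Properties, required: Set[str]) -> bool:
    return not missing(props, required)


from_source_a: Properties = {"rdf:type": {"foaf:Person"}}
from_source_b: Properties = {"foaf:name": {"Guy Incognito"}}
only_a = aggregate(from_source_a)
both = aggregate(from_source_a, from_source_b)
print(missing(only_a, NAMED_INDIVIDUAL_REQUIRES), is_valid_input(both, NAMED_INDIVIDUAL_REQUIRES))
```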
Show me the latest Blood Urea Nitrogen and Creatinine levels
    of patients who appear to be rejecting their transplants



PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
        ?patient rdf:type           patient:LikelyRejecter .
        ?patient l:latestBUN        ?bun .
        ?patient l:latestCreatinine ?creat .
}
The definition of the Likely Rejecter category is encoded in
       a machine-readable document written in the OWL ontology language

Basically:

  “the regression line over creatinine measurements should have a positive (increasing) slope”
SHARE burrows down through the ontological definition
to learn what the properties of “regression models” are
SHARE uses SADI to discover analytical services on the Web
 that do linear regression analysis, and then other services that can
    determine the “latest” measurements from a time series
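The two analyses SHARE has to find can be sketched locally: an ordinary least-squares slope, `slope = Σ(t − t̄)(v − v̄) / Σ(t − t̄)²`, which decides Likely Rejecter membership when it is positive, and a "latest value" lookup over a time series. In SHARE these are remote services discovered via SADI; the function names and all measurement values below are invented for illustration.

```python
from typing import List, Tuple

Measurement = Tuple[float, float]   # (day of measurement, creatinine level)


def ols_slope(points: List[Measurement]) -> float:
    """Least-squares slope of level against time."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    if s_tt == 0:
        raise ValueError("measurements must span more than one time-point")
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return s_tv / s_tt


def is_likely_rejecter(creatinine: List[Measurement]) -> bool:
    """The slide's rule: the regression line over creatinine has a positive slope."""
    return ols_slope(creatinine) > 0


def latest(points: List[Measurement]) -> float:
    """Value of the most recent measurement in a time series."""
    return max(points, key=lambda p: p[0])[1]


rising = [(0, 80.0), (7, 95.0), (14, 120.0)]     # invented example values
falling = [(0, 100.0), (7, 95.0), (14, 90.0)]
print(is_likely_rejecter(rising), is_likely_rejecter(falling), latest(rising))
```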
Let’s go through that again from a different perspective
OWL Classes that include property restrictions can be
       “executed” as if they were workflows
Ontology = Query = Workflow
WORKFLOW
QUERY:

   SELECT images of mutations from genes in organism XXX
   that share homology with this gene in organism YYY

Concept:

“Homologous Mutant Image”
As OWL Axioms
   HomologousMutantImage is owl:equivalentTo {

      Gene Q       hasImage      Image P
      Gene Q       hasSequence   Sequence Q
      Gene R       hasSequence   Sequence R
      Sequence Q   similarTo     Sequence R
      Gene R = “my gene of interest”
   }
Those axioms combine to create an OWL Class: Homologous Mutant Image
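Read as a conjunctive pattern, the axioms above can be evaluated over a small local graph. This is only an illustration: in SHARE the missing triples would be fetched by discovered SADI services rather than being present in advance, `similarTo` is treated here as directed (an assumption), and every gene, sequence and image identifier is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Invented toy graph; identifiers are hypothetical.
GRAPH: Set[Triple] = {
    ("gene:Q1", "hasImage", "image:P1"),
    ("gene:Q1", "hasSequence", "seq:Q1"),
    ("gene:Q2", "hasImage", "image:P2"),
    ("gene:Q2", "hasSequence", "seq:Q2"),
    ("gene:R", "hasSequence", "seq:R"),
    ("seq:Q1", "similarTo", "seq:R"),
}


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    return {o for s, p, o in graph if s == subject and p == predicate}


def homologous_mutant_images(graph: Set[Triple], gene_of_interest: str) -> Set[str]:
    """Images P with: Q hasImage P, Q hasSequence SQ, SQ similarTo SR, R hasSequence SR."""
    r_sequences = objects(graph, gene_of_interest, "hasSequence")
    result: Set[str] = set()
    for gene_q, predicate, image in graph:
        if predicate != "hasImage":
            continue
        for q_sequence in objects(graph, gene_q, "hasSequence"):
            if objects(graph, q_sequence, "similarTo") & r_sequences:
                result.add(image)
    return result


print(homologous_mutant_images(GRAPH, "gene:R"))   # {'image:P1'}
```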
QUERY:
    Retrieve HomologousMutantImage
    for gene XXX
SHARE:
    Decomposes the HomologousMutantImage Class,
    discovers SADI services relevant to that class,
    and pipelines them together into a workflow
“The user experiments showed that
workflow re-use… is difficult for bioinformaticians.”



                                              - Goderis, A. (2008) Ph.D. Thesis
Under slightly different circumstances
(e.g. studying the same phenomenon in a
            different organism)

              this workflow


  WILL NOT SOLVE THE PROBLEM



It must be edited on a case-by-case basis



  This editing turns out to be extremely
difficult, and can (almost) only be done manually
This great idea would be even better
if workflows were a little bit easier to re-purpose!
With SHARE, a workflow is generated
dynamically, based on all of the information
          presented in the query

e.g. to create a new workflow, simply specify
     a different organism ID in the query!

  Moreover, the workflow plan CHANGES
  dynamically based on service outputs!
With SHARE, the ontology is the workflow
  (not the same as “an ontology that describes workflows”)


  The ontology acts as an abstraction of a
   workflow, which is concretized at run-time
 based on the circumstances of the query
Works (best) with ontologies in the “Frames+” part of
 the spectrum, if there are SADI services available

   [Figure: the same Ontology Spectrum]
As far as we are aware, SHARE is the only system that
           exhibits this particular behaviour

        ...and IMO this is a pretty big deal...
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
That experiment can now be represented ONCE,
              as an OWL Class.


     It becomes concretized automatically

       for each individual researcher,

as a distinct workflow, given any starting protein
 and any combination of comparator species
This is far beyond simply changing the parameters entered into a workflow...
Moreover, the idea of automated workflow “individuality” is quite interesting to us...
This is an early prototype of a

Patient-driven Personalized Medicine

           Web interface
Matching based on official
  name, compound name,
brand name, trade name, or
    “common name”
Still needs some work...

        ??!?!?
Why the alert?




                 Link out to PubMed
The SADI+SHARE workflow and reasoning were
personalized to YOUR medical data
In future iterations, we will enable the workflow
to be further customized through “personalized”
OWL Classes (e.g. provided by your Clinician!!)
These OWL Classes might include information about the
current trajectory of your treatment for a chronic disease,
for example, such that what you read on the Web is
placed in the context of your expert Clinical care...
Frankly, I think it’s quite cool that “people”
 are creating and running personalized
   workflows at the touch of a button...
...as many of you know, that has been my
    dream since I started studying this
         problem a decade ago!
Final thoughts
An experiment... based on a hypothesis

        now modeled in OWL




Does this OWL Class represent a Hypothesis?

               I think it does!




I believe that we will soon show,

                 using SADI + SHARE,

            that we can model a non-trivial
            hypothetical biological scenario,

   then evaluate whether that hypothesis is supported or not
based on whether the automatically synthesized workflow
                 returns any individuals
               that conform to the model.
Ontology = Hypothesis = Query = Workflow [= Materials and Methods]




             Most of your publication is done!

      All you need to do now is interpret the results!




                                    The Materials and Methods can be derived automatically
                                    from provenance information captured during workflow execution
Please join us!

SADI and SHARE are Open-Source projects

       http://sadiframework.org
My New Home!
University of British Columbia


Luke McCarthy – Lead Dev.: Everything...
Edward Kawas: SADI Service auto-generator
Benjamin VanderValk: SHARE & SADI & Experimental modeling & myHealth Button
Ian Wood: Experimental modeling project
Soroush Samadian: Cardiovascular data modeling and queries
C-BRASS Collaborators at other sites

U of New Brunswick      Carleton University

Dr. Chris Baker         Dr. Michel Dumontier
Alexandre Riazanov      Marc-Alexandre Nolin
                        Leonid Chepelev
                        Steve Etlinger
                        Nichaella Kieth
                        Jose Cruz
Microsoft Research

Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to choreograph SADI Semantic Web Services

  • 1.
    Carole Goble “Shopping for data should be as easy as shopping for shoes!!”
  • 2.
    Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to choreograph SADI Semantic Web Services Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM.
  • 3.
  • 4.
    We wanted toduplicate a real, peer-reviewed, bioinformatics analysis simply by providing a model describing what the answer (if one exists) would look like...
  • 5.
    ...the machine hadto make every other decision on it’s own
  • 6.
    This is thestudy we chose: Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K. Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
  • 7.
    Gordon, P.M.K., Soliman,M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
  • 8.
    Original Study -simplified and abstracted: Using what is known about interactions in other species, predict new interactions in your species of interest Given a protein P in Species X Find proteins similar to P in Species Y Retrieve interactors in Species Y Sequence-compare Y-interactors with Species X genome (1)  Keep only those with homologue in X Find proteins similar to P in Species Z Retrieve interactors in Species Z Sequence-compare Z-interactors with (1)  Putative interactors in Species X
  • 9.
    For our prototypestudy, we simplified this further to: X 2 Then intersect
  • 10.
    The tricky partis... In the abstract, the two workflows are identical but in reality they will be different because they call for information from different species
  • 11.
    Modeling the result– Step 1 OWL Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge on the Web
  • 12.
    Modeling the result– Step 2 ProbableInteractor: is homologous to ( protein from ModelOrganism1…) # Potential Interactor in previous slide and protein from ModelOrganism2…) # Potential Interactor in previous slide Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both comparator model organisms. (Effectively, an intersection)
  • 13.
    We then publishour OWL model of a Probable Interactor on the Web In a local data-file we provide the protein we are interested in, and the two species we wish to use in our comparison taxon:9606 a i:OrganismOfInterest . # human uniprot:Q9UK53 a i:ProteinOfInterest . # ING1 taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly These four lines represent all of the data provided to the query I am about to show you...
  • 14.
    This is thequestion we ask: PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#> SELECT ?protein FROM <file:/local/workflow.input.n3> WHERE { ?protein a i:ProbableInteractor . } The reference to our OWL model of the answer
  • 15.
    Our system thenderives (and executes) the following workflow automatically These are different Web services! ...selected at run-time based on the same model
  • 16.
    There are twovery cool things about what you just saw...
  • 17.
    There are twovery cool things about what you just saw... The system was able to create a workflow based on an OWL model
  • 18.
    There are twovery cool things about what you just saw... The workflow it created (i.e. the services chosen) differed depending on context
  • 19.
    We got theanswer “simply” by designing a model of the answer!
  • 20.
    Start at thebeginning...
  • 21.
    Semantic Automated Discoveryand Integration A Semantic Web-focused Web Services specification Microsoft Research http://sadiframework.org
  • 22.
    Web Services vs. Semantic Web
  • 23.
    Web Services XML +XML Schema Semantic Web RDF + OWL
  • 24.
    Web Services POST ofSOAP Semantic Web GET of RDF
  • 25.
    Web Services No (rigorous)semantics Semantic Web Rich, flexible semantics
  • 26.
    Web Services & Semantic Web Fundamentally different Web technologies
  • 27.
    A design-practice for WebService provision on the Semantic Web
  • 28.
    100% standards-compliant with no“invented” standards
  • 29.
  • 30.
    Rules come from observations:
  • 31.
    SADI Observation #1: WebServices in Bioinformatics create implicit biological relationships between their input and output
  • 32.
  • 33.
    SADI Design Practice#1 Make the implicit explicit… A Web Service should create “triples” linking the input data to the output data thus explicitly describing the semantic relationship between them
  • 34.
    SADI Best Practice#1 This is what bioinformatics Web Services implicitly do anyway! Easy to implement this as a best-practice ...and makes SADI Services a source of Linked Data
  • 35.
    SADI Observation #2: HTTP GET and POST GET guarantees the response relates to the request URI in a very precise and predictable way POST does not…
  • 36.
    SADI Observation #2: HTTP GET and POST That’s why Web Services have a fundamentally different behaviour than the Semantic Web
  • 37.
    SADI Observation #2: GET and POST We can fix that! (without breaking any existing rules or standards!)
  • 38.
    SADI Best Practice#2 SUBJECT URI of the output graph (triples) is the same as SUBJECT URI of the input graph (triples) (the output is “about” the input... Now explicitly!)
  • 39.
    SADI Best Practice#2 GeneID GeneID rdf:type rdf:type BRCA1 BRCA1 hasDNASequence SeqRet AGCTTA...
  • 40.
    Consequence Web Services nowexhibit a very similar behavior to the Web itself POST “behaves like” GET
  • 41.
    SADI Interfaces Service Interfacescan be described by two OWL classes (this is 100% compatible with the SAWSDL standard)
  • 42.
    SADI Interfaces: OWL Class #1: My Input Class
  • 43.
    SADI Interfaces: OWL Class #2: My Output Class
  • 44.
    SADI Service Functionality: Consumes OWL Individuals of Class #1 and returns OWL Individuals of Class #2, but the URI of those two individuals is the same; they are the same individual, just now a member of a new class. [Figure: BRCA1 rdf:type GeneID is sent to SeqRet; output BRCA1 rdf:type GeneID; BRCA1 hasDNASequence AGCTTA...]
  • 45.
    In practice, of course, you don’t return the input data: strip it and add the new data provided by the Service. But since the output is still “rooted” in the input node, input and output are easily merged client-side (just concatenate the output with the input)
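With graphs modelled as Python sets of (subject, predicate, object) tuples (an assumption of this sketch, not how a real RDF client stores data), "just concatenate" becomes a set union. Because the stripped response shares the input's subject URI, the union reattaches the new property to the original individual. The response triple below is a hypothetical placeholder.

```python
def merge(input_graph, output_graph):
    """Client-side merge: concatenating two triple sets is a set union.

    Because the service output is rooted at the input subject URIs, the union
    attaches the new properties directly to the original individuals.
    """
    return input_graph | output_graph


def properties_of(graph, subject):
    """All (predicate, object) pairs known for one subject, sorted for stable display."""
    return sorted((p, o) for s, p, o in graph if s == subject)


request = {("BRCA1", "rdf:type", "GeneID")}
# Hypothetical service response: input stripped, only the new data, same subject URI.
response = {("BRCA1", "hasDNASequence", "ACGTACGT")}
print(properties_of(merge(request, response), "BRCA1"))
```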
  • 46.
    Service Description: INPUT OWL Class NamedIndividual: things with a “name” property from the “foaf” ontology. OUTPUT OWL Class GreetedIndividual: things with a “greeting” property from the “hello” ontology. [Figure: POST http://example.org/myservice; input: person:1 rdf:type hello:NamedIndividual; person:1 foaf:name “Guy Incognito”; output: person:1 rdf:type hello:GreetedIndividual; person:1 hello:greeting “Hello, Guy Incognito!”]
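The “hello” example can be sketched in a few lines of Python. This is illustrative only: the HTTP POST to http://example.org/myservice is replaced by a plain function call, triples are tuples, and membership in NamedIndividual is approximated as “has a foaf:name value”.

```python
def hello_service(input_graph):
    """Sketch of the 'hello' service: consumes NamedIndividuals (anything with
    foaf:name) and returns the same individuals as GreetedIndividuals."""
    output = set()
    for subject, predicate, name in input_graph:
        if predicate == "foaf:name":  # approximates membership in NamedIndividual
            output.add((subject, "rdf:type", "hello:GreetedIndividual"))
            output.add((subject, "hello:greeting", f"Hello, {name}!"))
    return output


request = {
    ("person:1", "rdf:type", "hello:NamedIndividual"),
    ("person:1", "foaf:name", "Guy Incognito"),
}
for triple in sorted(hello_service(request)):
    print(triple)
```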
  • 47.
    How do we discover services? Input and output are about the same “thing”. Therefore, to describe what a service does, simply compare (“diff”) the Input and Output OWL classes. This is not prescriptive! It is just how we use it
  • 48.
    Service Description: INPUT OWL Class NamedIndividual: things with a “name” property from the “foaf” ontology. OUTPUT OWL Class GreetedIndividual: things with a “greeting” property from the “hello” ontology. The service provides a “greeting” to a Named Individual based on its “name”. [Figure: POST http://example.org/myservice; input: person:1 rdf:type hello:NamedIndividual; person:1 foaf:name “Guy Incognito”; output: person:1 rdf:type hello:GreetedIndividual; person:1 hello:greeting “Hello, Guy Incognito!”]
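The “diff” idea can be sketched by reducing each OWL class to the set of properties its restrictions require. That reduction is a simplifying assumption of this example (real OWL classes carry much more than a property list), and `ClassSketch` is a hypothetical helper, not part of SADI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSketch:
    """Simplification: an OWL class reduced to the properties its restrictions require."""
    name: str
    required: frozenset


def describe(input_cls, output_cls):
    """What a service 'does' = properties the output class has that the input class lacks."""
    return output_cls.required - input_cls.required


named = ClassSketch("NamedIndividual", frozenset({"foaf:name"}))
greeted = ClassSketch("GreetedIndividual", frozenset({"hello:greeting"}))
print(sorted(describe(named, greeted)))
```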
  • 49.
    Service Discovery: Index all of the properties added by all of the services under all circumstances
  • 50.
    Real-world Example: Input data: BRCA1 rdf:type Gene ID. Output data: BRCA1 hasDNASequence AGCTTAGCCA… Registry index: Service provides the “hasDNASequence” property to Gene IDs
  • 51.
    Service Discovery: Simply search for the property of interest, based on the data in hand
  • 52.
    e.g. the question “What is the DNA sequence of BRCA1?”: discover a SADI Web Service that generates the DNA sequence property for gene identifiers
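A registry index of this kind can be sketched as a map from (input class, added property) to service URLs. `RegistrySketch` and the URL below are hypothetical; the real SADI registry is queried over the Web and its interface is not shown here.

```python
from collections import defaultdict


class RegistrySketch:
    """Hypothetical registry index: (input class, added property) -> service URLs."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, service_url, input_class, added_properties):
        for prop in added_properties:
            self._index[(input_class, prop)].add(service_url)

    def discover(self, input_class, wanted_property):
        # .get avoids inserting empty entries for searches that find nothing
        return sorted(self._index.get((input_class, wanted_property), set()))


registry = RegistrySketch()
registry.register("http://example.org/seqret", "GeneID", {"hasDNASequence"})  # hypothetical URL
print(registry.discover("GeneID", "hasDNASequence"))
```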
  • 53.
    DEMO: Knowledge Explorer Plug-in. For more information about the Knowledge Explorer, surf to: http://io-informatics.com
  • 55.
    SADI has just filled in the “Encodes” property for the three genes from the output of the discovered Web Service(s)
  • 57.
    Discover services that provide the hasGOTerm property for the Protein Sequence datatype
  • 59.
    This kind of “Web Service Surfing” is very intuitive for the biologist! No need to describe the algorithm or the database; just describe the properties that will be added
  • 60.
    Semantic Health And Research Environment: SHARE answers arbitrary SPARQL queries by finding and executing SADI Services
  • 61.
    Example #1: What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene?
      SELECT ?allele ?image ?desc
      WHERE {
        locus:DEF genetics:hasVariant ?allele .
        ?allele info:visualizedByImage ?image .
        ?image info:hasDescription ?desc
      }
  • 62.
    Example #1 (same query): Note that there is no “FROM” clause! We don’t tell it where it should get the information; the machine has to figure that out by itself...
  • 63.
    Enter that query into SHARE
  • 64.
  • 65.
    ...and in a few seconds you get your answer. Based on the predicates in your query, SHARE used SADI to automatically discover the resources required to answer your question.
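The predicate-driven resolution of Example #1 can be sketched as follows, with heavy simplifications: SHARE's actual planner is far more general, this sketch only handles a chain of patterns whose objects are fresh variables, and the `services` lambdas are hypothetical stand-ins returning toy values for services that would be discovered via SADI.

```python
def answer(patterns, services):
    """Evaluate a chain of triple patterns with no FROM clause.

    patterns: list of (subject, predicate, object); terms starting with '?' are variables.
    Assumes each object is a fresh variable and each subject is a constant or already bound.
    services: predicate -> callable(subject) -> list of objects (stand-ins for discovered services).
    """
    rows = [{}]
    for subj, pred, obj in patterns:
        service = services[pred]  # "discovery": find a service that provides this predicate
        rows = [
            {**row, obj: value}
            for row in rows
            for value in service(row.get(subj, subj))  # bound variable, or the constant itself
        ]
    return rows


# Hypothetical services returning toy values (not real Antirrhinum data).
services = {
    "genetics:hasVariant": lambda locus: ["allele:a1", "allele:a2"] if locus == "locus:DEF" else [],
    "info:visualizedByImage": lambda allele: [allele.replace("allele:", "image:")],
    "info:hasDescription": lambda image: [f"description of {image}"],
}
query = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]
for row in answer(query, services):
    print(row["?allele"], row["?image"], row["?desc"])
```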
  • 66.
    Because it is the Semantic Web, the query results are live hyperlinks to the respective databases or images
  • 67.
    Importantly, we posed and answered a complex database query WITHOUT A DATABASE
  • 68.
    (in fact, the data didn’t even have to exist... as I’ll now show you)
  • 69.
    Example #2: Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants
      SELECT ?patient ?bun ?creat
      FROM <http://sadiframework.org/ontologies/patients.rdf>
      WHERE {
        ?patient rdf:type patient:LikelyRejecter .
        ?patient l:latestBUN ?bun .
        ?patient l:latestCreatinine ?creat .
      }
  • 70.
    Likely Rejecter: A patient who has creatinine levels that are increasing over time (Wilkinson, “MD”)
  • 71.
    Likely Rejecter: Our database contains various blood chemistry measurements at various time points
  • 72.
    Likely Rejecter: …but there is no “likely rejecter” column or table in our database…
  • 73.
    SHARE determines, by itself, that a linear regression analysis over the creatinine blood chemistry measurements is needed
  • 74.
    SHARE determines, by itself, how and where that analysis can be done, and does it
  • 75.
    The SHARE system utilizes semantics (via SADI) to discover and access analytical services on the Web that do linear regression analysis
  • 76.
  • 77.
  • 78.
  • 79.
    Ontology Spectrum [Figure: a spectrum of increasing formality, from Catalog/ID and Terms/glossary, through Thesauri (“narrower term” relation), Informal is-a, Formal is-a, Formal instance, Frames (Properties) and Value Restrictions, to Selected Logical Constraints (disjointness, inverse, …) and General Logical constraints]. Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold and Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
  • 80.
    Ontology Spectrum [Figure: the same spectrum, annotated “BETTER!!” and marking the range covered by basic SADI/SHARE functionality]
  • 81.
    Ontology Spectrum: Why is this data a member of this OWL Class (and therefore valid input to the service)? [Figure: the same spectrum; at the lower end the answer is “Because I say so!”, at the upper end “Because it fulfils XYZ”]
  • 82.
    [Figure: the same spectrum. Categorization systems (lower end): like library shelves, inflexible. Discovery systems (upper end): flexible]
  • 83.
    In the upper end of the Ontology Spectrum, if the data has the right properties, it can be discovered to be a valid input to a Service, regardless of how it was originally classified or which ontology was used for that classification
  • 84.
    ...and in the context of a SHARE query, those individual properties may have been aggregated from many different places; the data becomes a valid input as its properties aggregate
  • 85.
    In exactly the same way that the OWL property restrictions of a SADI Input Class tell SHARE what properties a service requires as input, the property restrictions of an OWL Class in the SPARQL query tell SHARE what properties it needs to retrieve to create members of that class
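The “valid input as properties aggregate” idea can be sketched by treating a class's property restrictions as a required-property set (a simplification of OWL semantics) and checking it against triples pooled from several sources. The restriction set and both lab graphs below are hypothetical.

```python
def present_properties(individual, *sources):
    """Union of predicates known for `individual` across independently retrieved graphs."""
    return {p for graph in sources for s, p, _ in graph if s == individual}


def is_member(individual, required, *sources):
    """Sketch: an individual qualifies once the aggregated data supplies every required property."""
    return required <= present_properties(individual, *sources)


required = {"l:latestBUN", "l:latestCreatinine"}      # restriction set assumed for illustration
lab_a = {("patient:7", "l:latestBUN", "18")}          # hypothetical source 1
lab_b = {("patient:7", "l:latestCreatinine", "1.4")}  # hypothetical source 2
print(is_member("patient:7", required, lab_a), is_member("patient:7", required, lab_a, lab_b))
```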
  • 86.
    Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
      PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
      SELECT ?patient ?bun ?creat
      FROM <http://sadiframework.org/ontologies/patients.rdf>
      WHERE {
        ?patient rdf:type patient:LikelyRejecter .
        ?patient l:latestBUN ?bun .
        ?patient l:latestCreatinine ?creat .
      }
  • 87.
    The definition of the Likely Rejecter category is encoded in a machine-readable document written in the OWL ontology language. Basically: “the regression line over creatinine measurements should have an increasing slope”
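Read as arithmetic, the Likely Rejecter definition can be sketched as “the least-squares slope of creatinine against time is positive”. This assumes ordinary least squares and a strict `> 0` threshold; in SHARE the regression is performed by a discovered remote service, not a local function, and the (day, creatinine) values below are hypothetical.

```python
def ols_slope(series):
    """Ordinary least-squares slope of value against time for (time, value) pairs."""
    n = len(series)
    mean_t = sum(t for t, _ in series) / n
    mean_v = sum(v for _, v in series) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in series)
    sxx = sum((t - mean_t) ** 2 for t, _ in series)
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    """Sketch of the LikelyRejecter definition: regression slope over creatinine > 0."""
    if len({t for t, _ in creatinine_series}) < 2:
        return False  # slope undefined with fewer than two distinct time points
    return ols_slope(creatinine_series) > 0


rising = [(0, 80.0), (30, 95.0), (60, 120.0)]   # hypothetical (day, creatinine) values
falling = [(0, 120.0), (30, 100.0), (60, 90.0)]
print(is_likely_rejecter(rising), is_likely_rejecter(falling))
```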
  • 88.
    SHARE burrows down through the ontological definition to learn what the properties of “regression models” are
  • 89.
    SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis, and then other services that can determine the “latest” measurements from a time series
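The “latest measurement” step is simple enough to sketch locally. The (date, value) representation and the BUN values are assumptions of this example; in SHARE this would again be a discovered service.

```python
from datetime import date


def latest(series):
    """Value of the most recent (date, value) measurement in a time series."""
    if not series:
        raise ValueError("empty time series")
    return max(series, key=lambda point: point[0])[1]


# Hypothetical, deliberately unordered BUN measurements.
bun = [(date(2010, 1, 5), 15.0), (date(2010, 3, 2), 22.0), (date(2010, 2, 1), 18.0)]
print(latest(bun))  # 22.0
```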
  • 90.
    Let’s go through that again from a different perspective
  • 91.
    OWL Classes that include property restrictions can be “executed” as if they were workflows
  • 92.
    Ontology = Query = Workflow
  • 94.
  • 95.
    QUERY: SELECT images of mutations from genes in organism XXX that share homology with this gene in organism YYY
  • 96.
  • 97.
    As OWL Axioms: HomologousMutantImage owl:equivalentClass {
      Gene Q hasImage Image P
      Gene Q hasSequence Sequence Q
      Gene R hasSequence Sequence R
      Sequence Q similarTo Sequence R
      Gene R = “my gene of interest”
    }
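Read as a join, the HomologousMutantImage axioms can be sketched as below. Assumptions: the data already sits in one local triple set (in SHARE each hop would be fetched by a SADI service), `similarTo` is followed only in the Q-to-R direction written in the axioms, and the gene, sequence and image names are hypothetical.

```python
def homologous_mutant_images(graph, gene_of_interest):
    """Images P such that Q hasImage P, Q hasSequence SQ, R hasSequence SR,
    SQ similarTo SR, and R is the gene of interest."""
    def objects(subject, predicate):
        return {o for s, p, o in graph if s == subject and p == predicate}

    seqs_r = objects(gene_of_interest, "hasSequence")
    seqs_q = {s for s, p, o in graph if p == "similarTo" and o in seqs_r}
    genes_q = {s for s, p, o in graph if p == "hasSequence" and o in seqs_q}
    return {img for q in genes_q for img in objects(q, "hasImage")}


toy_graph = {
    ("gene:R", "hasSequence", "seq:R"),
    ("gene:Q", "hasSequence", "seq:Q"),
    ("seq:Q", "similarTo", "seq:R"),
    ("gene:Q", "hasImage", "image:Q1"),
    ("gene:X", "hasSequence", "seq:X"),  # not similar to seq:R, so its image is excluded
    ("gene:X", "hasImage", "image:X1"),
}
print(homologous_mutant_images(toy_graph, "gene:R"))
```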
  • 98.
    Those axioms combine to create an OWL Class: Homologous Mutant Image
  • 99.
    QUERY: Retrieve owl:Homologous Mutant Image for gene XXX
  • 100.
    SHARE: Decomposes the owl:Homologous Mutant Image Class, discovers SADI services relevant to that class, and pipelines them together into a workflow
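The “pipelines them together” step can be sketched as dependency ordering: each property the class requires is provided by some service, which in turn needs other properties as input. The service names and provider table below are hypothetical, and this static plan ignores the dynamic re-planning SHARE performs on service outputs. It uses `graphlib`, which is in the Python standard library from 3.9.

```python
from graphlib import TopologicalSorter


def plan(required, providers, available):
    """Order hypothetical services so each runs after the services producing its inputs.

    providers: property -> (service name, set of input properties)
    available: properties already present in the data in hand
    """
    deps = {}
    pending = list(required)
    while pending:
        prop = pending.pop()
        if prop in available or prop in deps:
            continue
        _, inputs = providers[prop]
        deps[prop] = {i for i in inputs if i not in available}
        pending.extend(inputs)
    order = TopologicalSorter(deps).static_order()  # predecessors come first
    return [providers[p][0] for p in order]


providers = {
    "hasSequence": ("getSequence", {"geneId"}),
    "similarTo": ("blastSearch", {"hasSequence"}),
    "hasImage": ("getImage", {"geneId"}),
}
print(plan({"hasImage", "hasSequence", "similarTo"}, providers, {"geneId"}))
```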
  • 103.
    “The user experiments showed that workflow re-use… is difficult for bioinformaticians.” - Goderis, A. (2008) Ph.D. Thesis
  • 104.
    Under slightly different circumstances (e.g. studying the same phenomenon in a different organism), this workflow WILL NOT SOLVE THE PROBLEM. It must be edited on a case-by-case basis. This editing turns out to be extremely difficult, and can (almost) only be done manually
  • 105.
    This great idea would be even better if workflows were a little easier to re-purpose!
  • 106.
    With SHARE, a workflow is generated dynamically, based on all of the information presented in the query; e.g. to create a new workflow, simply specify a different organism ID in the query!
  • 107.
    With SHARE, a workflow is generated dynamically, based on all of the information presented in the query. Moreover, the workflow plan CHANGES dynamically based on service outputs!
  • 108.
    With SHARE, the ontology is the workflow (not the same as “an ontology that describes workflows”). The ontology acts as an abstraction of a workflow, which is concretized at run time based on the circumstances of the query
  • 109.
    Works (best) with ontologies in the “Frames+” part of the spectrum, if there are SADI services available [Figure: the Ontology Spectrum, from Catalog/ID and Terms/glossary to Selected Logical Constraints and General Logical constraints, with Frames (Properties) and above highlighted]
  • 110.
    As far as we are aware, SHARE is the only system that exhibits this particular behaviour ...and IMO this is a pretty big deal...
  • 111.
    Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
  • 112.
    That experiment can now be represented ONCE as an OWL Class. It becomes concretized automatically, for each individual researcher, as a distinct workflow, given any starting protein and any combination of comparator species
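The simplified prototype of the Gordon et al. analysis (run one abstract workflow for two comparator species, then intersect) can be sketched with the species as a parameter, which is exactly where the two concrete workflows differ. Everything here is hypothetical: `homologues` and `interactions` are toy dictionaries standing in for similarity-search and interaction services, and protein names like `y1` or `xA` are placeholders.

```python
def predict_interactors(protein, species, target, homologues, interactions):
    """One abstract workflow, parameterized by comparator species:
    similar proteins in `species` -> their interactors -> homologues back in `target`."""
    similar = homologues.get((protein, species), set())
    partners = {i for s in similar for i in interactions.get(s, set())}
    return {h for i in partners for h in homologues.get((i, target), set())}


# Toy data: (protein, species) -> similar proteins in that species; protein -> interactors.
homologues = {
    ("P", "Y"): {"y1"}, ("P", "Z"): {"z1"},
    ("yA", "X"): {"xA"}, ("yB", "X"): {"xB"},
    ("zA", "X"): {"xA"}, ("zC", "X"): {"xC"},
}
interactions = {"y1": {"yA", "yB"}, "z1": {"zA", "zC"}}

via_y = predict_interactors("P", "Y", "X", homologues, interactions)
via_z = predict_interactors("P", "Z", "X", homologues, interactions)
print(sorted(via_y & via_z))  # putative interactors of P in species X supported by both
```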
  • 113.
    This is far beyond simply changing the parameters entered into a workflow...
  • 114.
    Moreover, the idea of automated workflow “individuality” is quite interesting to us...
  • 115.
    This is an early prototype of a Patient-driven Personalized Medicine Web interface
  • 120.
    Matching based on official name, compound name, brand name, trade name, or “common name”
  • 121.
    Still needs some work... ??!?!?
  • 125.
    Why the alert? Link out to PubMed
  • 127.
    The SADI+SHARE workflow and reasoning were personalized to YOUR medical data
  • 128.
    In future iterations, we will enable the workflow to be further customized through “personalized” OWL Classes (e.g. provided by your clinician!!)
  • 129.
    These OWL Classes might include information about the current trajectory of your treatment for a chronic disease, for example, such that what you read on the Web is placed in the context of your expert clinical care...
  • 130.
    Frankly, I think it’s quite cool that “people” are creating and running personalized workflows at the touch of a button...
  • 131.
    ...as many of you know, that has been my dream since I started studying this problem a decade ago!
  • 132.
  • 133.
    An experiment... based on a hypothesis
  • 134.
    An experiment... based on a hypothesis, now modeled in OWL
  • 135.
    Does this OWL Class represent a Hypothesis? I think it does!
  • 136.
    I believe that we will soon show, using SADI + SHARE, that we can model a non-trivial hypothetical biological scenario and then evaluate whether that hypothesis is supported, based on whether the automatically synthesized workflow returns any individuals that conform to the model.
  • 137.
    Ontology = Hypothesis = Query = Workflow [= Materials and Methods]. Most of your publication is done! All you need to do now is interpret the results! The Materials and Methods can be derived automatically from provenance information recorded during workflow execution
  • 138.
    Please join us! SADI and SHARE are Open-Source projects: http://sadiframework.org
  • 139.
  • 140.
    University of British Columbia: Luke McCarthy (Lead Dev.); Edward Kawas (Everything... SADI Service auto-generator); Benjamin VanderValk and Ian Wood (SHARE & SADI & Experimental modeling; Experimental modeling project; myHealth Button); Soroush Samadian (Cardiovascular data modeling and queries)
  • 141.
    C-BRASS collaborators at other sites (U of New Brunswick, Carleton University): Dr. Chris Baker, Dr. Michel Dumontier, Alexandre Riazanov, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
  • 142.