Web Science 2.0

Conducting in silico research in the Web
    from hypothesis to publication
                      Mark Wilkinson




  Isaac Peral Senior Researcher in Biological Informatics
  Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
  Adjunct Professor of Medical Genetics, University of British Columbia
  Vancouver, BC, Canada.
Context
  “While it took 2,300 years after the first
  report of angina for the condition to be
  commonly taught in medical curricula,
  modern discoveries are being
  disseminated at an increasingly rapid
  pace. Focusing on the last 150 years,
  the trend still appears to be linear,
  approaching the axis around 2025.”




Michael Gillam et al., “The Healthcare Singularity and the Age of Semantic Medicine”,
in The Fourth Paradigm: Data-Intensive Scientific Discovery,
Tony Hey (Editor), 2009

Slide adapted with permission from Joanne Luciano, Presentation
at Health Web Science Workshop 2012, Evanston IL, USA
June 22, 2012.
“The Singularity”




   The X-intercept is the point at which a discovery is put
            into practice the moment it is made

     (not only in medical practice, but in any research endeavour...)
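As a worked reading of the quoted trend (the symbols y, t, m and b are introduced here for illustration and are not taken from the chapter): if the delay y between a discovery and its routine use is modelled as a straight line in the discovery year t with a negative slope, the X-intercept is the year t* at which the predicted delay reaches zero.

```latex
\begin{aligned}
y(t) &= m\,t + b, \qquad m < 0 \\
y(t^{*}) = 0 \;&\Longrightarrow\; t^{*} = -\frac{b}{m}
\end{aligned}
```

The quoted chapter places this crossing around 2025.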

The technology required to achieve this
          does not yet exist
You Are Here

Scientific research would have to be conducted within a medium that
immediately interpreted and disseminated the results...

...in a form that immediately (actively!) affected the research of others...

...without requiring them to be aware of these new discoveries.
I'd like to show you how close
   we now are to this vision




   and how we got there
Web Science 2.0
We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis


   simply by building a model in the Web
       describing what the answer
                  (if one existed)
               would look like
...the machine had to make
     every other decision
         on its own
Brief Digression




 “in” the Web??
How we use
The Web today
By clicking here you cause this incredibly
powerful computational tool called The Web
 to retrieve a chunk of text and images that
    can only be understood by a human...
The Web is not a pigeon!
To achieve this vision


 We must learn how to
do research IN the Web


 Not OVER the Web
Resume Speed
We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis


   simply by building a model in the Web
       describing what the answer
                  (if one existed)
               would look like
...the machine had to make
     every other decision
         on its own
This is the study we chose:
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC Genomics. 9, 426 (2008).
Original Study Simplified




Using what is known about interactions in fly & yeast


         predict new interactions with your
             human protein of interest
Abstracted

Given a protein P in Species X

   Find proteins similar to P in Species Y
   Retrieve interactors in Species Y
   Sequence-compare Y-interactors with Species X genome
        (1)  Keep only those with homologue in X


   Find proteins similar to P in Species Z
   Retrieve interactors in Species Z
   Sequence-compare Z-interactors with (1)



              Putative interactors in Species X
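The abstracted steps above can be sketched as ordinary code. This is a minimal illustration, not the original study's pipeline: homology (normally a BLAST-style sequence comparison) is replaced by a hypothetical lookup table of protein pairs, and every protein identifier below (hP, yA, zC, ...) is invented toy data.

```python
from typing import Dict, Set, Tuple

# Toy stand-ins (hypothetical): homology as a set of protein pairs, read in
# either direction, and each model organism's interactions as an adjacency map.
Homology = Set[Tuple[str, str]]
Interactions = Dict[str, Set[str]]


def similar(protein: str, candidates: Set[str], homology: Homology) -> Set[str]:
    """Proteins in `candidates` judged homologous to `protein` (a BLAST stand-in)."""
    return {c for c in candidates
            if (protein, c) in homology or (c, protein) in homology}


def via_model_organism(p: str, model_proteins: Set[str],
                       interactions: Interactions, targets: Set[str],
                       homology: Homology) -> Set[str]:
    """Find proteins similar to P in the model organism, retrieve their
    interactors, then keep only the `targets` homologous to one of them."""
    model_interactors: Set[str] = set()
    for h in similar(p, model_proteins, homology):
        model_interactors |= interactions.get(h, set())
    kept: Set[str] = set()
    for mi in model_interactors:
        kept |= similar(mi, targets, homology)
    return kept


def putative_interactors(p: str, x_genome: Set[str],
                         y_proteins: Set[str], y_interactions: Interactions,
                         z_proteins: Set[str], z_interactions: Interactions,
                         homology: Homology) -> Set[str]:
    # Species Y: compare Y-interactors with the whole Species X genome -> set (1)
    step1 = via_model_organism(p, y_proteins, y_interactions, x_genome, homology)
    # Species Z: compare Z-interactors only with set (1)
    return via_model_organism(p, z_proteins, z_interactions, step1, homology)


# Hypothetical toy data: X plays "human", Y "yeast", Z "fly".
homology = {("hP", "yP"), ("hP", "zP"), ("hA", "yA"), ("hA", "zA"),
            ("hB", "yB"), ("hC", "zC")}
x_genome = {"hP", "hA", "hB", "hC"}
y_proteins, y_interactions = {"yP", "yA", "yB"}, {"yP": {"yA", "yB"}}
z_proteins, z_interactions = {"zP", "zA", "zC"}, {"zP": {"zA", "zC"}}

result = putative_interactors("hP", x_genome, y_proteins, y_interactions,
                              z_proteins, z_interactions, homology)
print(sorted(result))  # ['hA']
```

Passing set (1) as the comparison target for Species Z is what makes the result an intersection: only hA has interacting homologues in both model organisms.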
Modeling the answer...




                                           OWL




               Web Ontology Language (OWL) is the
                   language approved by the W3C
               for representing knowledge in the Web
Modeling the answer...


                   Note that every word in
                   this diagram is, in reality, a
                   URL (because it is OWL)

                   The model of the answer is
                   published in The Web
                   and borrows ideas from
                   other models published in
                   The Web
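Terms such as i:ProbableInteractor are abbreviations: a prefix stands for a namespace URL, and the full name of the term is that URL with the local name appended. A minimal sketch of the expansion, assuming the two namespaces declared in the SPARQL queries later in this talk:

```python
# Namespaces copied from the PREFIX declarations used in this talk's queries.
PREFIXES = {
    "i": "http://sadiframework.org/ontologies/InteractingProteins.owl#",
    "uniprot": "http://lsrn.org/UniProt:",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'i:ProbableInteractor' into its full URL."""
    prefix, local = curie.split(":", 1)  # split only on the first colon
    return PREFIXES[prefix] + local


print(expand("i:ProbableInteractor"))
print(expand("uniprot:Q9UK53"))
```

Splitting on the first colon only matters for namespaces such as `http://lsrn.org/UniProt:`, whose URL itself ends in a colon.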
Modeling the answer...


               ProbableInteractor
                   is homologous to (
                       Potential Interactor from ModelOrganism1…)
                       and
                      (Potential Interactor from ModelOrganism2…)




Probable Interactor is defined in OWL as a subclass of Potential Interactor
   that requires homologous pairs of interacting proteins to exist in both
                       comparator model organisms.

                        (Effectively, an intersection)
Publish our OWL model of a Probable Interactor


                  in the Web
Running the Web Science Experiment


                       In a local data-file

           provide the protein we are interested in

    and the two species we wish to use in our comparison




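  @prefix i:       <http://sadiframework.org/ontologies/InteractingProteins.owl#> .
  @prefix uniprot: <http://lsrn.org/UniProt:> .
  @prefix taxon:   <http://lsrn.org/taxon:> .  # assumed namespace; not shown in the talk

  taxon:9606       a      i:OrganismOfInterest . # human
  uniprot:Q9UK53   a      i:ProteinOfInterest .  # ING1
  taxon:4932       a      i:ModelOrganism1 .     # yeast
  taxon:7227       a      i:ModelOrganism2 .     # fly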
The tricky bit is...

  In the abstract, the search for homology is “generic” – ANY model organism.

  But when the machine attempts to do the experiment, it will have to use
  a variety of resources, because the answer requires information from
  two different species:

      taxon:4932   a   i:ModelOrganism1 . # yeast
      taxon:7227   a   i:ModelOrganism2 . # fly
This is the question we ask:
                  (the query language here is SPARQL)




PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>

SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
     ?protein        a        i:ProbableInteractor .
}

                         i:ProbableInteractor is the reference (URL) to our OWL model of the answer
Our system then derives (and executes) the following workflow automatically




                                                  These are different
                                                  Web services!

                                                  ...selected at run-time
                                                  based on the same model
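Why the same model yields different services can be sketched with a toy registry. Roughly speaking, SADI services are described by the OWL classes of their inputs and outputs; this sketch flattens such a description into two hypothetical strings (the predicate a service supplies and the organism its data covers). All service names and the selection rule are assumptions for illustration, not SADI's actual discovery mechanism.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDescription:
    name: str        # hypothetical service name
    predicate: str   # property the service attaches to its input
    taxon: str       # organism whose data the service covers (toy restriction)


# Hypothetical registry of three services.
REGISTRY = [
    ServiceDescription("yeast-interactions", "hasInteractor", "taxon:4932"),
    ServiceDescription("fly-interactions", "hasInteractor", "taxon:7227"),
    ServiceDescription("fly-blast", "isHomologousTo", "taxon:7227"),
]


def select(predicate: str, taxon: str,
           registry: List[ServiceDescription]) -> List[str]:
    """Pick the services that can supply `predicate` for data from `taxon`."""
    return [s.name for s in registry
            if s.predicate == predicate and s.taxon == taxon]


# Same model step ("retrieve interactors"), different context -> different service.
print(select("hasInteractor", "taxon:4932", REGISTRY))  # ['yeast-interactions']
print(select("hasInteractor", "taxon:7227", REGISTRY))  # ['fly-interactions']
```

The model only ever asks for "interactors"; which concrete service answers is decided at run-time by the organism bound to ModelOrganism1 or ModelOrganism2.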
There are three very cool things about what you just saw...

  First: the system was able to create a workflow
  based on an OWL model (ontology)

  Second: the system was able to create a COMPUTATIONAL workflow
  based on a BIOLOGICAL model

  Third: the workflow it created (i.e. the services chosen)
  differed depending on context
We got the answer

“simply” by designing a model of the answer!
How did we do that?
Two novel technologies

developed by my laboratory members

       over the past 8 years
SADI: A “Smart” Biomedical Resource Representation System
SHARE: A Web application that answers
    SPARQL-DL queries

      Query-answering
     Enhanced by SADI
Demo #1
Imagine a “virtual database”


      all of the data
   from all databases
              +
          result of
 every conceivable analysis
How can we query that database?
What is the phenotype of every allele of the
          Antirrhinum majus DEFICIENS gene?




SELECT ?allele    ?image     ?desc

WHERE {
      locus:DEF            genetics:hasVariant      ?allele .
        ?allele            info:visualizedByImage   ?image .
         ?image            info:hasDescription      ?desc
}


                       Note that there is no “FROM” clause!
                       We don't tell it where to get the information;
                       the machine has to figure that out by itself...
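The idea of answering a query with no FROM clause can be sketched in a few lines: for each predicate in the query, look up something that can supply that predicate, and use it to extend the partial answers. This is a minimal sketch under stated assumptions: the "services" are hypothetical local functions over invented toy data (ex:allele1 and so on), and the query is simplified to a single chain of triple patterns. The real SHARE engine does considerably more, including reasoning over the ontologies involved, which is not attempted here.

```python
from typing import Callable, Dict, List, Set

# A "service" here is any function mapping a subject to the objects it knows
# for one predicate (a hypothetical stand-in for a remote Web service).
Service = Callable[[str], Set[str]]


def answer_chain(start: str, predicates: List[str],
                 registry: Dict[str, Service]) -> List[List[str]]:
    """Answer a chain of triple patterns without a FROM clause: for each
    predicate, find a service that can supply it and extend every row."""
    rows = [[start]]
    for predicate in predicates:
        service = registry.get(predicate)
        if service is None:
            return []  # nothing known can supply this property
        rows = [row + [obj] for row in rows for obj in sorted(service(row[-1]))]
    return rows


# Hypothetical toy data standing in for three independent resources.
variants = {"locus:DEF": {"ex:allele1", "ex:allele2"}}
images = {"ex:allele1": {"ex:img1"}, "ex:allele2": {"ex:img2"}}
descriptions = {"ex:img1": {"ex:desc1"}, "ex:img2": {"ex:desc2"}}

registry: Dict[str, Service] = {
    "genetics:hasVariant": lambda s: variants.get(s, set()),
    "info:visualizedByImage": lambda s: images.get(s, set()),
    "info:hasDescription": lambda s: descriptions.get(s, set()),
}

rows = answer_chain("locus:DEF", ["genetics:hasVariant",
                                  "info:visualizedByImage",
                                  "info:hasDescription"], registry)
for row in rows:
    print(row)  # one row per (locus, allele, image, description)
```

The query never names a data source; the registry lookup keyed on the predicate plays that role.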
Enter that query into
      SHARE
Click “Submit”...
...and in a few seconds you get your answer.
The query results are live hyperlinks
to the respective databases or images
Neither SADI nor SHARE

  know anything about

plant biology or genetics
What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
        uniprot:P47989   pred:isEncodedBy    ?gene .
        ?gene            ont:isParticipantIn ?pathway .
}



               Note again that there is no “FROM” clause…

              I have not told SHARE where to look for the
                 answer; I am simply asking my question
Enter that query into
      SHARE
Two different providers of gene information (KEGG & NCBI)
were found & accessed;

Two different providers of pathway information (KEGG and GO)
were found & accessed
The results are all links to the original data
Neither SADI nor SHARE

      know anything about

proteins or biochemical pathways
Recap
what we just saw

   We posed, and answered
~complex multi-database queries

WITHOUT A DATA WAREHOUSE
Demo #2
An example from the Clinical domain
Show me the latest Blood Urea Nitrogen (BUN) and Creatinine levels
    of patients who appear to be rejecting their transplants



PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
        ?patient rdf:type           patient:LikelyRejecter .
        ?patient l:latestBUN        ?bun .
        ?patient l:latestCreatinine ?creat .
}
Likely Rejecter:

A patient who has creatinine levels
   that are increasing over time


                 - Mark D Wilkinson's definition
Likely Rejecter:

  …but there is no “likely rejecter”
 column or table in our database…
only blood chemistry measurements
        at various time-points
Likely Rejecter:

So the data required to answer this question
             DOESN'T EXIST!
?
Enter that query into
      SHARE
Now…

Two “magical” events occur…
The machine decides

              by itself

         that it needs to do a
      Linear Regression analysis
on the blood creatinine measurements
   in order to answer your question
The machine decides

          by itself

how and where that analysis
       can be done

and does it automatically!
http://www.impactlab.net/2009/03/22/improve-your-brain-power/
The SHARE system uses SADI to discover
analytical services on the Web that do linear regression analysis,
                and sends them the data to be analysed
VOILA!
Neither SADI nor SHARE

     know anything about

blood chemistry or mathematics
So how does the machine know
        what to do??
Ontologies
Ontologies explicitly define the kinds of
        things that (can) exist…

    …and what those things are “like”



                 i.e. what properties they have
     (color, weight, shape, texture, temperature, “state”)
       and what relationships they have to one another
  (inside-of, adjacent-to, part-of, binds-to, controls, inhibits,
                        degrades, etc.)
So we create
  ontologies about biology
         and health

We* publish them on the Web




                 * We… or anybody! Anybody can publish an ontology!
My definition of a Likely Rejecter is encoded in
       a machine-readable document written in the OWL Ontology language

Basically:

  “the regression line over creatinine measurements should have an increasing slope”
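The "Basically" sentence above can be read as a small calculation: fit a least-squares line to a patient's (time, creatinine) measurements and test whether its slope is positive. The OWL definition itself is not shown in this talk, so the sketch below mirrors only that one sentence; the function names, the strict "slope > 0" threshold and the measurement series are all hypothetical.

```python
from typing import List, Tuple


def ols_slope(points: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("measurements must span more than one time-point")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine: List[Tuple[float, float]]) -> bool:
    """Toy reading of the definition: the creatinine regression slope is increasing."""
    return ols_slope(creatinine) > 0


# Hypothetical (day, creatinine) series for two imaginary patients.
rising = [(0, 80.0), (7, 95.0), (14, 120.0)]
stable = [(0, 90.0), (7, 88.0), (14, 89.0)]
print(is_likely_rejecter(rising), is_likely_rejecter(stable))  # True False
```

No "likely rejecter" value is stored anywhere; the classification exists only once the regression has been run over the raw measurements.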
Our ontology refers to other ontologies (possibly published by other people)
       to learn about what the properties of “regression models” are
           e.g. that regression models have slopes and intercepts
             and that slopes and intercepts have decimal values
SHARE examines the query


Looks on the Web for ontologies that describe the
  problem it is trying to solve, and “reads” them


 then uses that “knowledge” to figure out which
   data-sources and analytical tools it needs
              to answer the query
The way SHARE “interprets” data varies
        depending on the context of the query
  (i.e. which ontologies it reads – Mine? Yours?)


              and on what part of the query
     it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
Data exhibits “late binding”
Late binding:

     “purpose and meaning”
           of the data is
       not determined until
    the moment it is required

a.k.a. the “semantics” of the data
Benefit
  of late binding


  Data is amenable to
constant re-interpretation
Example?

Blood Creatinine measurements

 were not dictated to be (only)

Blood Creatinine measurements
Example?

   The data had the ‘qualities/properties’ that

       allowed one machine to interpret

that they were Blood Creatinine measurements

(e.g. to determine which patients were rejecting)
Example?

But the data also had the ‘qualities/properties’ that

 allowed another machine to interpret them as

            Simple X/Y coordinate data

   (e.g. the Linear Regression calculation tool)
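The two interpretations above can be sketched as two consumers reading the same records. Nothing in the records fixes their meaning in advance: one consumer treats them as a patient's blood creatinine measurements, the other accepts anything with a time-point and a numeric value as an X/Y point. The record layout, field names and values are hypothetical and stand in for whatever properties the real data exposes.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical records: nothing in them says "these are only creatinine data".
records: List[Dict[str, Any]] = [
    {"patient": "p1", "analyte": "creatinine", "day": 0, "value": 80.0},
    {"patient": "p1", "analyte": "creatinine", "day": 7, "value": 95.0},
    {"patient": "p1", "analyte": "BUN", "day": 7, "value": 6.1},
]


def creatinine_series(rows: List[Dict[str, Any]], patient: str) -> List[Dict[str, Any]]:
    """Clinical reading: the rows that are blood creatinine measurements for a patient."""
    return [r for r in rows
            if r["patient"] == patient and r["analyte"] == "creatinine"]


def as_xy(rows: List[Dict[str, Any]]) -> List[Tuple[float, float]]:
    """Generic reading: anything with a 'day' and a numeric 'value' is an X/Y point."""
    return [(float(r["day"]), float(r["value"])) for r in rows]


points = as_xy(creatinine_series(records, "p1"))
print(points)  # [(0.0, 80.0), (7.0, 95.0)]
```

`as_xy` knows nothing about blood chemistry; the meaning it assigns is decided only at the moment it is called.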
Benefit
  of late binding


  Data is amenable to
constant re-interpretation
http://www.flickr.com/people/faernworks/
And that brings us to...
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC Genomics. 9, 426 (2008).
Our system then derives (and executes) the following workflow automatically
The analytical tools chosen for that
   workflow changed depending on

                context

even though the biological model driving
      their selection was the same
i.e.

      The published model is re-usable

In different contexts... By different researchers
and because the model IS the experiment


  the published EXPERIMENT is re-usable!!




simply point the same query at your own dataset...
The publication is an
executable document!
Every component of the model

      Every component of the input data

     Every component of the output data

                   is a URL


Therefore the question, the experiment, and the
answer, are immediately published IN the Web
The answer, and the knowledge derived from it,
  is immediately available to search engines
and moreover, can affect the outcome of other
         Web Science experiments
You
Are Now
 Here!!!
Final thoughts
An experiment... based on a hypothesis

        now modeled in OWL




Does this OWL Class represent the Hypothesis?

                I think it does!




We modeled the answer...
...but the answer was hypothetical
Change the way we think of “hypotheses”
In Web Science 2.0


Model what the world would “look like”
    if your hypothesis were true


   Then ask “is there any data that
          fits that model?”
Like the blind men examining an elephant

Seemingly different aspects of Web Science research
   are embodied in/derived from the same “thing”

                 The OWL Model
Our vision of Web Science 2.0

[Diagram: Hypothesis, Query, Workflow, Ontology, Result, Materials & Methods]

Materials & Methods can be automatically derived through
provenance information captured during workflow execution
Please join us!

SADI and SHARE are Open-Source projects

       http://sadiframework.org
My New Home!
University of British Columbia


Luke McCarthy – Lead Dev.: Everything...
Edward Kawas: SADI Service auto-generator
Benjamin VanderValk: SHARE & SADI & Experimental modeling & myHeath Button
Ian Wood: Experimental modeling project
Soroush Samadian: Cardiovascular data modeling and queries
C-BRASS Collaborators at other sites

U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov

Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev,
Steve Etlinger, Nichaella Kieth, Jose Cruz
Microsoft Research

Web Science, SADI, and the Singularity
