Web Science - ISoLA 2012

This is a brief version of earlier talks, but I think it may explain more emphatically what I think Web Science is, why I believe it is realistic, and why SADI/SHARE technologies (or technologies like them) are important to achieving the vision.

  • In 1499, when Portuguese explorer Vasco da Gama returned home after completing the first-ever sea voyage from Europe to India, he had less than half of his original crew with him: scurvy had claimed the lives of 100 of the 160 men. Throughout the Age of Discovery,¹ scurvy was the leading cause of death among sailors. Ship captains typically planned for the death of as many as half of their crew during long voyages. A dietary cause for scurvy was suspected, but no one had proved it. More than a century later, on a voyage from England to India in 1601, Captain James Lancaster placed the crew of one of his four ships on a regimen of three teaspoons of lemon juice a day. By the halfway point of the trip, almost 40% of the men (110 of 278) on the other three ships had died, while on the lemon-supplied ship every man survived [1]. The British navy responded to this discovery by repeating the experiment, 146 years later. In 1747, a British navy physician named James Lind treated sailors suffering from scurvy using six randomized approaches and demonstrated that citrus reversed the symptoms. The British navy responded, 48 years later, by enacting new dietary guidelines requiring citrus, which virtually eradicated scurvy from the British fleet overnight. The British Board of Trade adopted similar dietary practices for the merchant fleet in 1865, an additional 70 years later. The total time from Lancaster’s definitive demonstration of how to prevent scurvy to adoption across the British Empire was 264 years [2]. The translation of medical discovery to practice has thankfully improved substantially. But a 2003 report from the Institute of Medicine found that the lag between significant discovery and adoption into routine patient care still averages 17 years [3, 4]. This delayed translation of knowledge to clinical care has negative effects on both the cost and the quality of patient care. A nationwide review of 439 quality indicators found that only half of adults receive the care recommended by U.S. national standards [5].

    1. Using OWL Domain Models as Abstract Workflow Models, Or... Conducting in silico research in the Web, from hypothesis to publication. Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain; Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
    2. Context: “While it took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace. Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.” (Michael Gillam et al., “The Healthcare Singularity and the Age of Semantic Medicine”, in The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009.) Slide adapted with permission from Joanne Luciano, presentation at the Health Web Science Workshop 2012, Evanston, IL, USA, June 22, 2012.
    3. “The Singularity”: The X-intercept is the point where, the moment a discovery is made, it is immediately put into practice (not only medical practice, but any research endeavour...). (Michael Gillam et al., “The Healthcare Singularity and the Age of Semantic Medicine”, in The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009.) Slide borrowed with permission from Joanne Luciano, presentation at the Health Web Science Workshop 2012, Evanston, IL, USA, June 22, 2012.
    4. The technology required to achieve this does not yet exist.
    5. You Are Here: Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...
    6. You Are Here: ...in a form that immediately (actively!) affected the research of others...
    7. You Are Here: ...without requiring them to be aware of these new discoveries.
    8. To achieve this vision, we must learn how to do research IN the Web, not OVER the Web.
    9. How we use the Web today
    10. To achieve this vision, we must learn how to do research IN the Web, not OVER the Web.
    11. I’d like to show you how close we now are to this vision, and how we got there.
    12. Web Science 2.0
    13. We wanted to duplicate a real, peer-reviewed bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like...
    14. ...the machine had to make every other decision on its own.
    15. This is the study we chose:
    16. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
    17. Original Study, Simplified: Using what is known about interactions in fly & yeast, predict new interactions with your human protein of interest.
    18. Abstracted: Given a protein P in Species X, find proteins similar to P in Species Y; retrieve their interactors in Species Y; sequence-compare the Y-interactors with the Species X genome (1), keeping only those with a homologue in X. Then find proteins similar to P in Species Z; retrieve their interactors in Species Z; sequence-compare the Z-interactors with (1). The result: putative interactors in Species X.
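The abstracted steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SADI/SHARE implementation: the lookup tables `HOMOLOGS` and `INTERACTORS`, and every protein identifier in them, are hypothetical stand-ins for the homology-search and interaction-database services the real workflow calls. The last step maps the Z-interactors back into X and intersects the result with set (1), which is one reasonable reading of “sequence-compare Z-interactors with (1)”.

```python
from typing import Dict, Set

# Hypothetical homology table: protein -> {species: homologous proteins}.
# In the real workflow this would be a sequence-similarity service call.
HOMOLOGS: Dict[str, Dict[str, Set[str]]] = {
    "human:P": {"yeast": {"yeast:P1"}, "fly": {"fly:P2"}},
    "yeast:A": {"human": {"human:A"}},
    "yeast:B": {"human": {"human:B"}},
    "fly:C": {"human": {"human:A"}},
    "fly:D": {"human": {"human:D"}},
}

# Hypothetical interaction table: protein -> known interacting proteins.
INTERACTORS: Dict[str, Set[str]] = {
    "yeast:P1": {"yeast:A", "yeast:B", "yeast:E"},  # yeast:E has no human homologue
    "fly:P2": {"fly:C", "fly:D"},
}


def homologs_in(protein: str, species: str) -> Set[str]:
    """Proteins in `species` that are similar to `protein`."""
    return HOMOLOGS.get(protein, {}).get(species, set())


def interactors_of(proteins: Set[str]) -> Set[str]:
    """Union of the known interactors of every protein in `proteins`."""
    found: Set[str] = set()
    for p in proteins:
        found |= INTERACTORS.get(p, set())
    return found


def back_to(species: str, proteins: Set[str]) -> Set[str]:
    """Map proteins into `species`; those with no homologue there drop out."""
    mapped: Set[str] = set()
    for p in proteins:
        mapped |= homologs_in(p, species)
    return mapped


def putative_interactors(p: str, x: str, y: str, z: str) -> Set[str]:
    via_y = back_to(x, interactors_of(homologs_in(p, y)))  # set (1)
    via_z = back_to(x, interactors_of(homologs_in(p, z)))
    return via_y & via_z  # keep only candidates supported by both species


print(sorted(putative_interactors("human:P", "human", "yeast", "fly")))
```

With this toy data, `human:B` is supported only via yeast and `human:D` only via fly, so only `human:A` survives the intersection.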
    19. Modeling the answer... OWL: The Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge in the Web.
    20. Modeling the answer... Note that every word in this diagram is, in reality, a URL (because it is OWL).
    21. Modeling the answer... The model of a Potential Interactor is published in the Web. It utilizes concepts from other models published in the Web (ours and others’) by referencing their URLs.
    22. Modeling the answer... The model of a Potential Interactor is a network of concepts distributed within the Web. It will be affected by changes to those concepts. We do not “own” all of those concepts!
    23. Modeling the answer... ProbableInteractor: is homologous to (Potential Interactor from ModelOrganism1…) and is homologous to (Potential Interactor from ModelOrganism2…). Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both comparator model organisms (effectively, an intersection).
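In description-logic notation, one hedged reading of that definition is the following. The property names isHomologousTo and fromOrganism are hypothetical placeholders, not necessarily the names used in the published InteractingProteins ontology. The axiom is written as an equivalence rather than the plain subclass the slide mentions, because a reasoner can only recognize new instances of a class whose conditions are sufficient as well as necessary; the first conjunct still captures “subclass of Potential Interactor”.

$$
\mathit{ProbableInteractor} \equiv \mathit{PotentialInteractor}
\;\sqcap\; \exists\,\mathit{isHomologousTo}.\big(\mathit{PotentialInteractor} \sqcap \exists\,\mathit{fromOrganism}.\mathit{ModelOrganism1}\big)
\;\sqcap\; \exists\,\mathit{isHomologousTo}.\big(\mathit{PotentialInteractor} \sqcap \exists\,\mathit{fromOrganism}.\mathit{ModelOrganism2}\big)
$$

The two existential restrictions are what make the class “effectively an intersection”: an individual qualifies only if it has a homologous potential interactor in each of the two comparator organisms.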
    24. Publish our OWL model of a Probable Interactor in the Web.
    25. Running a Web Science 2.0 Experiment: In a local data-file, provide the protein we are interested in and the two species we wish to use in our comparison:
        taxon:9606 a i:OrganismOfInterest . # human
        uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
        taxon:4932 a i:ModelOrganism1 . # yeast
        taxon:7227 a i:ModelOrganism2 . # fly
    26. The tricky bit is... In the abstract, the search for homology is “generic”: ANY model organism. But when the machine attempts to do the experiment, it will have to use several different and specific resources, because our question specifies two different species:
        taxon:4932 a i:ModelOrganism1 . # yeast
        taxon:7227 a i:ModelOrganism2 . # fly
    27. This is the question we ask (the query language here is SPARQL):
        PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
        SELECT ?protein
        FROM <file:/local/workflow.input.n3>
        WHERE { ?protein a i:ProbableInteractor . }
        (Callout: the reference (URL) to our OWL model of the answer.)
    28. Our system then derives (and executes) the following workflow automatically. These are different Web services!... selected at run-time based on the same model.
    29. There are three very cool things about what you just saw...
    30. There are three very cool things about what you just saw... The system was able to create a workflow based on an OWL model (ontology).
    31. There are three very cool things about what you just saw... The system was able to create a COMPUTATIONAL workflow based on a BIOLOGICAL model.
    32. There are three very cool things about what you just saw... The workflow it created (i.e. the services chosen) differed depending on context:
        taxon:4932 a i:ModelOrganism1 . # yeast
        taxon:7227 a i:ModelOrganism2 . # fly
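A toy Python sketch of why the same model can yield different workflows: the abstract step (“retrieve interactors in a model organism”) is first bound to a taxon using the input file, and a registry then maps each (step, taxon) pair to a concrete service. The registry contents and service names are invented for illustration; SHARE discovers SADI services from their semantic descriptions rather than from a hard-coded table like this one.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The local input file from the slides, written as (subject, predicate, object).
INPUT: List[Triple] = [
    ("taxon:9606", "rdf:type", "i:OrganismOfInterest"),  # human
    ("uniprot:Q9UK53", "rdf:type", "i:ProteinOfInterest"),  # ING1
    ("taxon:4932", "rdf:type", "i:ModelOrganism1"),  # yeast
    ("taxon:7227", "rdf:type", "i:ModelOrganism2"),  # fly
]

# Hypothetical registry: (abstract step, taxon) -> concrete service name.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("interactors", "taxon:4932"): "yeast-interaction-service",
    ("interactors", "taxon:7227"): "fly-interaction-service",
    ("interactors", "taxon:10090"): "mouse-interaction-service",
}


def instance_of(role: str, triples: List[Triple]) -> str:
    """Return the individual that the input file types with `role`."""
    for s, p, o in triples:
        if p == "rdf:type" and o == role:
            return s
    raise KeyError(f"no individual of type {role}")


def plan(step: str, roles: List[str], triples: List[Triple]) -> List[str]:
    """Bind each abstract role to a taxon, then to a concrete service."""
    return [REGISTRY[(step, instance_of(role, triples))] for role in roles]


roles = ["i:ModelOrganism1", "i:ModelOrganism2"]
print(plan("interactors", roles, INPUT))

# Same abstract plan, different context: replace fly with mouse (taxon:10090).
mouse_input = [t for t in INPUT if t[0] != "taxon:7227"]
mouse_input.append(("taxon:10090", "rdf:type", "i:ModelOrganism2"))
print(plan("interactors", roles, mouse_input))
```

Only the data file changes between the two calls; the abstract step list, like the OWL model in the talk, stays the same.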
    33. We got the answer “simply” by designing a model of the answer!
    34. How did we do that?
    35. Design Pattern for Web Services on the Semantic Web
    36. A Web application that answers SPARQL-DL queries: Query-answering Enhanced by SADI
    37. Demos of SADI and SHARE
    38. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene?
        SELECT ?allele ?image ?desc
        WHERE {
          locus:DEF genetics:hasVariant ?allele .
          ?allele info:visualizedByImage ?image .
          ?image info:hasDescription ?desc
        }
    39. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene?
        SELECT ?allele ?image ?desc
        WHERE {
          locus:DEF genetics:hasVariant ?allele .
          ?allele info:visualizedByImage ?image .
          ?image info:hasDescription ?desc
        }
        Note that there is no “FROM” clause! We don’t tell it where it should get the information; the machine has to figure that out by itself...
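The idea that the machine must find its own sources can be illustrated with a toy evaluator. In this hedged sketch, which is not SHARE’s actual algorithm, each triple pattern is sent to whatever stand-in service is registered for its predicate, and variable bindings are joined left to right. The toy data, the `allele:` and `image:` identifiers, and the one-service-per-predicate registry are all hypothetical; SHARE plans from the semantic descriptions of SADI services, which is considerably richer.

```python
from typing import Callable, Dict, List, Tuple

Binding = Dict[str, str]
Pattern = Tuple[str, str, str]

# Invented toy data: predicate -> {subject: [objects]}. Nothing here comes
# from the real Antirrhinum resources.
TOY_DATA: Dict[str, Dict[str, List[str]]] = {
    "genetics:hasVariant": {"locus:DEF": ["allele:A1", "allele:A2"]},
    "info:visualizedByImage": {"allele:A1": ["image:1"], "allele:A2": ["image:2"]},
    "info:hasDescription": {"image:1": ["description 1"], "image:2": ["description 2"]},
}


def make_service(table: Dict[str, List[str]]) -> Callable[[str], List[str]]:
    """Wrap a table as a 'service' that maps a subject URI to object values."""
    return lambda subject: table.get(subject, [])


# Hypothetical registry: one stand-in service per predicate.
SERVICES = {pred: make_service(table) for pred, table in TOY_DATA.items()}


def is_var(term: str) -> bool:
    return term.startswith("?")


def answer(patterns: List[Pattern]) -> List[Binding]:
    """Evaluate triple patterns left to right; the predicate alone picks the service."""
    bindings: List[Binding] = [{}]
    for s, p, o in patterns:
        service = SERVICES[p]  # "discovery", reduced to a lookup by predicate
        extended: List[Binding] = []
        for b in bindings:
            subject = b[s] if is_var(s) else s  # assumes the subject is already bound
            for value in service(subject):
                if not is_var(o):
                    if value == o:
                        extended.append(b)  # constant object acts as a filter
                elif b.get(o, value) == value:
                    extended.append({**b, o: value})  # bind (or re-confirm) the variable
        bindings = extended
    return bindings


query = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]
for row in answer(query):
    print(row["?allele"], row["?image"], row["?desc"])
```

Pattern order matters in this sketch: a pattern can only be sent to a service once its subject is bound, which mirrors the way SADI services consume input instances and attach new properties to them.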
    40. Enter that query into SHARE.
    41. Click “Submit”...
    42. SHARE examines available SADI Web Services... and in a few seconds you get your answer.
    43. The query results are live hyperlinks to the respective database or images (the answer is IN the Web!)
    44. What pathways does UniProt protein P47989 belong to?
        PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
        PREFIX ont: <http://ontology.dumontierlab.com/>
        PREFIX uniprot: <http://lsrn.org/UniProt:>
        SELECT ?gene ?pathway
        WHERE {
          uniprot:P47989 pred:isEncodedBy ?gene .
          ?gene ont:isParticipantIn ?pathway .
        }
    45. What pathways does UniProt protein P47989 belong to?
        PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
        PREFIX ont: <http://ontology.dumontierlab.com/>
        PREFIX uniprot: <http://lsrn.org/UniProt:>
        SELECT ?gene ?pathway
        WHERE {
          uniprot:P47989 pred:isEncodedBy ?gene .
          ?gene ont:isParticipantIn ?pathway .
        }
    46. What pathways does UniProt protein P47989 belong to?
        PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
        PREFIX ont: <http://ontology.dumontierlab.com/>
        PREFIX uniprot: <http://lsrn.org/UniProt:>
        SELECT ?gene ?pathway
        WHERE {
          uniprot:P47989 pred:isEncodedBy ?gene .
          ?gene ont:isParticipantIn ?pathway .
        }
        Note again that there is no “FROM” clause… I have not told SHARE where to look for the answer; I am simply asking my question.
    47. Enter that query into SHARE.
    48. Two different providers of pathway information and two different providers of gene information (KEGG, GO, and NCBI) were found and accessed.
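When more than one service can supply the same predicate, one natural strategy is to call all of them and merge the answers. A minimal sketch, with hypothetical provider names and invented outputs; recording which provider contributed each value is a design choice made here so that every result can still point back to its source, in the spirit of the answers remaining links to the original data.

```python
from typing import Callable, Dict, List, Set, Tuple

Provider = Tuple[str, Callable[[str], List[str]]]

# Hypothetical: two providers registered for the same predicate.
# Their outputs are invented and ignore the input gene.
PROVIDERS: Dict[str, List[Provider]] = {
    "ont:isParticipantIn": [
        ("provider-A", lambda gene: ["pathway:1", "pathway:2"]),
        ("provider-B", lambda gene: ["pathway:2", "pathway:3"]),
    ],
}


def query_all(predicate: str, subject: str) -> Dict[str, Set[str]]:
    """Union the values from every provider, keeping provenance per value."""
    merged: Dict[str, Set[str]] = {}
    for name, call in PROVIDERS.get(predicate, []):
        for value in call(subject):
            merged.setdefault(value, set()).add(name)
    return merged


for value, sources in sorted(query_all("ont:isParticipantIn", "gene:X").items()):
    print(value, sorted(sources))
```

A value returned by both providers appears once in the merged result, tagged with both sources.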
    49. The results are all links to the original data (the answer is IN the Web!)
    50. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants (I showed you this query at ISoLA 2010… sorry for repeating myself!)
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
        PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
        SELECT ?patient ?bun ?creat
        FROM <http://sadiframework.org/ontologies/patients.rdf>
        WHERE {
          ?patient rdf:type patient:LikelyRejecter .
          ?patient l:latestBUN ?bun .
          ?patient l:latestCreatinine ?creat .
        }
    51. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants (I showed you this query in 2010… sorry for repeating myself!)
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
        PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
        SELECT ?patient ?bun ?creat
        FROM <http://sadiframework.org/ontologies/patients.rdf>
        WHERE {
          ?patient rdf:type patient:LikelyRejecter .
          ?patient l:latestBUN ?bun .
          ?patient l:latestCreatinine ?creat .
        }
    52. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Mark D Wilkinson’s definition)
    53. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points.
    54. Likely Rejecter: So the data required to answer this question DOESN’T EXIST!
    55. ?
    56. Enter that query into SHARE.
    57. SHARE “decomposes” the Likely Rejecter OWL class into its constituent property restrictions.
    58. Each property restriction in the Class is matched with a SADI Service. The matched SADI Service can generate data that has that property.
    59. SHARE chains these SADI services into a workflow... ...the outputs from that workflow are Instances (OWL Individuals) of the Likely Rejecter OWL Class.
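The decompose, match, and chain steps can be sketched as simple backward chaining. In this hypothetical Python illustration, not SHARE’s code, a class definition is a list of (property, test) restrictions; for each property the sketch finds a service that produces it, first recursively satisfying that service’s own inputs, then runs it and keeps the individuals that pass every test. The service names, property names, and creatinine values are invented, the “trend” service is a crude last-minus-first stand-in for a real regression service, and there is no cycle detection.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Individual = Dict[str, Any]


@dataclass
class Service:
    name: str
    needs: List[str]  # properties the input individual must already carry
    produces: str  # the property this service attaches
    run: Callable[[Individual], Any]


# Invented creatinine time series (day, value) per patient.
SERIES = {
    "patient:1": [(0, 1.0), (30, 1.4), (60, 1.9)],
    "patient:2": [(0, 1.2), (30, 1.1), (60, 1.0)],
}

SERVICES = [
    Service("fetch-creatinine", ["id"], "creatinineSeries",
            lambda ind: SERIES[ind["id"]]),
    # Crude stand-in for a regression service: last value minus first value.
    Service("trend", ["creatinineSeries"], "creatinineTrend",
            lambda ind: ind["creatinineSeries"][-1][1] - ind["creatinineSeries"][0][1]),
]

# Hypothetical class definition with one restriction: "trend is positive".
LIKELY_REJECTER: List[Tuple[str, Callable[[Any], bool]]] = [
    ("creatinineTrend", lambda v: v > 0),
]


def ensure(ind: Individual, prop: str) -> bool:
    """Backward-chain: find a service producing `prop`, satisfying its inputs first."""
    if prop in ind:
        return True
    for svc in SERVICES:
        if svc.produces == prop and all(ensure(ind, need) for need in svc.needs):
            ind[prop] = svc.run(ind)
            return True
    return False


def instances_of(restrictions, individuals: List[Individual]) -> List[str]:
    """Individuals for which every restricted property can be produced and passes."""
    return [
        ind["id"]
        for ind in individuals
        if all(ensure(ind, prop) and test(ind[prop]) for prop, test in restrictions)
    ]


patients = [{"id": "patient:1"}, {"id": "patient:2"}]
print(instances_of(LIKELY_REJECTER, patients))  # ['patient:1']
```

Asking for `creatinineTrend` pulls in `fetch-creatinine` automatically, because the trend service cannot run until a series is attached; that dependency is what turns independent services into a chain.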
    60. For example… SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis, which is required for the “increasing over time” part of the Class definition.
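As a rough illustration of what such a regression service might compute, here is an ordinary least-squares slope of creatinine against time. Reducing “increasing over time” to “fitted slope above a threshold”, the default threshold of zero, and the time units are all assumptions made for this sketch; the slides do not say which regression service or parameters SHARE actually used.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (time, creatinine level)


def ols_slope(points: Sequence[Point]) -> float:
    """Ordinary least-squares slope of value against time."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two time-points")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("time-points must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


def is_likely_rejecter(points: Sequence[Point], threshold: float = 0.0) -> bool:
    """Assumed reading of 'increasing over time': fitted slope above threshold."""
    return ols_slope(points) > threshold


print(ols_slope([(0, 1.0), (1, 2.0), (2, 3.0)]))  # 1.0
print(is_likely_rejecter([(0, 1.0), (30, 1.4), (60, 1.9)]))  # True
print(is_likely_rejecter([(0, 1.2), (30, 1.1), (60, 1.0)]))  # False
```

Using a fitted slope rather than comparing only the first and last measurements makes the classification less sensitive to a single noisy time-point.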
    61. VOILÀ!
    62. SHARE examines the OWL Class, gathers from the Web the ontologies that are referenced by that Class, and then uses those ontological properties to identify which data-sources and analytical tools it must access to create data matching that Class definition.
    63. OWL
    64. The way SHARE builds the workflow varies depending on the context of the query (i.e. which data/ontologies it reads – Mine? Yours?) and on what part of the query it is trying to answer at any given moment (which ontological concept is relevant to that clause).
    65. And that brings us back to...
    66. Web Science 2.0
    67. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics 9, 426 (2008).
    68. Our system derives and executes the following workflow automatically, using an OWL ontology that describes the biology.
    69. The analytical tools chosen for that workflow were determined based on context, even though the biological (ontological) model driving their selection was the same.
    70. i.e. The published model is re-usable
    71. i.e. The published model is re-usable in different contexts... by different researchers
    72. Because the model IS the experiment, the published EXPERIMENT is re-usable!! Simply point the same query at your own dataset...
    73. The scientific publication is an executable document!
    74. Every component of the model, every component of the input data, and every component of the output data is a URL. Therefore the model, the question, the experiment, and the results are inherently IN the Web.
    75. Every component of the model, every component of the input data, and every component of the output data is a URL. The answer, and the knowledge derived from it, is immediately available to Web search engines, and moreover can instantly affect the outcome of other Web Science experiments.
    76. You Are Now Here!!!
    77. Change the way we think of “hypotheses”
    78. In Web Science 2.0: Model what the world would “look like” if your hypothesis were true, then ask “is there any data that fits that model?”
    79. Please join us! SADI and SHARE are Open-Source projects: http://sadiframework.org
    80. My New Home!
    81. University of British Columbia:
        Luke McCarthy – Lead Dev., Everything...
        Edward Kawas – SADI Service auto-generator
        Benjamin VanderValk – SHARE & SADI & Experimental modeling
        Ian Wood – Experimental modeling project
        Heath Button, Soroush Samadian – Cardiovascular data modeling and queries
    82. C-BRASS Collaborators at other sites:
        U of New Brunswick: Dr. Chris Baker
        Carleton University: Dr. Michel Dumontier
        Alexandre Riazanov, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
    83. Microsoft Research