
Web Science - ISoLA 2012


This is a brief version of earlier talks, but I think it explains more emphatically what I think Web Science is, why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important to achieving the vision.

Published in: Technology


  1. 1. Using OWL Domain Models as Abstract Workflow Models. Or... Conducting in silico research in the Web, from hypothesis to publication. Mark Wilkinson, Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain; Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
  2. 2. Context: “While it took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace. Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.” The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al., The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009. Slide adapted with permission from Joanne Luciano, presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.
  3. 3. “The Singularity”: The X-intercept is where, the moment a discovery is made, it is immediately put into practice (not only medical practice, but any research endeavour...). The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al., The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009. Slide borrowed with permission from Joanne Luciano, presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.
  4. 4. The technology required to achieve this does not yet exist
  5. 5. You Are Here: Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...
  6. 6. You Are Here: ...in a form that immediately (actively!) affected the research of others...
  7. 7. You Are Here: ...without requiring them to be aware of these new discoveries.
  8. 8. To achieve this vision, we must learn how to do research IN the Web, not OVER the Web
  9. 9. How we use the Web today
  10. 10. To achieve this vision, we must learn how to do research IN the Web, not OVER the Web
  11. 11. I’d like to show you how close we now are to this vision and how we got there
  12. 12. Web Science 2.0
  13. 13. We wanted to duplicate a real, peer-reviewed bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like
  14. 14. ...the machine had to make every other decision on its own
  15. 15. This is the study we chose:
  16. 16. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics. 9, 426 (2008).
  17. 17. Original Study, Simplified: Using what is known about interactions in fly & yeast, predict new interactions with your human protein of interest
  18. 18. Abstracted: Given a protein P in Species X:
      Find proteins similar to P in Species Y
      Retrieve interactors in Species Y
      Sequence-compare Y-interactors with the Species X genome
      (1) → Keep only those with a homologue in X
      Find proteins similar to P in Species Z
      Retrieve interactors in Species Z
      Sequence-compare Z-interactors with (1)
      → Putative interactors in Species X
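The abstracted procedure on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables, protein names, and the function names `homologs`, `interactors`, `potential_interactors`, and `probable_interactors` are all hypothetical stand-ins for real sequence-similarity searches and interaction databases, and none of them come from the talk or from SADI/SHARE.

```python
# Toy data: every identifier below is invented for illustration only.
HOMOLOGS = {
    # (protein, target species) -> similar proteins in the target species
    ("P", "Y"): {"Py"},
    ("P", "Z"): {"Pz"},
    ("Y1", "X"): {"A"},
    ("Y2", "X"): {"B"},
    ("Z1", "X"): {"A"},
    ("Z2", "X"): {"C"},
}
INTERACTIONS = {
    # protein -> known interaction partners in the same species
    "Py": {"Y1", "Y2"},
    "Pz": {"Z1", "Z2"},
}


def homologs(protein, species):
    """Stand-in for a sequence-similarity search against one species."""
    return HOMOLOGS.get((protein, species), set())


def interactors(protein):
    """Stand-in for an interaction-database lookup."""
    return INTERACTIONS.get(protein, set())


def potential_interactors(protein, home, model):
    """Proteins in `home` homologous to partners of the `model` counterpart of `protein`."""
    found = set()
    for similar in homologs(protein, model):
        for partner in interactors(similar):
            found |= homologs(partner, home)
    return found


def probable_interactors(protein, home, model1, model2):
    """Keep only candidates supported by both model organisms (an intersection)."""
    return (potential_interactors(protein, home, model1)
            & potential_interactors(protein, home, model2))


result = probable_interactors("P", "X", "Y", "Z")
```

With this toy data, only candidate `A` is supported by both model species, so it is the single member of `result`. In the talk, the point is that this procedure is not written by hand; it is derived from the OWL model.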
  19. 19. Modeling the answer... OWL Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge in the Web
  20. 20. Modeling the answer... Note that every word in this diagram is, in reality, a URL (because it is OWL)
  21. 21. Modeling the answer... The model of a Potential Interactor is published in The Web. It utilizes concepts from other models published in The Web (ours and others’) by referencing their URLs
  22. 22. Modeling the answer... The model of a Potential Interactor is a network of concepts distributed within the Web It will be affected by changes to those concepts We do not “own” all of those concepts!
  23. 23. Modeling the answer... ProbableInteractor: is homologous to (Potential Interactor from ModelOrganism1…) and (Potential Interactor from ModelOrganism2…). Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both comparator model organisms. (Effectively, an intersection)
  24. 24. Publish our OWL model of a Probable Interactor in the Web
  25. 25. Running a Web Science 2.0 Experiment: In a local data-file, provide the protein we are interested in and the two species we wish to use in our comparison
      taxon:9606 a i:OrganismOfInterest . # human
      uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly
  26. 26. The tricky bit is... In the abstract, the search for homology is “generic”: ANY model organism. But when the machine attempts to do the experiment, it will have to use several different and specific resources, because our question specifies two different species
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly
  27. 27. This is the question we ask: (the query language here is SPARQL)
      PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>
      SELECT ?protein
      FROM <file:/local/workflow.input.n3>
      WHERE { ?protein a i:ProbableInteractor . }
      The reference (URL) to our OWL model of the answer
  28. 28. Our system then derives (and executes) the following workflow automatically. These are different Web services! ...selected at run-time based on the same model
  29. 29. There are three very cool things about what you just saw...
  30. 30. There are three very cool things about what you just saw... The system was able to create a workflow based on an OWL model (ontology)
  31. 31. There are three very cool things about what you just saw... The system was able to create a COMPUTATIONAL workflow based on a BIOLOGICAL model
  32. 32. There are three very cool things about what you just saw... The workflow it created (i.e. the services chosen) differed depending on context
      taxon:4932 a i:ModelOrganism1 . # yeast
      taxon:7227 a i:ModelOrganism2 . # fly
  33. 33. We got the answer “simply” by designing a model of the answer!
  34. 34. How did we do that?
  35. 35. Design Pattern for Web Services on the Semantic Web
  36. 36. A Web application that answers SPARQL-DL queries Query-answering Enhanced by SADI
  37. 37. Demos of SADI and SHARE
  38. 38. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene?
      SELECT ?allele ?image ?desc
      WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc }
  39. 39. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene?
      SELECT ?allele ?image ?desc
      WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc }
      Note that there is no “FROM” clause! We don’t tell it where it should get the information. The machine has to figure that out by itself...
  40. 40. Enter that query into SHARE
  41. 41. Click “Submit”...
  42. 42. SHARE examines available SADI Web Services ...and in a few seconds you get your answer.
  43. 43. The query results are live hyperlinks to the respective Database or images (the answer is IN the Web!)
  44. 44. What pathways does UniProt protein P47989 belong to?
      PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
      PREFIX ont: <http://ontology.dumontierlab.com/>
      PREFIX uniprot: <http://lsrn.org/UniProt:>
      SELECT ?gene ?pathway
      WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  45. 45. What pathways does UniProt protein P47989 belong to?
      PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
      PREFIX ont: <http://ontology.dumontierlab.com/>
      PREFIX uniprot: <http://lsrn.org/UniProt:>
      SELECT ?gene ?pathway
      WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  46. 46. What pathways does UniProt protein P47989 belong to?
      PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
      PREFIX ont: <http://ontology.dumontierlab.com/>
      PREFIX uniprot: <http://lsrn.org/UniProt:>
      SELECT ?gene ?pathway
      WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
      Note again that there is no “FROM” clause… I have not told SHARE where to look for the answer; I am simply asking my question
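One way to picture how a query with no “FROM” clause can still be answered is a registry keyed by predicate: each service advertises the property it can attach to its input, and the query engine looks up services clause by clause. The sketch below is an assumption-laden toy, not SHARE's actual implementation; the `register` decorator, the provider names, and the gene and pathway identifiers are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical registry: predicate -> list of (provider name, lookup function).
REGISTRY = defaultdict(list)


def register(predicate, provider):
    """Decorator recording that `provider` can generate `predicate` for an input."""
    def wrap(func):
        REGISTRY[predicate].append((provider, func))
        return func
    return wrap


@register("pred:isEncodedBy", "provider-A")
def genes_a(subject):
    # Toy answer; the gene identifiers are made up.
    return {"uniprot:P47989": ["gene:G1"]}.get(subject, [])


@register("pred:isEncodedBy", "provider-B")
def genes_b(subject):
    return {"uniprot:P47989": ["gene:G1", "gene:G2"]}.get(subject, [])


@register("ont:isParticipantIn", "provider-C")
def pathways_c(subject):
    return {"gene:G1": ["pathway:W1"]}.get(subject, [])


def answer(start, predicates):
    """Follow a chain of predicates from `start`, asking every registered provider."""
    rows = [(start,)]
    for predicate in predicates:
        next_rows = []
        for row in rows:
            for _provider, func in REGISTRY[predicate]:
                for value in func(row[-1]):
                    candidate = row + (value,)
                    if candidate not in next_rows:  # merge duplicate answers
                        next_rows.append(candidate)
        rows = next_rows
    return rows


rows = answer("uniprot:P47989", ["pred:isEncodedBy", "ont:isParticipantIn"])
```

The caller names only the predicates in the question. Which providers are consulted, and how many, is decided by the registry at run time.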
  47. 47. Enter that query into SHARE
  48. 48. Two different providers of gene information (KEGG and NCBI) were found & accessed. Two different providers of pathway information (KEGG & GO) were found & accessed
  49. 49. The results are all links to the original data (The answer is IN the Web!)
  50. 50. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants (I showed you this query in ISoLA 2010… sorry for repeating myself)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
      PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
      SELECT ?patient ?bun ?creat
      FROM <http://sadiframework.org/ontologies/patients.rdf>
      WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  51. 51. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants (I showed you this query in 2010… sorry for repeating myself!)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
      PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
      SELECT ?patient ?bun ?creat
      FROM <http://sadiframework.org/ontologies/patients.rdf>
      WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  52. 52. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Mark D Wilkinson’s definition)
  53. 53. Likely Rejecter: …but there is no “likely rejecter” column or table in our database…only blood chemistry measurements at various time-points
  54. 54. Likely Rejecter:So the data required to answer this question DOESN’T EXIST!
  55. 55. ?
  56. 56. Enter that query into SHARE
  57. 57. SHARE “decomposes” the Likely Rejecter OWL class into its constituent property restrictions
  58. 58. Each property restriction in the Class is matched with a SADI Service. The matched SADI Service can generate data that has that property
  59. 59. SHARE chains these SADI services into a workflow... ...the outputs from that workflow are Instances (OWL Individuals) of the Likely Rejecter OWL Class
  60. 60. For example… SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis, which is required for the “increasing over time” part of the Class definition
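The “increasing over time” test can be illustrated with an ordinary least-squares slope over (time, value) pairs. This is a minimal sketch, assuming that a positive slope is what “increasing” means. The zero threshold, the function names, and the measurement values are illustrative assumptions; they are not clinical guidance and not the behaviour of any specific SADI service.

```python
def slope(points):
    """Ordinary least-squares slope of value against time for (time, value) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx


def is_likely_rejecter(creatinine, threshold=0.0):
    """Assumed rule: creatinine trend above `threshold` counts as increasing."""
    return slope(creatinine) > threshold


# Made-up series of (time point, creatinine level).
rising = [(0, 1.0), (1, 1.2), (2, 1.5), (3, 1.9)]
stable = [(0, 1.1), (1, 1.0), (2, 1.1), (3, 1.0)]
```

Here `is_likely_rejecter(rising)` holds and `is_likely_rejecter(stable)` does not. The classification is computed from raw measurements on demand, which is why no “likely rejecter” column needs to exist in the database.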
  61. 61. VOILA!
  62. 62. SHARE examines the OWL Class, gathers from the Web the ontologies that are referenced by that Class, then uses those ontological properties to identify which data-sources and analytical tools it must access to create data matching that Class definition
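The decompose, match, and chain steps described above can be sketched as a tiny planner. Everything in this sketch is hypothetical: the property names, the `SERVICES` and `CLASSES` tables, the toy patient data, and the crude end-minus-start trend (a stand-in for a real regression service) are assumptions made for illustration, not SHARE's data structures.

```python
# Made-up measurements per patient.
TOY_DB = {"patient:1": [1.0, 1.2, 1.5], "patient:2": [1.1, 1.0, 1.0]}


def fetch_series(rec):
    """Stand-in for a data service returning raw measurements."""
    return TOY_DB[rec["id"]]


def trend(rec):
    """Stand-in for an analytical service (crude average change per step)."""
    series = rec["creatinineSeries"]
    return (series[-1] - series[0]) / (len(series) - 1)


# Hypothetical registry: property produced -> (properties consumed, service).
SERVICES = {
    "creatinineSeries": ((), fetch_series),
    "creatinineTrend": (("creatinineSeries",), trend),
}

# Hypothetical class definition: (required property, test on its value).
CLASSES = {"LikelyRejecter": [("creatinineTrend", lambda value: value > 0)]}


def plan(prop, ordered=None):
    """List the services needed to produce `prop`, dependencies first."""
    ordered = [] if ordered is None else ordered
    needs, _service = SERVICES[prop]
    for need in needs:
        plan(need, ordered)
    if prop not in ordered:
        ordered.append(prop)
    return ordered


def instances_of(class_name, identifiers):
    """Run the planned services for each input and keep those matching the class."""
    members = []
    for identifier in identifiers:
        rec = {"id": identifier}
        matches = True
        for prop, test in CLASSES[class_name]:
            for step in plan(prop):
                if step not in rec:
                    rec[step] = SERVICES[step][1](rec)
            matches = matches and test(rec[prop])
        if matches:
            members.append(identifier)
    return members


workflow = plan("creatinineTrend")
members = instances_of("LikelyRejecter", ["patient:1", "patient:2"])
```

The workflow is derived from the class definition rather than written by hand: changing the entries in `CLASSES` or `SERVICES` changes the plan without touching `plan` or `instances_of`.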
  63. 63. OWL
  64. 64. The way SHARE builds the workflow varies depending on the context of the query (i.e. which data/ontologies it reads: Mine? Yours?) and on what part of the query it is trying to answer at any given moment (which ontological concept is relevant to that clause)
  65. 65. And that brings us back to...
  66. 66. Web Science 2.0
  67. 67. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC Genomics. 9, 426 (2008).
  68. 68. derives and executes the following workflow automatically using an OWL ontology that describes the biology
  69. 69. The analytical tools chosen for that workflow were determined based on context, even though the biological (ontological) model driving their selection was the same
  70. 70. i.e. The published model is re-usable
  71. 71. i.e. The published model is re-usable in different contexts... by different researchers
  72. 72. Because the model IS the experiment, the published EXPERIMENT is re-usable!! Simply point the same query at your own dataset...
  73. 73. The scientific publication is an executable document!
  74. 74. Every component of the model, every component of the input data, every component of the output data is a URL. Therefore the model, the question, the experiment, and the results are inherently IN the Web
  75. 75. Every component of the model, every component of the input data, every component of the output data is a URL. The answer, and the knowledge derived from it, is immediately available to Web search engines and, moreover, can instantly affect the outcome of other Web Science experiments
  76. 76. You Are Now Here!!!
  77. 77. Change the way we think of “hypotheses”
  78. 78. In Web Science 2.0: Model what the world would “look like” if your hypothesis were true. Then ask “is there any data that fits that model?”
  79. 79. Please join us! SADI and SHARE are Open-Source projects: http://sadiframework.org
  80. 80. My New Home!
  81. 81. University of British Columbia: Luke McCarthy – Lead Dev. Edward Kawas Everything... SADI Service auto-generator Benjamin VanderValk Ian Wood SHARE & SADI & Experimental modeling & Experimental modeling project myHeath Button Soroush Samadian Cardiovascular data modeling and queries
  82. 82. C-BRASS Collaborators at other sites: U of New Brunswick, Carleton University. Dr. Chris Baker, Dr. Michel Dumontier, Alexandre Riazanov, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
  83. 83. Microsoft Research
