2011 ebi industry workshop

  1. Predicting Druglikeness and Toxicity from Integrated Data and Services on the Life Science Semantic Web
     Michel Dumontier, Ph.D.
     Associate Professor of Bioinformatics, Department of Biology, School of Computer Science, Institute of Biochemistry, Carleton University
     Adjunct Professor, Department of Computer Science and Software Engineering, Université Laval
     Ottawa Institute of Systems Biology
     Ottawa-Carleton Institute of Biomedical Engineering
     2011-EBI-Industry-SW::Dumontier
  2. Is caffeine a drug-like molecule?
     Is acetaminophen toxic?
  3. Finding the right information to answer a question is hard and sometimes requires a sophisticated workflow
  4. [Image-only slide]
  5. What if we could answer a question by automatically building a knowledge base using both data and services?
  6. The Semantic Web is a web of knowledge.
     It is about standards for publishing, sharing and querying knowledge drawn from diverse sources.
     It enables the answering of sophisticated questions.
  7. Is caffeine a drug-like molecule?
     To answer this question we need:
       • a definition of what ‘drug-like molecule’ really means
       • caffeine’s molecular structure
       • the ability to compute the relevant attributes
       • a way to determine whether caffeine satisfies the requirements of being ‘drug-like’
  11. Lipinski Rule of Five
      Rule of thumb for druglikeness (orally active in humans)
      (4 rules with multiples of 5)
        • mass of less than 500 Daltons
        • fewer than 5 hydrogen bond donors
        • fewer than 10 hydrogen bond acceptors
        • a partition coefficient (log P) value between -5 and 5
      We need a more formal (machine-understandable) description of a ‘drug-like molecule’ that specifies values for chemical descriptors.
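As a minimal sketch of what a machine-checkable version of these four rules could look like, the Python below encodes the thresholds exactly as the slide states them. The class and function names are illustrative assumptions, not part of the talk, and the talk's own formalization uses OWL rather than Python. Note that the commonly cited form of the rule says "no more than" 5 donors and 10 acceptors, so the strict inequalities here follow the slide, not the textbook wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Chemical descriptors for one molecule (field names are illustrative)."""
    mass_da: float         # molecular mass in Daltons
    h_bond_donors: int     # hydrogen bond donor count
    h_bond_acceptors: int  # hydrogen bond acceptor count
    log_p: float           # partition coefficient (log P)


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the slide's rules that the molecule fails."""
    rules = {
        "mass < 500 Da": d.mass_da < 500,
        "H-bond donors < 5": d.h_bond_donors < 5,
        "H-bond acceptors < 10": d.h_bond_acceptors < 10,
        "-5 < log P < 5": -5 < d.log_p < 5,
    }
    return [name for name, ok in rules.items() if not ok]


def is_drug_like(d: Descriptors) -> bool:
    """A molecule is 'drug-like' here when it violates none of the rules."""
    return not lipinski_violations(d)


# Caffeine (C8H10N4O2): mass 194.19 Da, no N-H or O-H donors, and six
# acceptors when counted Lipinski-style as N + O atoms. The log P value is
# the one returned by the logP service later in this talk.
caffeine = Descriptors(mass_da=194.19, h_bond_donors=0,
                       h_bond_acceptors=6, log_p=-0.4311)
print(is_drug_like(caffeine))
```

Returning the list of failed rules, rather than only a boolean, keeps the decision explainable, which is the same motivation the talk gives for using an ontology and a reasoner.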
  12. Ontology as a strategy to formally represent knowledge
  13. The Web Ontology Language (OWL) has explicit semantics
      It can therefore be used to capture knowledge in a machine-understandable way.
  14. Semanticscience Integrated Ontology (SIO)
        • OWL2 ontology
        • 900+ classes covering basic types (physical, processual, abstract, informational) with an emphasis on biological entities
        • 169 basic relations (mereological, participatory, attribute/quality, spatial, temporal and representational)
        • axioms can be used by reasoners to generate inferences for consistency checking, classification and answering questions about life science knowledge
        • embodies emerging ontology design patterns
        • specifies the representation of knowledge
        • dereferenceable URIs
        • searchable in the NCBO BioPortal
      Available at http://semanticscience.org/ontology/sio.owl
  15. [Image-only slide]
  16. The Chemical Information Ontology (CHEMINF)
        • 100+ chemical descriptors
        • 50+ chemical qualities
        • Relates descriptors to their specifications, the software that generated them (along with the running parameters) and the algorithms that they implement
      Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
      http://semanticchemistry.googlecode.com
  17. Molecular structure can be represented using a SMILES string, which is a common representation of the chemical graph
      SMILES string for caffeine: Cn1cnc2n(C)c(=O)n(C)c(=O)c12
      [Figure: ball-and-stick model for caffeine]
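To make the "chemical graph as a string" idea concrete, here is a toy Python sketch that counts the heavy (non-hydrogen) atoms in a SMILES string. It is an assumption-laden illustration, not a SMILES parser: it only recognizes unbracketed organic-subset atoms and ignores bonds, branches and ring closures. Real descriptor calculation, as the talk notes later, was done with the Chemistry Development Kit (CDK).

```python
import re
from collections import Counter

# Unbracketed "organic subset" atoms only; two-letter symbols are tried first.
# Lowercase letters denote aromatic atoms. Digits (ring closures), parentheses
# (branches) and bond symbols such as "=" are simply skipped.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms per element in a simple SMILES string."""
    if "[" in smiles:
        # Bracket atoms such as [Na+] need a real parser.
        raise ValueError("bracket atoms are outside the scope of this sketch")
    return Counter(token.capitalize() for token in _ATOM.findall(smiles))


caffeine = "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"
print(heavy_atom_counts(caffeine))
```

For caffeine the counts are 8 carbons, 4 nitrogens and 2 oxygens, which agrees with its molecular formula C8H10N4O2 (hydrogens are implicit in this SMILES string).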
  18. Lipinski Rule of Five
      Empirically derived ruleset for druglikeness
      (4 rules with multiples of 5)
        • mass of less than 500 Daltons
        • fewer than 5 hydrogen bond donors
        • fewer than 10 hydrogen bond acceptors
        • a partition coefficient (log P) value between -5 and 5
      A formal description using OWL: [shown as an image on the slide]
  19. What we then need are services that will consume SMILES strings and annotate the molecule with the required chemical descriptors; then we can reason about whether it satisfies the drug-likeness definition
  20. Semantic Automated Discovery and Integration (SADI)
      http://sadiframework.org
      SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs
        • Mark Wilkinson, UBC
        • Michel Dumontier, Carleton University
        • Christopher Baker, UNB
  21. Create code stubs using the ontology
        • Publish the ontology to a web-accessible location
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl
        • Make sure that the class names are resolvable (easy when using the hash notation)
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule
        • Download/check out the code
          http://sadiframework.org
        • Run the code generator (Java, Perl, Python)
          Specify the URIs that correspond to input and output types
        • Implement the functionality
          We used the Chemistry Development Kit (CDK) to implement 4 services
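The remark that resolvability is "easy when using the hash notation" can be illustrated with a short Python sketch. An HTTP client drops the fragment (the part after `#`) before making a request, so every hash-style class URI resolves to the single published ontology document. The helper names below are illustrative assumptions; `urldefrag` is the standard-library function that performs the split.

```python
from urllib.parse import urldefrag

ONTOLOGY = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl"


def document_for(class_uri: str) -> str:
    """Return the URL that is actually requested over HTTP for a hash URI."""
    return urldefrag(class_uri).url


def local_name(class_uri: str) -> str:
    """Return the fragment that names the class inside the document."""
    return urldefrag(class_uri).fragment


for name in ("smiles-molecule", "logp-molecule"):
    uri = f"{ONTOLOGY}#{name}"
    # Both class URIs are served by the same ontology file.
    print(local_name(uri), "->", document_for(uri))
```

Publishing one OWL file is therefore enough to make all of its hash-named classes dereferenceable, with no per-class server configuration.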
  22. Responds to a GET operation by providing the service description in RDF
      Conforms to Feta (BioMoby, myGrid)
      curl http://cbrass.biordf.net/logpdc/logpc
      <rdf:RDF
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
        <rdf:Description rdf:about="">
          <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
          <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
          <j.0:hasOperation rdf:resource="#operation"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
        </rdf:Description>
        <rdf:Description rdf:about="#input">
          <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
        </rdf:Description>
        <rdf:Description rdf:about="#operation">
          <j.0:outputParameter rdf:resource="#output"/>
          <j.0:inputParameter rdf:resource="#input"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
        </rdf:Description>
        <rdf:Description rdf:about="#output">
          <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
        </rdf:Description>
      </rdf:RDF>
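A client that wants to match services to data needs the input and output classes out of a description like the one above. The sketch below is one possible way to do that in Python with the standard-library XML parser; it is an assumption for illustration, not how SADI or SHARE do it (a real client would more likely use an RDF library). `SAMPLE` is a trimmed copy of the description from the slide, and `parameter_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"
SO = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"

# Trimmed copy of the service description shown on the slide.
SAMPLE = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:j.0="{MYGRID}">
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="{SO}smilesmolecule"/>
    <rdf:type rdf:resource="{MYGRID}parameter"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="{SO}alogpsmilesmolecule"/>
    <rdf:type rdf:resource="{MYGRID}parameter"/>
  </rdf:Description>
</rdf:RDF>
""".strip()


def parameter_types(description_xml: str) -> dict[str, str]:
    """Map each parameter id ('input', 'output') to its OWL class URI."""
    root = ET.fromstring(description_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MYGRID}}}objectType")
        if object_type is not None:
            about = node.get(f"{{{RDF}}}about", "")
            types[about.lstrip("#")] = object_type.get(f"{{{RDF}}}resource")
    return types


print(parameter_types(SAMPLE))
```

Parsing RDF/XML as plain XML only works while the serialization keeps this exact shape, which is why the lead-in calls it a sketch.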
  23. Responds to a POST containing service input with a service output in RDF
      The query is in RDF:
      <rdf:RDF xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
          xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
          xmlns:owl="http://www.w3.org/2002/07/owl#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:sio="http://semanticscience.org/resource/"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
        <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
          <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
        </so:smilesmolecule>
        <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
          <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
        </sio:CHEMINF_000018>
      </rdf:RDF>
      The response is in RDF (excerpt):
      <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
        <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
        <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
      </rdf:Description>
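The request/response exchange above could be scripted roughly as follows. This is a hedged sketch using only the Python standard library: `build_input` and `invoke` are hypothetical helper names, the `application/rdf+xml` content type is an assumption about what the service accepts, and the service URL from the slides may no longer be live, so `invoke` is defined but not called here.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_input(molecule_uri: str, smiles: str) -> str:
    """Build the RDF/XML service input for one molecule and its SMILES string."""
    smiles_uri = molecule_uri + "smiles"  # same naming pattern as the slide
    return f"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
    xmlns:sio="http://semanticscience.org/resource/">
  <so:smilesmolecule rdf:about={quoteattr(molecule_uri)}>
    <sio:SIO_000008 rdf:resource={quoteattr(smiles_uri)}/>
  </so:smilesmolecule>
  <sio:CHEMINF_000018 rdf:about={quoteattr(smiles_uri)}>
    <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">{escape(smiles)}</sio:SIO_000300>
  </sio:CHEMINF_000018>
</rdf:RDF>"""


def invoke(service_url: str, rdf_xml: str) -> str:
    """POST the input document to a service and return the raw RDF response."""
    request = urllib.request.Request(
        service_url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},  # assumed media type
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


payload = build_input(
    "http://semanticscience.org/sadi/ontology/caffeine.rdf#m",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
)
print(payload)
# To call the service shown on the slide (if it is still reachable):
# print(invoke("http://cbrass.biordf.net/logpdc/logpc", payload))
```

Keeping document construction separate from the network call makes the input easy to inspect and test without depending on the remote service.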
  24. 61 Chemical Semantic Web Services
      These and an increasing number of semantic web services are registered at http://sadiframework.org/registry/services/
  25. Now what?
  26. Semantic Health and Research Environment (SHARE)
      SHARE is an application that executes (SPARQL) queries as workflows over SADI services
  27. “Reckoning”: dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data into that ontology
  28. ChEBI publishes (non-SW) data!
  29. Bio2RDF provides ChEBI in RDF
  30. Bio2RDF covers the major biological databases
  31. Bio2RDF’s RDFized data fits together
  32. Resource Description Framework (RDF)
      Allows one to talk about anything
      Uniform Resource Identifiers (URIs) can be used as entity names
      Bio2RDF specifies the naming convention:
        • http://bio2rdf.org/uniprot:P05067 (uniprot:P05067) is a name for Amyloid precursor protein
        • http://bio2rdf.org/omim:104300 (omim:104300) is a name for Alzheimer disease
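The naming convention in these two examples is a fixed base URL followed by a `namespace:identifier` pair. A small Python sketch of that mapping follows; the function names are illustrative assumptions, and only the pattern visible on the slide is implemented.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(prefixed_id: str) -> str:
    """Turn 'namespace:identifier' into a Bio2RDF-style URI."""
    namespace, separator, identifier = prefixed_id.partition(":")
    if not (namespace and separator and identifier):
        raise ValueError(f"expected 'namespace:identifier', got {prefixed_id!r}")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def prefixed_id(uri: str) -> str:
    """Inverse mapping: recover 'namespace:identifier' from a Bio2RDF-style URI."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF-style URI: {uri!r}")
    return uri[len(BIO2RDF_BASE):]


print(bio2rdf_uri("uniprot:P05067"))  # name for Amyloid precursor protein
print(bio2rdf_uri("omim:104300"))     # name for Alzheimer disease
```

Because the mapping is purely syntactic, records from different databases can refer to each other by URI without any lookup step, which is what lets the RDFized datasets fit together.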
  33. Life Science Dataset Registry Coordinates Naming
      Provides stable URI patterns for records and the entities they describe.
        • Directory Service: ~1500 datasets and dozens of resolvers
        • Discovery Service: the registry links entities to records, their representations (RDF/XML, HTML, etc.) and their providers (Bio2RDF, UniProt)
        • Redirection Service: automatic redirection to the data provider's document
      Stanford : 22-04-2010
  34. Bio2RDF is now serving over 40 billion triples of linked biological data
  35. Bio2RDF is a framework to create and provision linked data networks
        • Francois Belleau, Laval University
        • Marc-Alexandre Nolin, Laval University
        • Peter Ansell, Queensland University of Technology
        • Michel Dumontier, Carleton University
  36. Bio2RDF is part of a growing web of linked data
      “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  37. Something you can look up or search for, with rich descriptions
  38. SPARQL is the new cool kid on the query block
      SQL
      SPARQL
  39. Query for log P
  40. [Image-only slide]
  41. Query: Is caffeine a drug-like molecule?
  42. [Image-only slide]
  43. Benefits
        • Data remains distributed, as the internet was meant to be!
        • Data is not “exposed” as a SPARQL endpoint, which gives providers greater control over computational resources
        • Service invocation is straightforward, and matchmaking is done by reasoning about ontology-based input/output descriptions
  44. Is acetaminophen toxic?
        • Classical approaches involve decision trees or machine learning over validated data.
        • Algorithms are often proprietary, even those used by the regulatory agencies.
        • Open issues: which data was used, what the informative parameters are, and how easily new information can affect the outcomes.
  45. OWLED2011: Large-Scale Boolean Feature Based Trees as OWL ontologies
  46. DL Reasoners give Explanations
  47. Summary
        • Semantic Web technologies offer a tantalizing ability to create and share data and services for drug discovery
        • Bio2RDF provides linked life science data
        • SADI is a framework for creating semantic web services
        • SHARE allows us to simultaneously query and reason about data and services represented using RDF/OWL
        • Expressive ontologies can be used to make toxicity decisions transparent
  48. Acknowledgements
      CHEMINF Group: Leo Chepelev, Janna Hastings, Egon Willighagen, Nico Adams
      Bio2RDF: Peter Ansell, Francois Belleau, Allison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keath, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe
      SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keath, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson
      Toxicity Group: Leo Chepelev, Dana Klassen
  49. dumontierlab.com
      michel_dumontier@carleton.ca
      Website: http://dumontierlab.com
      Presentations: http://slideshare.com/micheldumontier
