We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, and CHEMINF

Published in: Technology, Education


  1. We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, and CHEMINF
     ChEBI User Group Meeting: June 24, 2010
     Michel Dumontier, Ph.D.
     Associate Professor of Bioinformatics
     Carleton University
     Department of Biology
     School of Computer Science
     Institute of Biochemistry
     Ottawa Institute of Systems Biology
     Ottawa-Carleton Institute of Biomedical Engineering
  2. (no transcribed text)
  3. Syntactic Web…
     It takes a lot of digging to get answers
  4. We need to get to the deep web
     - Surface web: 167 terabytes
     - Deep web: 91,000 terabytes
     - 545-to-one
  5. … and tap into the global web of structured knowledge
  6. The Semantic Web is the new global web of knowledge
     - It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
     - It makes it possible to answer sophisticated questions using background knowledge
  7. Goals
     - Provision chemical data on the Web
     - Find cheminformatic services that will consume the data
     - Answer questions about chemicals by reasoning over essential chemical knowledge
  8. Is caffeine a drug-like molecule?
  9. Lipinski Rule of Five
     Rule of thumb for druglikeness (orally active in humans); 4 rules with multiples of 5:
     - Molecular mass less than 500 Dalton
     - No more than 5 hydrogen bond donors
     - No more than 10 hydrogen bond acceptors
     - A partition coefficient (log P) value between -5 and 5
     We need a more formal (machine understandable) description.
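To make the four rules concrete before they are expressed in OWL, here is a minimal plain-Python sketch. It is not the ontology-based approach the talk goes on to describe. The `Descriptors` class and function names are hypothetical, the donor and acceptor limits are read inclusively (at most 5 and at most 10, the usual statement of the rule), and the caffeine acceptor count is an illustrative value that depends on the counting convention used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four descriptor values the rules need."""
    mass_da: float          # molecular mass in Dalton
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float            # partition coefficient (log P)


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rules that `d` breaks (empty list: none broken)."""
    violations = []
    if not d.mass_da < 500:
        violations.append("mass")
    if d.h_bond_donors > 5:
        violations.append("donors")
    if d.h_bond_acceptors > 10:
        violations.append("acceptors")
    if not -5 <= d.log_p <= 5:       # range as stated on the slide
        violations.append("log_p")
    return violations


def is_druglike(d: Descriptors) -> bool:
    """A molecule is treated as drug-like here when it breaks none of the rules."""
    return not lipinski_violations(d)


# Caffeine: the log P value is the one returned by the service later in the deck;
# the acceptor count (6) is an assumption that counts every N and O atom.
caffeine = Descriptors(mass_da=194.19, h_bond_donors=0,
                       h_bond_acceptors=6, log_p=-0.4311)
```

Calling `is_druglike(caffeine)` checks all four thresholds at once; `lipinski_violations` is kept separate so a caller can see which rule failed.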
  10. Formal Ontology as a Strategy
  11. The Web Ontology Language (OWL) Has Explicit Semantics
      Can therefore be used to capture knowledge in a machine understandable way
  12. Lipinski Rule of Five
      Empirically derived ruleset for druglikeness; 4 rules with multiples of 5:
      - Molecular mass less than 500 Dalton
      - No more than 5 hydrogen bond donors
      - No more than 10 hydrogen bond acceptors
      - A partition coefficient (log P) value between -5 and 5
      A formal description using OWL: (figure not captured in this transcript)
  13. To calculate these attributes, we need access to a computable representation of the molecular structure
      (Figure: ball & stick model for caffeine)
  14. The chemical graph specifies the type and connectivity of atoms in molecules. It describes a part of chemical structure. SMILES strings are common representations of the chemical graph.
      (Figure: ball & stick model for caffeine)
      SMILES string for caffeine: Cn1cnc2n(C)c(=O)n(C)c(=O)c12
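As a small illustration of how a SMILES string encodes atom types, the sketch below counts the heavy (non-hydrogen) atoms in the caffeine string from the slide. It is a deliberately limited tokenizer, not a SMILES parser: it only handles atoms of the organic subset written without brackets, and it ignores bonds, branches and ring closures. A real service would use a toolkit such as the CDK mentioned later in the deck. The function name is hypothetical.

```python
import re
from collections import Counter

# Unbracketed "organic subset" atoms: two-letter halogens first, then
# single-letter aliphatic (upper case) and aromatic (lower case) symbols.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

CAFFEINE_SMILES = "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms by element, treating aromatic 'c' the same as 'C'."""
    if "[" in smiles:
        # Bracket atoms (charges, isotopes, explicit H) are out of scope here.
        raise ValueError("bracket atoms are not supported by this sketch")
    return Counter(token.capitalize() for token in _ATOM.findall(smiles))
```

For the caffeine string this yields 8 carbons, 4 nitrogens and 2 oxygens, which matches the heavy-atom part of caffeine's molecular formula, C8H10N4O2.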
  15. Chemical descriptors
      Chemical descriptors are data (quantities or values) that provide information about substances, molecular entities, and their parts (rings, atoms, bonds, etc.).
      Sometimes they enumerate material parts; sometimes they quantify or describe qualities, functions or dispositions.
      Often used to build Quantitative Structure Activity Relationship (QSAR) models.
      Example descriptors:
      - Mass values
      - Partition coefficients
      - Heats of formation
      - Aromaticity values
      - Molecular formulas
  16. The Chemical Information Ontology (CHEMINF)
      - 100 chemical descriptors
      - 50 chemical qualities
      - Relates descriptors to their specifications, the software that generated them (along with the running parameters), and the algorithms that they implement
      Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
      http://semanticchemistry.googlecode.com
  17. CHEMINF provides the vocabulary to define an input (SMILES-annotated molecule) and an output (molecule annotated with a descriptor)
  18. Ultimately, the goal is to use an OWL reasoner to reason about the attributes to determine whether the compound is drug-like
  19. Semantic Automated Discovery and Integration
      http://sadiframework.org
      SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs
      - Mark Wilkinson, UBC
      - Michel Dumontier, Carleton University
      - Christopher Baker, UNB
  20. SADI
      - OWL classes in SADI are local to individual services
      - They should uniquely specify the service input and outputs (they have exactly the right restrictions)
      - One service’s world-view can conflict with another’s, but a client can use any or all
      - Maximize interoperability by reusing types and relations
  21. Create code stubs using the ontology
      - Publish the ontology to a web-accessible location:
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl
      - Make sure that the class names are resolvable (easy when using the hash notation):
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule
      - Download/checkout the code: http://sadiframework.org
      - Run the code generator, specifying the URIs that correspond to input and output types
  22. Implement the functionality
      - Java version
        - Uses Jena to manipulate the RDF graph
        - Uses Maven to build from the command line or Eclipse; invokes Jetty for service testing
      - Chemistry
        - We used the Chemistry Development Kit (CDK) to implement 4 services
  23. Working with the service (GET)
      Responds to a GET by providing the service description in RDF; conforms to Feta (BioMoby, myGrid).

          curl http://cbrass.biordf.net/logpdc/logpc

          <rdf:RDF
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
            <rdf:Description rdf:about="">
              <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
              <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
              <j.0:hasOperation rdf:resource="#operation"/>
              <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
            </rdf:Description>
            <rdf:Description rdf:about="#input">
              <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
              <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
            </rdf:Description>
            <rdf:Description rdf:about="#operation">
              <j.0:outputParameter rdf:resource="#output"/>
              <j.0:inputParameter rdf:resource="#input"/>
              <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
            </rdf:Description>
            <rdf:Description rdf:about="#output">
              <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
              <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
            </rdf:Description>
          </rdf:RDF>
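A client mainly needs two facts from this description: the OWL class the service consumes and the class it produces. The sketch below pulls them out with Python's standard XML parser. It assumes the description is serialized exactly in the `rdf:Description` form shown on the slide (abridged here); RDF/XML allows other equivalent serializations, so a robust client would use an RDF library instead. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"
ONT = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"

# Abridged copy of the GET response from the slide.
SERVICE_DESCRIPTION = f"""<rdf:RDF
    xmlns:rdf="{RDF}"
    xmlns:j.0="{MYGRID}">
  <rdf:Description rdf:about="">
    <j.0:hasOperation rdf:resource="#operation"/>
  </rdf:Description>
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="{ONT}smilesmolecule"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="{ONT}alogpsmilesmolecule"/>
  </rdf:Description>
</rdf:RDF>"""


def parameter_types(rdf_xml: str) -> dict[str, str]:
    """Map each described parameter ('input', 'output') to its OWL class URI."""
    root = ET.fromstring(rdf_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MYGRID}}}objectType")
        if object_type is None:
            continue  # the service and operation nodes carry no objectType
        name = node.get(f"{{{RDF}}}about", "").lstrip("#")
        types[name] = object_type.get(f"{{{RDF}}}resource")
    return types
```

`parameter_types(SERVICE_DESCRIPTION)` returns a dictionary with the `input` and `output` class URIs, which is the information a registry or client would match against.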
  24. Working with the service (POST)
      Responds to a POST with service output (process an input file).

      Input file (caffeine.rdf):

          <rdf:RDF xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
              xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
              xmlns:owl="http://www.w3.org/2002/07/owl#"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
              xmlns:sio="http://semanticscience.org/resource/"
              xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
            <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
              <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
            </so:smilesmolecule>
            <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
              <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
            </sio:CHEMINF_000018>
          </rdf:RDF>

      Invocation:

          curl --data @caffeine.rdf http://cbrass.biordf.net/logpdc/logpc

      Output (fragment):

          <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
            <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
            <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
          </rdf:Description>
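The same POST can be prepared programmatically. The sketch below builds the input document for an arbitrary SMILES string and wraps it in an HTTP request using only the Python standard library. The function names are hypothetical, the `application/rdf+xml` content type is an assumption (the `curl --data` call on the slide does not set it), and the endpoint is the one shown on the slide, which may no longer be reachable, so the request is constructed but not sent.

```python
import urllib.request
from xml.sax.saxutils import escape

SERVICE_URL = "http://cbrass.biordf.net/logpdc/logpc"  # from the slide; may be offline
BASE = "http://semanticscience.org/sadi/ontology/caffeine.rdf"

INPUT_TEMPLATE = """<rdf:RDF
    xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sio="http://semanticscience.org/resource/">
  <so:smilesmolecule rdf:about="{base}#m">
    <sio:SIO_000008 rdf:resource="{base}#msmiles"/>
  </so:smilesmolecule>
  <sio:CHEMINF_000018 rdf:about="{base}#msmiles">
    <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">{smiles}</sio:SIO_000300>
  </sio:CHEMINF_000018>
</rdf:RDF>"""


def build_input(smiles: str, base: str = BASE) -> str:
    """Return an RDF/XML document describing a molecule by its SMILES string."""
    # escape() protects the XML if the SMILES text ever contains &, < or >.
    return INPUT_TEMPLATE.format(base=base, smiles=escape(smiles))


def build_request(body: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Wrap the RDF/XML body in a POST request (content type is an assumption)."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


request = build_request(build_input("Cn1cnc2n(C)c(=O)n(C)c(=O)c12"))
# To actually invoke the service (network access required):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```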
  25. Publish and Register the service
      http://sadiframework.org/registry
  26. Now what?
  27. Semantic Health and Research Environment
      SHARE is an application that executes (SPARQL) queries as workflows over SADI services
  28. “Reckoning”: dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data into that ontology
  29. SPARQL is the new cool kid on the query block
      (Slide labels: SQL, SPARQL)
  30. SHARE
      - SPARQL engine
      - Triple patterns are matched against service descriptions
      - Knowledge base is dynamically populated
      - Queries can contain OWL classes, which are expanded to the required triple patterns
      - Query is optimized to minimize the number of service calls and the amount of data sent over the network
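The idea of matching triple patterns against service descriptions can be pictured with a toy lookup. This is only a sketch of the general technique, not SHARE's implementation: the registry entries, predicate names and function names are all hypothetical, and real matching also involves input classes, reasoning and query optimization.

```python
from typing import NamedTuple


class Service(NamedTuple):
    """Hypothetical registry entry: a service and the predicates it can attach."""
    name: str
    predicates: frozenset


# Hypothetical registry contents, loosely modelled on the Lipinski services.
REGISTRY = [
    Service("logp", frozenset({"hasLogP"})),
    Service("hbd", frozenset({"hasDonorCount"})),
    Service("hba", frozenset({"hasAcceptorCount"})),
]


def index_by_predicate(services):
    """Build a predicate -> [service, ...] index over the registry."""
    index = {}
    for service in services:
        for predicate in service.predicates:
            index.setdefault(predicate, []).append(service)
    return index


def plan_calls(triple_patterns, index):
    """List the services needed to answer the patterns, each at most once."""
    calls = []
    for _subject, predicate, _object in triple_patterns:
        for service in index.get(predicate, []):
            if service.name not in calls:   # avoid calling a service twice
                calls.append(service.name)
    return calls


patterns = [
    ("?mol", "hasLogP", "?logp"),
    ("?mol", "hasDonorCount", "?donors"),
    ("?mol", "hasLogP", "?again"),
]
needed = plan_calls(patterns, index_by_predicate(REGISTRY))
```

Here `needed` lists the `logp` and `hbd` services once each, even though `hasLogP` appears in two patterns, which hints at why indexing services by predicate helps keep the number of calls down.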
  31. ChEBI has data!
  32. Bio2RDF provides ChEBI in RDF
  33. Bio2RDF now serving over 40 billion triples of linked biological data
  34. (no transcribed text)
  35. An increasing amount of machine understandable chemical data
  36. Query for log P
  37. Query: Is caffeine a drug-like molecule?
  38. SADI
      - Describe the input and output using OWL-DL classes
      - Subject of input and output must be the same
      - Web services indexed by predicates
      - BioCatalogue will list SADI-compliant services
      - Taverna plugin to work with SADI services
      - Protégé 4.1 plugin to create SADI services
      - Simplified migration path for existing web services (Java, Perl)
  39. Benefits
      - Data remains distributed – no warehouse!
      - Data is not “exposed” as a SPARQL endpoint
      - Greater provider control over computational resources
      - Yet data appears to be a SPARQL endpoint… no modification of SPARQL or reasoner required
  40. Join Us!
      SADI and CardioSHARE are Open Source
      Come join us – we’re having a lot of fun!!
      http://sadiframework.org
  41. Acknowledgements
      - Leonid Chepelev (implementing the services)
      - Luke McCarthy (technical support)
      - Mark Wilkinson (vision and leadership)
      - CHEMINF Group: Janna Hastings, Nico Adams, Egon Willighagen
      This research is supported by The Heart + Stroke Foundation of BC and Yukon, Microsoft Research, The Canadian Institutes of Health Research, The Natural Sciences and Engineering Research Council of Canada and CANARIE.
