We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, and CHEMINF

  • Can’t answer questions that require background knowledge
  • Research – that’s what brought you here. Skills – marketable in whatever you choose to do thereafter. Knowledgeable – where the field has been and where it is going. Improve oral and written scientific communication skills. Research – tell people what you’ve been doing. Track progress – develop a sense of progress.

    1. We’re all SMILES! Building Chemical Semantic Web Services with SADI, ChEBI, and CHEMINF
       ChEBI User Group Meeting: June 24, 2010
       Michel Dumontier, Ph.D.
       Associate Professor of Bioinformatics
       Carleton University
       Department of Biology
       School of Computer Science
       Institute of Biochemistry
       Ottawa Institute of Systems Biology
       Ottawa-Carleton Institute of Biomedical Engineering
    2.
    3. Syntactic Web…
       It takes a lot of digging to get answers
    4. We need to get to the deep web
       Surface web: 167 terabytes
       Deep web: 91,000 terabytes
       545-to-one
    5. and tap into the global web of structured knowledge
    6. The Semantic Web is the new global web of knowledge
       It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
       It makes it possible to answer sophisticated questions using background knowledge
    7. Goals
       Provision chemical data on the Web
       Find cheminformatic services that will consume the data
       Answer questions about chemicals by reasoning over essential chemical knowledge
    8. Is caffeine a drug-like molecule?
    9. Lipinski Rule of Five
       Rule of thumb for druglikeness (orally active in humans)
       (4 rules with multiples of 5)
         Less than 500 Dalton
         Less than 5 hydrogen bond donors
         Less than 10 hydrogen bond acceptors
         A partition coefficient value between -5 and 5
       We need a more formal (machine understandable) description
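Before any formalization, the four thresholds on this slide can be read as a plain predicate. The sketch below is only an illustration of the rule as worded here, not the OWL description the talk builds toward. The `Descriptors` class and `is_druglike` function are hypothetical names, and the caffeine inputs are illustrative: acceptors are counted as nitrogen plus oxygen atoms, which is an assumption about the counting convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four attributes the slide lists."""
    mass_dalton: float
    h_bond_donors: int
    h_bond_acceptors: int
    logp: float


def is_druglike(d: Descriptors) -> bool:
    """Apply the four thresholds exactly as worded on the slide."""
    return (
        d.mass_dalton < 500
        and d.h_bond_donors < 5
        and d.h_bond_acceptors < 10
        and -5 < d.logp < 5
    )


# Caffeine (C8H10N4O2). Acceptors are counted as N + O atoms (an assumption);
# the logP is the ALogP value shown in the service response later in the talk.
caffeine = Descriptors(mass_dalton=194.19, h_bond_donors=0,
                       h_bond_acceptors=6, logp=-0.4311)
print(is_druglike(caffeine))
```

A hard-coded check like this only works once all four numbers are in hand; the rest of the talk is about obtaining them from services and letting a reasoner do the classification.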
    10. Formal Ontology as a Strategy
    11. The Web Ontology Language (OWL) Has Explicit Semantics
        Can therefore be used to capture knowledge in a machine understandable way
    12. Lipinski Rule of Five
        Empirically derived ruleset for druglikeness
        (4 rules with multiples of 5)
          Less than 500 Dalton
          Less than 5 hydrogen bond donors
          Less than 10 hydrogen bond acceptors
          A partition coefficient value between -5 and 5
        A formal description using OWL:
    13. To calculate these attributes, we need access to a computable representation of the molecular structure
        [Figure: ball & stick model for caffeine]
    14. The chemical graph specifies the type and connectivity of atoms in molecules. It describes a part of chemical structure.
        SMILES strings are common representations of the chemical graph
        SMILES string for caffeine: Cn1cnc2n(C)c(=O)n(C)c(=O)c12
        [Figure: ball & stick model for caffeine]
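To make "computable representation" concrete, here is a minimal sketch that reads the atom types out of the caffeine SMILES string with a regular expression. It is a toy tokenizer, not a SMILES parser: it assumes only unbracketed organic-subset atoms appear, and it ignores bonds, ring closures, and hydrogens. The function name `heavy_atom_counts` is hypothetical.

```python
import re
from collections import Counter

# Matches only atoms of the SMILES "organic subset" written without brackets.
# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms; lowercase (aromatic) symbols are folded to uppercase."""
    return Counter(token.capitalize() for token in ATOM.findall(smiles))


caffeine = "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"
print(dict(heavy_atom_counts(caffeine)))  # {'C': 8, 'N': 4, 'O': 2}
```

The counts agree with the heavy atoms of caffeine's formula, C8H10N4O2. Real descriptor calculations need the full graph, which is why the services in this talk rely on a cheminformatics toolkit.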
    15. Chemical descriptors
        Chemical descriptors are data (quantities or values) that provide information about substances, molecular entities, and their parts (rings, atoms, bonds, etc.).
        Sometimes they enumerate material parts; sometimes they quantify or describe qualities, functions or dispositions
        Often used to build Quantitative Structure-Activity Relationship (QSAR) models
        Example descriptors:
          Mass values
          Partition coefficients
          Heats of formation
          Aromaticity values
          Molecular formulas
    16. The Chemical Information Ontology (CHEMINF)
        100 chemical descriptors
        50 chemical qualities
        Relates descriptors to their specifications, the software that generated them (along with the running parameters), and the algorithms that they implement
        Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
        http://semanticchemistry.googlecode.com
    17. CHEMINF provides the vocabulary to define an input (SMILES-annotated molecule) and an output (molecule annotated with a descriptor)
    18. Ultimately, the goal is to use an OWL reasoner to reason about the attributes to determine whether the compound is drug-like
    19. Semantic Automated Discovery and Integration
        http://sadiframework.org
        SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs
        Mark Wilkinson, UBC
        Michel Dumontier, Carleton University
        Christopher Baker, UNB
    20. SADI
        OWL classes in SADI are local to individual services
        They should uniquely specify the service input and outputs (they have exactly the right restrictions)
        One service’s world-view can conflict with another’s, but a client can use any or all
        Maximize interoperability by reusing types and relations
    21. Create code stubs using the ontology
        Publish the ontology to a web-accessible location
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl
        Make sure that the class names are resolvable (easy when using the hash notation)
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule
          http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule
        Download/checkout the code
          http://sadiframework.org
        Run the code generator
          Specify the URIs that correspond to input and output types
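Why hash notation makes class names easy to resolve: an HTTP client drops the fragment before it sends a request, so every class URI fetches the one published ontology file. A minimal sketch with the standard library, using two of the class URIs from this slide:

```python
from urllib.parse import urldefrag

ONTOLOGY = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl"
class_uris = [
    ONTOLOGY + "#smiles-molecule",
    ONTOLOGY + "#logp-molecule",
]

for uri in class_uris:
    # urldefrag splits a URI into the part that is requested and the fragment.
    document, fragment = urldefrag(uri)
    print(fragment, "->", document)
```

Both lines print the same document URL, so publishing that one file is enough to make every class in it resolvable.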
    22. Implement the functionality
        Java version
          Uses Jena to manipulate the RDF graph
          Uses Maven to build from command-line or Eclipse; invokes Jetty for service testing
        Chemistry
          We used the Chemistry Development Kit (CDK) to implement 4 services
    23. Working with the service (GET)
        Responds to a GET by providing the service description in RDF
        Conforms to Feta (BioMoby, myGrid)

        curl http://cbrass.biordf.net/logpdc/logpc

        <rdf:RDF
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
          <rdf:Description rdf:about="">
            <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
            <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
            <j.0:hasOperation rdf:resource="#operation"/>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
          </rdf:Description>
          <rdf:Description rdf:about="#input">
            <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
          </rdf:Description>
          <rdf:Description rdf:about="#operation">
            <j.0:outputParameter rdf:resource="#output"/>
            <j.0:inputParameter rdf:resource="#input"/>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
          </rdf:Description>
          <rdf:Description rdf:about="#output">
            <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
            <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
          </rdf:Description>
        </rdf:RDF>
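A client mostly needs two facts from this description: the input class and the output class. The sketch below pulls them out with the standard-library XML parser. It works on a trimmed copy of the description embedded as a string, so it runs without the service being reachable. Treating RDF/XML as plain XML is a shortcut that only holds for this particular serialization, and `parameter_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"
ONT = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"

# Trimmed copy of the service description: only the two parameter nodes are kept.
DESCRIPTION = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:j.0="{MYGRID}">
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="{ONT}smilesmolecule"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="{ONT}alogpsmilesmolecule"/>
  </rdf:Description>
</rdf:RDF>
"""


def parameter_types(rdf_xml: str) -> dict:
    """Map each described node (e.g. '#input') to the class URI in its objectType."""
    root = ET.fromstring(rdf_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MYGRID}}}objectType")
        if object_type is not None:
            types[node.get(f"{{{RDF}}}about")] = object_type.get(f"{{{RDF}}}resource")
    return types


for name, class_uri in parameter_types(DESCRIPTION).items():
    print(name, "->", class_uri)
```

A production client would load the description into an RDF library and query the graph, since the same triples can be serialized in many different XML shapes.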
    24. Working with the service (POST)
        Responds to a POST with service output (process an input file)

        Input file (caffeine.rdf):

        <rdf:RDF xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
            xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
            xmlns:owl="http://www.w3.org/2002/07/owl#"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:sio="http://semanticscience.org/resource/"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
          <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
            <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
          </so:smilesmolecule>
          <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
            <sio:SIO_000300 rdf:datatype="xsd:string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
          </sio:CHEMINF_000018>
        </rdf:RDF>

        curl --data @caffeine.rdf http://cbrass.biordf.net/logpdc/logpc

        Response (excerpt):

        <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
          <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
          <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
        </rdf:Description>
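The input file can also be generated rather than written by hand. The sketch below builds the same structure for any SMILES string with the standard library and defines, but does not call, a POST helper. Several details are assumptions rather than anything confirmed by the slides: `build_input` and `post` are hypothetical names, the datatype is written as the full XML Schema URI instead of the abbreviated `xsd:string`, and the `Content-Type` header is a guess at what the service accepts.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SO = "http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
SIO = "http://semanticscience.org/resource/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def build_input(molecule_uri: str, smiles: str) -> bytes:
    """Serialize a SMILES-annotated molecule in the shape of caffeine.rdf."""
    for prefix, uri in (("rdf", RDF), ("so", SO), ("sio", SIO)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    molecule = ET.SubElement(root, f"{{{SO}}}smilesmolecule",
                             {f"{{{RDF}}}about": molecule_uri})
    # The molecule points at a descriptor node that carries the SMILES value.
    ET.SubElement(molecule, f"{{{SIO}}}SIO_000008",
                  {f"{{{RDF}}}resource": molecule_uri + "smiles"})
    descriptor = ET.SubElement(root, f"{{{SIO}}}CHEMINF_000018",
                               {f"{{{RDF}}}about": molecule_uri + "smiles"})
    value = ET.SubElement(descriptor, f"{{{SIO}}}SIO_000300",
                          {f"{{{RDF}}}datatype": XSD_STRING})
    value.text = smiles
    return ET.tostring(root, encoding="utf-8")


def post(service_url: str, payload: bytes) -> bytes:
    """POST the payload and return the raw response (Content-Type is an assumption)."""
    request = urllib.request.Request(
        service_url, data=payload,
        headers={"Content-Type": "application/rdf+xml"})
    with urllib.request.urlopen(request) as response:
        return response.read()


payload = build_input("http://semanticscience.org/sadi/ontology/caffeine.rdf#m",
                      "Cn1cnc2n(C)c(=O)n(C)c(=O)c12")
print(payload.decode("utf-8"))
```

Calling `post("http://cbrass.biordf.net/logpdc/logpc", payload)` would mirror the curl command, provided the service is still online.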
    25. Publish and Register the service
        http://sadiframework.org/registry
    26. Now what?
    27. Semantic Health and Research Environment
        SHARE is an application that executes (SPARQL) queries as workflows over SADI services
    28. “Reckoning”: dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data into that ontology
    29. SPARQL is the new cool kid on the query block
        SQL
        SPARQL
    30. SHARE
        SPARQL engine
        Triple patterns are matched against service descriptions
        Knowledge base is dynamically populated
        Queries can contain OWL classes, which are expanded to the required triple patterns
        Query is optimized to minimize the number of service calls and the amount of data sent over the network
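The idea of matching triple patterns against service descriptions can be illustrated with a lookup table keyed by predicate. This is a toy sketch, not SHARE's algorithm: the registry contents, the `ex:` predicate names, and the `plan` function are all made up for illustration, and only the service URL comes from the talk.

```python
from typing import NamedTuple


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical registry: predicate -> URL of a service that can generate it.
REGISTRY = {
    "ex:hasLogP": "http://cbrass.biordf.net/logpdc/logpc",
}


def plan(patterns, registry):
    """Pair each pattern with a service, or None if no registered service produces its predicate."""
    return [(p, registry.get(p.predicate)) for p in patterns]


query = [
    TriplePattern("?molecule", "ex:hasSmiles", "?smiles"),
    TriplePattern("?molecule", "ex:hasLogP", "?logp"),
]
for pattern, service in plan(query, REGISTRY):
    print(pattern.predicate, "->", service or "answer from existing data")
```

Patterns without a matching service have to be answered from data already in the knowledge base; the others become service calls whose output is added to it before the query is evaluated.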
    31. ChEBI has data!
    32. Bio2RDF provides ChEBI in RDF
    33. Bio2RDF now serving over 40 billion triples of linked biological data
    34.
    35. An increasing amount of machine understandable chemical data
    36. Query for log p
    37. Query: Is caffeine a drug-like molecule?
    38. SADI
        Describe the input and output using OWL-DL classes
        Subject of input and output must be the same
        Web services indexed by predicates
        Biocatalogue will list SADI-compliant services
        Taverna plugin to work with SADI services
        Protégé 4.1 plugin to create SADI services
        Simplified migration path for existing web services (Java, Perl)
    39. Benefits
        Data remains distributed – no warehouse!
        Data is not “exposed” as a SPARQL endpoint
          Greater provider control over computational resources
        Yet data appears to be a SPARQL endpoint… no modification of SPARQL or reasoner required.
    40. Join Us!
        SADI and CardioSHARE are Open Source
        Come join us – we’re having a lot of fun!!
        http://sadiframework.org
    41. Acknowledgements
        Leonid Chepelev (implementing the services)
        Luke McCarthy (technical support)
        Mark Wilkinson (vision and leadership)
        CHEMINF Group
          Janna Hastings
          Nico Adams
          Egon Willighagen
        This research is supported by The Heart + Stroke Foundation of BC and Yukon, Microsoft Research, The Canadian Institutes of Health Research, The Natural Sciences and Engineering Research Council of Canada and CANARIE.
