2010 CASCON - Towards an integrated network of data and services for the life sciences

Modern biological knowledge discovery requires access to machine-understandable data that can be searched, retrieved, and subsequently analyzed using a wide array of analytical software and services. The Semantic Automated Discovery and Integration (SADI) framework is a set of conventions for formalizing web service inputs and outputs using OWL ontologies, which enables the automatic discovery and invocation of Semantic Web services. In this talk, I will walk through a worked example of the design and deployment of chemical Semantic Web services using the Chemistry Development Kit (CDK), chemical descriptors from the Chemical Information Ontology (CHEMINF), and the Semanticscience Integrated Ontology (SIO) as a unifying, upper-level ontology of basic types and relations. I will discuss how one can use the SADI-enabled SHARE client to reason about data obtained from Bio2RDF, the largest linked open data project, and automatically invoke chemical Semantic Web services to determine a chemical's drug-likeness. If you want to see the potential of the Semantic Web being realized, this talk is for you.

  1. Towards an integrated network of data and services for the life sciences
     Michel Dumontier, Ph.D.
     Associate Professor of Bioinformatics, Carleton University
     Department of Biology; School of Computer Science; Institute of Biochemistry; Ottawa Institute of Systems Biology; Ottawa-Carleton Institute of Biomedical Engineering
  2. Finding the right information to answer a question is hard and sometimes requires a sophisticated workflow.
  3. What if we could answer a question by automatically building a knowledge base using both data and services?
  4. The Semantic Web is a web of knowledge. It is about standards for publishing, sharing and querying knowledge drawn from diverse sources. It enables the answering of sophisticated questions.
  5. Is caffeine a drug-like molecule?
  6. Is caffeine a drug-like molecule? To answer this question we need to:
     • know what ‘drug-like molecule’ really means
     • know caffeine’s molecular structure
     • use the structural information to compute the attributes
     • determine whether caffeine satisfies the requirements of being ‘drug-like’
  7. Lipinski Rule of Five
     • Rule of thumb for drug-likeness (orally active in humans): 4 rules with multiples of 5
       – mass of less than 500 Daltons
       – fewer than 5 hydrogen bond donors
       – fewer than 10 hydrogen bond acceptors
       – a partition coefficient value between -5 and 5
     We need a more formal (machine-understandable) description of a ‘drug-like molecule’ which specifies values for chemical descriptors.
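Before moving to OWL, the four rules on this slide can be written down as plain predicates. The following is a minimal Python sketch, not the formalization used in the talk: the `Descriptors` class and its field names are hypothetical, the comparisons are assumed to be strict as the slide's wording suggests, and the caffeine values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Chemical descriptor values for one molecule (hypothetical field names)."""
    mass_da: float          # molecular mass in Daltons
    h_bond_donors: int      # hydrogen bond donor count
    h_bond_acceptors: int   # hydrogen bond acceptor count
    log_p: float            # partition coefficient


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the slide's rules that the molecule fails."""
    rules = {
        "mass < 500 Da": d.mass_da < 500,
        "H-bond donors < 5": d.h_bond_donors < 5,
        "H-bond acceptors < 10": d.h_bond_acceptors < 10,
        "-5 < log P < 5": -5 < d.log_p < 5,  # strict bounds are an assumption
    }
    return [name for name, satisfied in rules.items() if not satisfied]


def is_drug_like(d: Descriptors) -> bool:
    """A molecule is treated as drug-like when it violates none of the rules."""
    return not lipinski_violations(d)


# Illustrative values for caffeine; the acceptor count depends on the counting scheme.
caffeine = Descriptors(mass_da=194.19, h_bond_donors=0, h_bond_acceptors=6, log_p=-0.43)
print(is_drug_like(caffeine), lipinski_violations(caffeine))
```

A hard-coded check like this cannot be discovered or reasoned over by other software, which is the gap the ontology-based description is meant to close.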
  8. Ontology as a strategy to formally represent knowledge
  9. The Web Ontology Language (OWL) has explicit semantics and can therefore be used to capture knowledge in a machine-understandable way.
  10. The Chemical Information Ontology (CHEMINF)
      • 100+ chemical descriptors
      • 50+ chemical qualities
      • Relates descriptors to their specifications and to the software that generated them (along with the running parameters and the algorithms that they implement)
      • Contributors: Nico Adams, Leonid Chepelev, Michel Dumontier, Janna Hastings, Egon Willighagen, Peter Murray-Rust, Christoph Steinbeck
      http://semanticchemistry.googlecode.com
  11. Molecular structure can be represented using a SMILES string, which is a common representation of the chemical graph.
      [Figure: ball-and-stick model for caffeine]
      SMILES string for caffeine: Cn1cnc2n(C)c(=O)n(C)c(=O)c12
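To show that the SMILES string carries the chemical graph in a form software can read, here is a small Python sketch that tallies the heavy (non-hydrogen) atoms in the caffeine string. It is deliberately limited: it only recognizes the unbracketed organic-subset symbols and ignores bonds, rings and implicit hydrogens, so it is no substitute for a toolkit such as the CDK. The function name is hypothetical.

```python
import re
from collections import Counter

# Organic-subset atom symbols. Two-letter symbols come first so that "Cl"
# is not read as carbon; lowercase letters denote aromatic atoms.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms per element in a simple SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled by this sketch")
    # capitalize() folds aromatic symbols ("c", "n") into their element symbols.
    return Counter(symbol.capitalize() for symbol in ATOM.findall(smiles))


counts = heavy_atom_counts("Cn1cnc2n(C)c(=O)n(C)c(=O)c12")
print(dict(counts))
```

The tally agrees with the heavy atoms in caffeine's molecular formula, C8H10N4O2.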
  12. Lipinski Rule of Five
      • Empirically derived ruleset for drug-likeness (4 rules with multiples of 5)
        – mass of less than 500 Daltons
        – fewer than 5 hydrogen bond donors
        – fewer than 10 hydrogen bond acceptors
        – a partition coefficient value between -5 and 5
      • A formal description using OWL: [Figure: OWL class definition]
  13. What we then need are services that will consume SMILES strings and annotate the molecule with the required chemical descriptors. Then we can reason about whether it satisfies the drug-likeness definition.
  14. Semantic Automated Discovery and Integration (SADI)
      http://sadiframework.org
      Mark Wilkinson, UBC; Michel Dumontier, Carleton University; Christopher Baker, UNB
      SADI is a framework to create Semantic Web services using OWL classes as service inputs and outputs.
  15. SADI
      • OWL classes in SADI are local to individual services
        – They should uniquely specify the service inputs and outputs (they have exactly the right restrictions)
        – One service’s world-view can conflict with another’s, but a client can use any or all
      • Maximize interoperability by reusing types and relations
  16. Semanticscience Integrated Ontology (SIO)
      • OWL2 ontology
      • 800 classes covering basic types (physical, processual, informational) with an emphasis on biological entities
      • 129 basic relations (mereological, participatory, attribute/quality, spatial, temporal and representational)
      • Axioms can be used by reasoners to generate inferences for consistency checking, classification and answering questions about life science knowledge
      • Embodies emerging ontology design patterns
      • Dereferenceable URIs
      • Searchable in the NCBO BioPortal
      http://semanticscience.org/ontology/sio.owl
      CASCON: Nov 3, 2010
  17. Create code stubs using the ontology
      • Publish the ontology to a web-accessible location
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl
      • Make sure that the class names are resolvable (easy when using the hash notation)
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smiles-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#logp-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hbdc-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#hdba-molecule
        http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#lipinksi-druglike-molecule
      • Download/check out the code
        http://sadiframework.org
      • Run the code generator
        – specify the URIs that correspond to input and output types
  18. Implement the functionality
      • Java version
        – Uses Jena to manipulate the RDF graph
        – Uses Maven to build from the command line or Eclipse; invokes Jetty for service testing
      • Chemistry
        – We used the Chemistry Development Kit (CDK) to implement 4 services
  19. The service responds to a GET operation by providing the service description in RDF, which conforms to Feta (BioMoby, myGrid).

      curl http://cbrass.biordf.net/logpdc/logpc

      <rdf:RDF
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
        <rdf:Description rdf:about="">
          <j.0:hasServiceDescriptionText>no description</j.0:hasServiceDescriptionText>
          <j.0:hasServiceNameText rdf:datatype="http://www.w3.org/2001/XMLSchema#string">logpc</j.0:hasServiceNameText>
          <j.0:hasOperation rdf:resource="#operation"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#serviceDescription"/>
        </rdf:Description>
        <rdf:Description rdf:about="#input">
          <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
        </rdf:Description>
        <rdf:Description rdf:about="#operation">
          <j.0:outputParameter rdf:resource="#output"/>
          <j.0:inputParameter rdf:resource="#input"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#operation"/>
        </rdf:Description>
        <rdf:Description rdf:about="#output">
          <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
          <rdf:type rdf:resource="http://www.mygrid.org.uk/mygrid-moby-service#parameter"/>
        </rdf:Description>
      </rdf:RDF>
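A client uses this description to learn which OWL classes the service consumes and produces. The Python sketch below pulls those two class URIs out of an abridged copy of the description using only the standard library. It assumes the flat `rdf:Description` layout shown on the slide and is not a general RDF parser; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"

# Abridged copy of the service description from the slide (parameters only).
SERVICE_DESCRIPTION = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.mygrid.org.uk/mygrid-moby-service#">
  <rdf:Description rdf:about="#input">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#smilesmolecule"/>
  </rdf:Description>
  <rdf:Description rdf:about="#output">
    <j.0:objectType rdf:resource="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#alogpsmilesmolecule"/>
  </rdf:Description>
</rdf:RDF>"""


def parameter_types(rdf_xml: str) -> dict[str, str]:
    """Map each parameter node ('#input', '#output') to its OWL class URI."""
    root = ET.fromstring(rdf_xml)
    types = {}
    for node in root.findall(f"{{{RDF}}}Description"):
        object_type = node.find(f"{{{MYGRID}}}objectType")
        if object_type is not None:  # skip nodes that declare no objectType
            types[node.get(f"{{{RDF}}}about")] = object_type.get(f"{{{RDF}}}resource")
    return types


for parameter, owl_class in parameter_types(SERVICE_DESCRIPTION).items():
    print(parameter, "->", owl_class)
```

A production client would load the description into an RDF library instead, since the same graph can be serialized in many different XML shapes.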
  20. The service responds to a POST containing the service input with the service output in RDF.

      Service input (caffeine.rdf):

      <rdf:RDF
          xmlns="http://semanticscience.org/sadi/ontology/caffeine.rdf#"
          xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
          xmlns:owl="http://www.w3.org/2002/07/owl#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:sio="http://semanticscience.org/resource/"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
        <so:smilesmolecule rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#m">
          <sio:SIO_000008 rdf:resource="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles"/>
        </so:smilesmolecule>
        <sio:CHEMINF_000018 rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#msmiles">
          <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cn1cnc2n(C)c(=O)n(C)c(=O)c12</sio:SIO_000300>
        </sio:CHEMINF_000018>
      </rdf:RDF>

      Invocation:

      curl --data @caffeine.rdf http://cbrass.biordf.net/logpdc/logpc

      Service output (excerpt):

      <rdf:Description rdf:about="http://semanticscience.org/sadi/ontology/caffeine.rdf#mdalogp">
        <rdf:type rdf:resource="http://semanticscience.org/resource/CHEMINF_000251"/>
        <j.0:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#double">-0.4311000000000006</j.0:SIO_000300>
      </rdf:Description>
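The same invocation can be prepared programmatically. The Python sketch below fills a template of the input document for an arbitrary SMILES string and builds the POST request with the standard library. The template mirrors the slide's input; the `application/rdf+xml` content type and the `build_request` helper are assumptions, and the request is only constructed, not sent, because the 2010 endpoint may no longer be live.

```python
import urllib.request
from xml.sax.saxutils import escape

# Endpoint taken from the slide; it may no longer be reachable.
SERVICE_URL = "http://cbrass.biordf.net/logpdc/logpc"

INPUT_TEMPLATE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:so="http://semanticscience.org/sadi/ontology/lipinskiserviceontology.owl#"
    xmlns:sio="http://semanticscience.org/resource/">
  <so:smilesmolecule rdf:about="{base}#m">
    <sio:SIO_000008 rdf:resource="{base}#msmiles"/>
  </so:smilesmolecule>
  <sio:CHEMINF_000018 rdf:about="{base}#msmiles">
    <sio:SIO_000300 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">{smiles}</sio:SIO_000300>
  </sio:CHEMINF_000018>
</rdf:RDF>"""


def build_request(base_uri: str, smiles: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one molecule as RDF/XML."""
    body = INPUT_TEMPLATE.format(
        base=escape(base_uri, {'"': "&quot;"}),  # safe inside attribute values
        smiles=escape(smiles),
    )
    return urllib.request.Request(
        SERVICE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},  # assumed media type
        method="POST",
    )


request = build_request(
    "http://semanticscience.org/sadi/ontology/caffeine.rdf",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
)
# urllib.request.urlopen(request) would perform the call; omitted to stay offline.
print(request.get_method(), request.full_url, len(request.data), "bytes")
```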
  21. Now what?
  22. Semantic Health and Research Environment (SHARE)
      SHARE is an application that executes SPARQL queries as workflows over SADI services.
  23. “Reckoning”: dynamic discovery of instances of OWL classes through synthesis and invocation of a Web Service workflow capable of generating data described by the OWL class restrictions, followed by reasoning to classify the data into that ontology.
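The idea of synthesizing a workflow from service descriptions can be illustrated with a toy forward-chaining loop in Python. This is only a sketch of the matchmaking principle, not how SHARE is implemented: the `Service` class, the registry and the attribute names are hypothetical, plain attribute names stand in for OWL class restrictions, and the services return canned caffeine values instead of computing anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    """Toy service description: attributes it needs and the one it attaches."""
    name: str
    needs: frozenset
    provides: str
    invoke: Callable[[dict], object]


# Hypothetical registry; each lambda is a stand-in returning a canned value.
REGISTRY = [
    Service("logp", frozenset({"smiles"}), "log_p", lambda record: -0.4311),
    Service("mass", frozenset({"smiles"}), "mass_da", lambda record: 194.19),
    Service("structure-lookup", frozenset({"name"}), "smiles",
            lambda record: "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"),
]


def reckon(record: dict, wanted: set) -> tuple[dict, list[str]]:
    """Invoke services until every wanted attribute is present.

    Returns the enriched record and the workflow (service names in call order).
    """
    data, workflow = dict(record), []
    while not wanted <= set(data):
        # A service is runnable when its inputs exist and its output is still missing.
        runnable = [s for s in REGISTRY
                    if s.provides not in data and s.needs <= set(data)]
        if not runnable:
            raise LookupError(f"no service can supply {sorted(wanted - set(data))}")
        service = runnable[0]
        data[service.provides] = service.invoke(data)
        workflow.append(service.name)
    return data, workflow


data, workflow = reckon({"name": "caffeine"}, {"log_p", "mass_da"})
print(workflow)
```

Starting from only a name, the loop first finds the service that yields a structure and then the services that consume it. This greedy version may call services that are not strictly needed; a real client would plan against the target class definition.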
  24. ChEBI has (non-SW) data!
  25. Bio2RDF provides ChEBI in RDF
  26. Bio2RDF is now serving over 40 billion triples of linked biological data
  27. Bio2RDF covers the major biological databases
  28. Bio2RDF is part of a growing web of linked data
      “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  29. Something you can look up or search for with rich descriptions
  30. SPARQL is the new cool kid on the query block
      SQL, SPARQL
  31. Query for log P
  32. (no text)
  33. Query: Is caffeine a drug-like molecule?
  34. Benefits
      • Data remains distributed, as the internet was meant to be!
      • Data is not “exposed” as a SPARQL endpoint
        – greater provider control over computational resources
      • Service invocation is straightforward, and matchmaking is done by reasoning about ontology-based input/output descriptions
  35. Summary
      • Semantic Web technologies offer tantalizing new opportunities to publish, share and query data and services
      • Bio2RDF provides linked life science data
      • SADI provides a framework for building Semantic Web services
      • SHARE allows us to simultaneously query and reason about data and services represented using RDF/OWL
  36. Acknowledgements
      This research is supported by The Heart + Stroke Foundation of BC and Yukon, Microsoft Research, The Canadian Institutes of Health Research, The Natural Sciences and Engineering Research Council of Canada and CANARIE.
      • Marc-Alexandre Nolin & Francois Belleau (Bio2RDF)
      • Leo Chepelev (implementing the services)
      • Luke McCarthy (SADI technical support)
      • Mark Wilkinson (vision and leadership)
      • Chris Baker (lipidomics)
      CHEMINF Group: Leo Chepelev, Janna Hastings, Egon Willighagen, Nico Adams
  37. dumontierlab.com
      michel_dumontier@carleton.ca
