The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world

Semantic web technologies have spread rapidly through both traditional and newer database systems and have become a widely adopted standard for information exchange and communication. The Royal Society of Chemistry has built a new chemistry data repository with the semantic web at its core. Every module of the repository has a semantic web layer and can interact, both internally and externally, through standard approaches and formats, including RDF, appropriate ontologies and SPARQL querying. In this presentation we review the challenges of building this system on semantic web technologies and explain how our approach offers distinct advantages over the original data model behind the ChemSpider database. These advantages include extensibility, an ontological underpinning, federated integration, and the adoption of modern standards in place of the constraints of a conventional SQL model.
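To make the RDF and SPARQL idea concrete, here is a minimal sketch in plain Python (no RDF library) that stores a few statements as subject-predicate-object triples and answers a basic graph pattern, which is the core of a SPARQL SELECT query. It is an illustration under stated assumptions, not how the RSC repository is implemented: the `ex:` names, the compound records, and the helper functions `match` and `query` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical example data; prefixed names such as "ex:compound1" stand in
# for full URIs and are not real ChemSpider or RSC identifiers.
TRIPLES: List[Triple] = [
    ("ex:compound1", "rdf:type", "ex:Compound"),
    ("ex:compound1", "dct:title", "imatinib"),
    ("ex:compound1", "ex:source", "ex:DrugBank"),
    ("ex:compound2", "rdf:type", "ex:Compound"),
    ("ex:compound2", "dct:title", "imatinib mesylate"),
    ("ex:compound2", "ex:source", "ex:ChEMBL"),
]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term differs
    return extended


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: every triple pattern must match."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match(pattern, t, b)) is not None
        ]
    return bindings


# Roughly: SELECT ?c ?name WHERE { ?c rdf:type ex:Compound .
#                                  ?c dct:title ?name . ?c ex:source ex:ChEMBL }
for row in query(
    [("?c", "rdf:type", "ex:Compound"),
     ("?c", "dct:title", "?name"),
     ("?c", "ex:source", "ex:ChEMBL")],
    TRIPLES,
):
    print(row["?c"], row["?name"])  # prints: ex:compound2 imatinib mesylate
```

Each triple pattern narrows the set of variable bindings, which is why adding the `ex:source` pattern leaves only the ChEMBL record. A production system would use a real triple store and SPARQL engine instead.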

  1. The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world. Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov. ACS 248th National Meeting, San Francisco, CA, August 11th, 2014
  2. Who is involved? 29 partners
  3. Research questions
  4. Research questions. Data sources: ChEMBL, DrugBank, Gene Ontology, Wikipathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity. Example questions: “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”; “What is the selectivity profile of known p38 inhibitors?”; “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
  5. Open PHACTS Explorer: web-based searching interface (explorer.openphacts.org). Discovery Platform. Open PHACTS API (dev.openphacts.org): applications can query the pharmacological data within Open PHACTS. Open PHACTS applications: external bespoke applications using the Open PHACTS API (chembionavigator.org, pharmatrek.org). Workflow tools: Pipeline Pilot, KNIME, R. Data covered: compound-protein interactions, physicochemical properties, gene information, biological pathways
  6. Open PHACTS UI: http://explorer.openphacts.org/
  7. ChemBioNavigator
  8. Open PHACTS API: https://dev.openphacts.org/
  9. KNIME
  10. Open PHACTS architecture
  11. Micro-article: compounds, reactions, analytical data, text and references
  12. Technical view: unification
  13. Chemistry Validation and Standardization Platform
  14. DrugBank dataset (6516 records). J. Brecher, IUPAC Graphical Representation of Stereochemical Configurations, Section ST-1.1.10. DB06287
  15. Imatinib mesylate in PubChem, DrugBank and ChemSpider. What is Gleevec? Ambiguities
  16. How is this a semantic web problem? Why can’t people just be clear? People may be working with faulty data. Salts, say, may make little difference to the effects of an active ingredient. People may assume a one-to-one mapping between a gene and the gene product (protein, ncRNA) that it codes for.
  17. What’s in a lens? Identifier; title (dct:title); description (dct:description); documentation link (dcat:landingPage); creator (pav:createdBy); timestamp (pav:createdOn); equivalence rules (bdb:linksetJustification)
  18. Equivalence rules. The BridgeDB vocabulary adds metadata that justifies treating two URIs alike, so researchers can decide whether that justification fits their circumstances. owl:sameAs ≤ skos:exactMatch ≤ skos:closeMatch ≤ rdfs:seeAlso. The ChEBI and CHEMINF ontologies provide a rich set of relations, many of which were developed for this project, for relating one molecule to another.
  19. ChEBI (http://www.ebi.ac.uk/chebi): has part; is tautomer of. CHEMINF (http://code.google.com/p/semanticchemistry/): has component with uncharged counterpart; has counterpart molecular entity; has normalized counterpart; has OPS normalized counterpart; has PubChem normalized counterpart; has uncharged counterpart; similar to; similar to by PubChem 2D similarity algorithm; similar to by PubChem 3D similarity algorithm; has same connectivity as; is isotopologue of; is stereoisomer of; subClassOf (standard relation in RDF); has isotopically unspecified parent; has stereoundefined parent
  20. Link: skos:closeMatch, reason: non-salt form. Link: skos:exactMatch, reason: drug name
  21. Strict / Relaxed; Analysing / Browsing. skos:exactMatch (InChI)
  22. Strict / Relaxed; Analysing / Exploring. skos:closeMatch (drug name); skos:closeMatch (drug name); skos:exactMatch (InChI)
  23. What does the Open PHACTS Chemistry Registration System do? It takes in structures from ChEMBL, ChEBI, DrugBank, PDB and Thomson Reuters; normalizes them according to rules based on FDA guidelines; and generates counterpart molecules, such as uncharged forms and fragments.
  24. Chemistry Validation and Standardization Platform
  25. Input pipeline
  26. Compounds domain
  27. Navigation in chemical space
  28. Navigation in chemical space
  29. Reactions domain
  30. Analytical data domain
  31. Crystallography domain
  32. Standards
  33. Share in a “proper way”
  34. APIs, endpoints and widgets
  35. Dimensions and complexity of science
  36. Handling complex content: What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
  37. Machine learning
  38. Thank you. Email: tkachenkov@rsc.org; slides: http://www.slideshare.net/valerytkachenko16
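The lens idea in slides 17 to 22 can be sketched as a filter over a linkset: each link carries a predicate and a justification, and a lens decides which predicates it accepts. This is a hypothetical illustration, not the Open PHACTS or BridgeDb implementation. The `Link` and `Lens` types, the `ex:` identifiers, and the rule that a lens accepts every predicate at or above a chosen strength are assumptions; only the predicate ordering (owl:sameAs, skos:exactMatch, skos:closeMatch, rdfs:seeAlso) and the InChI, non-salt-form and drug-name justifications come from the slides.

```python
from dataclasses import dataclass
from typing import List, Set

# Link predicates from strictest to loosest, following the ordering on slide 18.
STRENGTH = ["owl:sameAs", "skos:exactMatch", "skos:closeMatch", "rdfs:seeAlso"]


@dataclass(frozen=True)
class Link:
    source: str         # hypothetical identifier in one dataset
    target: str         # hypothetical identifier in another dataset
    predicate: str      # one of STRENGTH
    justification: str  # e.g. "InChI" or "drug name" (cf. bdb:linksetJustification)


@dataclass(frozen=True)
class Lens:
    title: str    # cf. dct:title in the lens metadata on slide 17
    loosest: str  # assumed rule: loosest predicate this lens still accepts

    def accepts(self, link: Link) -> bool:
        return STRENGTH.index(link.predicate) <= STRENGTH.index(self.loosest)


def equivalents(uri: str, links: List[Link], lens: Lens) -> Set[str]:
    """Directly linked identifiers that the lens treats as equivalent to `uri`.

    Links are not followed transitively, because skos:closeMatch is not transitive.
    """
    found: Set[str] = set()
    for link in links:
        if not lens.accepts(link):
            continue
        if link.source == uri:
            found.add(link.target)
        elif link.target == uri:
            found.add(link.source)
    return found


LINKS = [
    Link("ex:chembl/imatinib", "ex:drugbank/imatinib", "skos:exactMatch", "InChI"),
    Link("ex:chembl/imatinib", "ex:drugbank/imatinib-mesylate", "skos:closeMatch", "non-salt form"),
    Link("ex:chembl/imatinib", "ex:chebi/gleevec", "skos:closeMatch", "drug name"),
]
STRICT = Lens("Strict", "skos:exactMatch")
RELAXED = Lens("Relaxed", "skos:closeMatch")

print(sorted(equivalents("ex:chembl/imatinib", LINKS, STRICT)))
# prints: ['ex:drugbank/imatinib']
print(sorted(equivalents("ex:chembl/imatinib", LINKS, RELAXED)))
# prints: ['ex:chebi/gleevec', 'ex:drugbank/imatinib', 'ex:drugbank/imatinib-mesylate']
```

The same linkset serves both lenses; only the acceptance rule changes, which mirrors the strict (InChI-only) versus relaxed (drug name, salt form) views on slides 21 and 22.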

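Slide 23 says the registration system generates counterpart molecules such as fragments. As a rough, hypothetical illustration only (real standardization relies on a cheminformatics toolkit and the FDA-based rules mentioned on the slide, not on string handling), the sketch below splits a SMILES string into its dot-separated components and drops those found in a small, assumed counterion list, leaving the parent for the imatinib mesylate example from slide 15. The counterion list, the helper names `fragments` and `strip_counterions`, and the imatinib SMILES supplied here are assumptions for the example.

```python
from typing import List, Tuple

# Assumed, deliberately tiny counterion list for illustration; a real system
# would use a curated list and canonicalize SMILES before comparing.
COUNTERIONS = {
    "CS(=O)(=O)O",  # methanesulfonic acid (mesylate salts)
    "Cl",           # hydrogen chloride (hydrochloride salts)
    "[Na+]",
    "[Cl-]",
}


def fragments(smiles: str) -> List[str]:
    """Split on '.', which separates disconnected components in SMILES.

    Ignores the rare case where a ring-closure digit spans a '.', which
    actually encodes a bond between the two parts.
    """
    return [part for part in smiles.split(".") if part]


def strip_counterions(smiles: str) -> Tuple[str, List[str]]:
    """Return (parent SMILES, removed components) using exact string matches."""
    parts = fragments(smiles)
    kept = [p for p in parts if p not in COUNTERIONS]
    removed = [p for p in parts if p in COUNTERIONS]
    if not kept:  # everything looked like a counterion: leave the input unchanged
        return smiles, []
    return ".".join(kept), removed


IMATINIB = "Cc1ccc(NC(=O)c2ccc(CN3CCN(C)CC3)cc2)cc1Nc1nccc(-c2cccnc2)n1"
parent, removed = strip_counterions(IMATINIB + ".CS(=O)(=O)O")
print(parent == IMATINIB, removed)  # prints: True ['CS(=O)(=O)O']
```

Keeping the removed components alongside the parent makes it possible to record a "non-salt form" relationship like the skos:closeMatch link on slide 20 rather than silently discarding the salt.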