The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world


Semantic web technologies have quickly penetrated all areas of traditional and new database systems and have become the de facto standard in information exchange and communication. The Royal Society of Chemistry has built a new chemistry data repository with the semantic web at its core. Every module of the repository contains a semantic web layer and can interact internally and externally using standard approaches and formats, including RDF, appropriate ontologies and SPARQL querying. In this presentation we will review the challenges of developing this new system on semantic web technologies and show how the approach we have taken offers distinct advantages over the original data model designed to produce the ChemSpider database. These advantages include extensibility, an ontological underpinning, federated integration and the adoption of modern standards rather than the constraints of a standard SQL model.

Published in: Science

  • sameAs != sameAs: equivalence depends on your point of view.
    Links relate individual data instances: source, target, predicate, reason.
    Links are grouped into Linksets, each with a VoID header providing provenance and justification for its links.
  • Change to add more databases; rearrange
  • ISO 11238
  • That’s the world we live in
  • What about science and chemistry in particular?
  • Using data
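The link model in the notes above (source, target, predicate, reason, grouped into linksets with a provenance header) can be sketched as a small data structure. This is an illustrative sketch only, not the Open PHACTS implementation: the class names, the header fields and the `ex:` identifiers are hypothetical, and the header is a simplified stand-in for a real VoID description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Link:
    """One link between two data instances."""
    source: str     # identifier of the first instance
    target: str     # identifier of the second instance
    predicate: str  # e.g. "skos:exactMatch" or "skos:closeMatch"
    reason: str     # why the two instances are treated alike


@dataclass
class Linkset:
    """A group of links sharing one header (hypothetical, VoID-like fields)."""
    title: str
    creator: str
    predicate: str       # link predicate shared by every link in the set
    justification: str   # reason shared by every link in the set
    links: List[Link] = field(default_factory=list)

    def add(self, source: str, target: str) -> Link:
        # Each link inherits predicate and justification from the header.
        link = Link(source, target, self.predicate, self.justification)
        self.links.append(link)
        return link


# Hypothetical example: a linkset relating salt forms to their parent drug.
salt_links = Linkset(
    title="Example salt-form linkset",
    creator="ex:curator",
    predicate="skos:closeMatch",
    justification="non-salt form",
)
example_link = salt_links.add("ex:drugbank/compound-1", "ex:chemspider/compound-1-salt")
```

Keeping the predicate and justification on the linkset rather than on each link means a consumer can accept or reject a whole linkset by reading its header.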

    1. 1. The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world. Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov. ACS 248th National Meeting, San Francisco, CA, August 11th 2014
    2. 2. Who is involved? 29 partners
    3. 3. Research questions
    4. 4. Research questions. Data sources: ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity. “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” “What is the selectivity profile of known p38 inhibitors?” “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
    5. 5. Discovery Platform. Open PHACTS Explorer: web-based searching interface. Open PHACTS API: applications can query the pharmacological data within Open PHACTS. Open PHACTS applications: external bespoke applications using the Open PHACTS API. Workflow tools: Pipeline Pilot, KNIME, R. Data: compound-protein interactions, physicochemical properties, gene information, biological pathways.
    6. 6. OpenPHACTS UI
    7. 7. ChemBioNavigator
    8. 8. OpenPHACTS API
    9. 9. KNIME
    10. 10. OpenPHACTS Architecture
    11. 11. Micro-article: Compounds, Reaction, Analytical Data, Text and References
    12. 12. Technical view - unification
    13. 13. Chemistry Validation and Standardization Platform
    14. 14. DrugBank dataset (6516 records). J. Brecher, IUPAC Graphical Representation of Stereochemical Configurations, Section ST-1.1.10. DB06287
    15. 15. PubChem, DrugBank, ChemSpider: Imatinib Mesylate. What Is Gleevec? Ambiguities
    16. 16. How is this a semantic web problem? Why can’t people just be clear? People may be working with faulty data. Salts, say, may make little difference to the effects of an active ingredient. People may assume a one-to-one mapping between a gene and the gene product (protein, ncRNA) that it codes for.
    17. 17. What’s in a lens? Identifier; Title (dct:title); Description (dct:description); Documentation link (dcat:landingPage); Creator (pav:createdBy); Timestamp (pav:createdOn); Equivalence rules (bdb:linksetJustification)
    18. 18. Equivalence rules. The BridgeDB vocabulary adds metadata that provides a justification for treating two URIs alike, thus allowing researchers to determine whether their circumstances fit. owl:sameAs ≤ skos:exactMatch ≤ skos:closeMatch ≤ rdfs:seeAlso. The ChEBI and CHEMINF ontologies provide a rich set of relations (many of which were developed for this project) to relate one molecule to another.
    19. 19. ChEBI: has part; is tautomer of. CHEMINF: has component with uncharged counterpart; has counterpart molecular entity; has normalized counterpart; has OPS normalized counterpart; has PubChem normalized counterpart; has uncharged counterpart; similar to; similar to by PubChem 2D similarity algorithm; similar to by PubChem 3D similarity algorithm; has same connectivity as; is isotopologue of; is stereoisomer of; subClassOf (standard relation in RDF); has isotopically unspecified parent; has stereoundefined parent
    20. 20. Link: skos:closeMatch, Reason: non-salt form. Link: skos:exactMatch, Reason: drug name
    21. 21. Strict to Relaxed, Analysing to Browsing: skos:exactMatch (InChI)
    22. 22. Strict to Relaxed, Analysing to Exploring: skos:exactMatch (InChI), skos:closeMatch (Drug Name)
    23. 23. What does the Open PHACTS Chemistry Registration System do? Takes in structures from ChEMBL, ChEBI, DrugBank, PDB, Thomson Reuters. Normalizes structures according to rules based on FDA guidelines. Generates counterpart molecules: without charge, fragments
    24. 24. Chemistry Validation and Standardization Platform
    25. 25. Input pipeline
    26. 26. Compounds domain
    27. 27. Navigation in chemical space
    28. 28. Navigation in chemical space
    29. 29. Reactions domain
    30. 30. Analytical data domain
    31. 31. Crystallography domain
    32. 32. Standards
    33. 33. Share in a “proper way”
    34. 34. APIs, endpoints and widgets
    35. 35. Dimensions and complexity of science
    36. 36. Handling complex content: What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
    37. 37. Machine learning
    38. 38. Thank you Email: Slides:
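
The lens idea from slides 17 to 22 can be illustrated with a short sketch: a lens decides which link justifications count as equivalence for a given task. The sketch assumes one reading of slides 21 and 22, namely that a strict lens accepts only skos:exactMatch links justified by InChI, and a relaxed lens additionally accepts skos:closeMatch links justified by drug name. The function name, the lens constants and the `ex:` identifiers are hypothetical and are not part of the Open PHACTS API.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class Link(NamedTuple):
    source: str
    target: str
    predicate: str
    reason: str


# A lens is modelled here as the set of (predicate, reason) pairs it accepts.
Lens = FrozenSet[Tuple[str, str]]

STRICT_LENS: Lens = frozenset({("skos:exactMatch", "InChI")})
RELAXED_LENS: Lens = STRICT_LENS | {("skos:closeMatch", "Drug Name")}


def equivalents(uri: str, links: Iterable[Link], lens: Lens) -> Set[str]:
    """Return identifiers linked to `uri` by links the lens accepts.

    Links are followed in both directions and only one hop is taken,
    so close matches are never chained together.
    """
    found: Set[str] = set()
    for link in links:
        if (link.predicate, link.reason) not in lens:
            continue
        if link.source == uri:
            found.add(link.target)
        elif link.target == uri:
            found.add(link.source)
    return found


# Hypothetical links for one compound record.
LINKS = [
    Link("ex:chemspider/1", "ex:chembl/1", "skos:exactMatch", "InChI"),
    Link("ex:chemspider/1", "ex:drugbank/1-salt", "skos:closeMatch", "Drug Name"),
]

strict_result = equivalents("ex:chemspider/1", LINKS, STRICT_LENS)
relaxed_result = equivalents("ex:chemspider/1", LINKS, RELAXED_LENS)
```

With these example links the strict lens yields only the InChI-matched record, while the relaxed lens also returns the record matched by drug name, which suits browsing better than analysis.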