Semantic web technologies have quickly penetrated all areas of traditional and new database systems and have become the de facto standard in information exchange and communication. The Royal Society of Chemistry has built a new chemistry data repository with the semantic web at the core of the system. Every module of the data repository contains a semantic web layer and can interact internally and externally using standard approaches and formats, including RDF, appropriate ontologies and SPARQL querying. In this presentation we will review the challenges of developing this new system on semantic web technologies and explain how our approach offers distinct advantages over the original data model designed to produce the ChemSpider database. These advantages include extensibility, an ontological underpinning, federated integration, and the adoption of modern standards in place of the constraints of a standard SQL model.
1. The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov
ACS 248th National Meeting, San Francisco, CA
August 11, 2014
4. Research questions
Data sources: ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity
• “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
• “What is the selectivity profile of known p38 inhibitors?”
• “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
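The first research question above can be sketched as a join across two kinds of source data. This is a minimal illustration under stated assumptions, not the Open PHACTS implementation: the pathway membership, compound IDs, assay types and potencies are invented toy data, and the function name `potent_functional_inhibitors` is hypothetical. It reads the question as "every activity against a pathway target comes from a functional assay, and the best potency is below 1 μM (1000 nM)".

```python
from dataclasses import dataclass

# Toy (invented) pathway membership, standing in for a pathway source.
PATHWAY_TARGETS = {
    "NFkB": {"IKBKB", "RELA", "NFKB1"},
    "p38 MAPK": {"MAPK14"},
}


@dataclass(frozen=True)
class Activity:
    compound: str
    target: str
    assay_type: str    # "functional" or "binding"
    potency_nm: float  # potency in nanomolar


# Toy (invented) activity records, standing in for a bioactivity source.
ACTIVITIES = [
    Activity("CPD-1", "IKBKB", "functional", 250.0),
    Activity("CPD-1", "RELA", "functional", 800.0),
    Activity("CPD-2", "IKBKB", "binding", 50.0),
    Activity("CPD-2", "IKBKB", "functional", 400.0),
    Activity("CPD-3", "NFKB1", "functional", 2500.0),
    Activity("CPD-4", "MAPK14", "functional", 10.0),
]


def potent_functional_inhibitors(pathway, max_potency_nm=1000.0):
    """Hypothetical helper: compounds tested against targets in `pathway`
    only in functional assays, with best potency below `max_potency_nm`."""
    targets = PATHWAY_TARGETS[pathway]
    # Join step: keep only activities whose target belongs to the pathway.
    by_compound = {}
    for act in ACTIVITIES:
        if act.target in targets:
            by_compound.setdefault(act.compound, []).append(act)
    hits = []
    for compound, acts in by_compound.items():
        only_functional = all(a.assay_type == "functional" for a in acts)
        best_potency = min(a.potency_nm for a in acts)
        if only_functional and best_potency < max_potency_nm:
            hits.append(compound)
    return sorted(hits)


print(potent_functional_inhibitors("NFkB"))  # ['CPD-1']
```

CPD-2 is excluded because it was also tested in a binding assay, and CPD-3 misses the potency cut-off. In a federated setting each dictionary would be a query against a separate source, and the join key (here a gene symbol) is exactly where identifier mapping between sources matters.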
5. Open PHACTS Explorer
Web-based search interface: explorer.openphacts.org
Discovery Platform
Open PHACTS API: dev.openphacts.org
Applications can query the pharmacological data within Open PHACTS:
• Compound-protein interactions
• Physicochemical properties
• Gene information
• Biological pathways
Open PHACTS applications: external bespoke applications using the Open PHACTS API (chembionavigator.org, pharmatrek.org)
Workflow tools: Pipeline Pilot, KNIME, R
17. How is this a semantic web problem? Why can’t people just be clear?
People may be working with faulty data.
A salt form, for example, may make little difference to the effects of an active ingredient, so people may reasonably treat a salt and its parent compound as interchangeable.
People may assume a one-to-one mapping between a gene and the gene product (protein, ncRNA) that it codes for.
18. What’s in a lens?
Identifier
Title (dct:title)
Description (dct:description)
Documentation link (dcat:landingPage)
Creator (pav:createdBy)
Timestamp (pav:createdOn)
Equivalence rules (bdb:linksetJustification)
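The lens fields above map naturally onto RDF properties. The sketch below serializes a lens description as Turtle using the prefixes on the slide. It is a minimal sketch, not the Open PHACTS lens format: the `Lens` class and `literal` helper are hypothetical, all `example.org` IRIs are invented, and the namespace IRI used for `bdb:` is an assumption.

```python
from dataclasses import dataclass, field

# Namespace IRIs for the prefixes on the slide. The bdb: IRI is an assumption.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "pav": "http://purl.org/pav/",
    "bdb": "http://vocabularies.bridgedb.org/ops#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


@dataclass
class Lens:
    identifier: str    # IRI naming the lens
    title: str         # dct:title
    description: str   # dct:description
    landing_page: str  # dcat:landingPage (documentation link)
    created_by: str    # pav:createdBy (IRI of the creator)
    created_on: str    # pav:createdOn, as an xsd:dateTime lexical value
    justifications: list = field(default_factory=list)  # bdb:linksetJustification IRIs

    def to_turtle(self):
        header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
        pairs = [
            ("dct:title", literal(self.title)),
            ("dct:description", literal(self.description)),
            ("dcat:landingPage", f"<{self.landing_page}>"),
            ("pav:createdBy", f"<{self.created_by}>"),
            ("pav:createdOn", literal(self.created_on) + "^^xsd:dateTime"),
        ]
        if self.justifications:
            # Several accepted justifications become a comma-separated object list.
            objects = ", ".join(f"<{j}>" for j in self.justifications)
            pairs.append(("bdb:linksetJustification", objects))
        body = " ;\n".join(f"    {p} {o}" for p, o in pairs)
        return "\n".join(header) + f"\n\n<{self.identifier}>\n{body} .\n"


lens = Lens(
    identifier="http://example.org/lens/strict",
    title="Strict",
    description="Only structure-identical links",
    landing_page="http://example.org/docs/lenses",
    created_by="http://example.org/people/curator",
    created_on="2014-08-11T00:00:00Z",
    justifications=["http://example.org/justification/identical-structure"],
)
print(lens.to_turtle())
```

Keeping the equivalence rules as IRIs, rather than free text, lets software compare a lens against the justification recorded on each linkset.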
19. Equivalence rules
The BridgeDb vocabulary adds metadata that provides a justification for treating two URIs alike, allowing researchers to decide whether their circumstances fit.
owl:sameAs ≤ skos:exactMatch ≤ skos:closeMatch ≤ rdfs:seeAlso (each relation is at least as strict as the one to its right)
The ChEBI and CHEMINF ontologies provide a rich set of relations (many of which were developed for this project) for relating one molecule to another.
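One way to use the ordering above is as a strictness scale: a researcher who accepts a looser relation also accepts every stricter one. The sketch below assumes that reading. The link triples use invented `ex:` identifiers, and `accepts` and `filter_links` are hypothetical helper names.

```python
# Ordered strictest to loosest, as on the slide:
# owl:sameAs <= skos:exactMatch <= skos:closeMatch <= rdfs:seeAlso
STRICTNESS = ["owl:sameAs", "skos:exactMatch", "skos:closeMatch", "rdfs:seeAlso"]
RANK = {predicate: i for i, predicate in enumerate(STRICTNESS)}


def accepts(predicate, loosest_allowed):
    """True if `predicate` is at least as strict as `loosest_allowed`."""
    return RANK[predicate] <= RANK[loosest_allowed]


def filter_links(links, loosest_allowed):
    """Keep the (subject, predicate, object) triples a researcher would accept."""
    return [t for t in links if accepts(t[1], loosest_allowed)]


# Invented example links.
LINKS = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:a", "skos:closeMatch", "ex:c"),
    ("ex:a", "rdfs:seeAlso", "ex:d"),
]

print(filter_links(LINKS, "skos:exactMatch"))
# [('ex:a', 'owl:sameAs', 'ex:b')]
```

Raising the threshold to `skos:closeMatch` would admit the first two links, and `rdfs:seeAlso` would admit all three.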
20. ChEBI (http://www.ebi.ac.uk/chebi)
has part
is tautomer of
CHEMINF (http://code.google.com/p/semanticchemistry/)
has component with uncharged counterpart
has counterpart molecular entity
has normalized counterpart
has OPS normalized counterpart
has PubChem normalized counterpart
has uncharged counterpart
similar to
similar to by PubChem 2D similarity algorithm
similar to by PubChem 3D similarity algorithm
has same connectivity as
is isotopologue of
is stereoisomer of
subClassOf (standard relation in RDF)
has isotopically unspecified parent
has stereoundefined parent
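The relations above let each user choose which kinds of "sameness" to honour. A minimal sketch of that idea, assuming molecule records linked by relation names from the slide: a union-find structure groups identifiers that are connected only through the relations a user accepts. The `mol:` identifiers are invented, and `DisjointSet` and `equivalence_classes` are hypothetical names; this is not the Open PHACTS code.

```python
class DisjointSet:
    """Union-find over arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Invented molecule records linked by relation names taken from the slide.
LINKS = [
    ("mol:drug-HCl", "has normalized counterpart", "mol:drug"),
    ("mol:drug", "is tautomer of", "mol:drug-taut"),
    ("mol:drug", "is stereoisomer of", "mol:drug-R"),
    ("mol:drug-d3", "has isotopically unspecified parent", "mol:drug"),
]


def equivalence_classes(links, accepted_relations):
    """Group identifiers connected only through accepted relations."""
    ds = DisjointSet()
    for subject, relation, obj in links:
        ds.find(subject)  # register both ends, even if the link is rejected
        ds.find(obj)
        if relation in accepted_relations:
            ds.union(subject, obj)
    groups = {}
    for identifier in list(ds.parent):
        groups.setdefault(ds.find(identifier), []).append(identifier)
    return sorted(sorted(group) for group in groups.values())


print(equivalence_classes(LINKS, {"has normalized counterpart"}))
# [['mol:drug', 'mol:drug-HCl'], ['mol:drug-R'], ['mol:drug-d3'], ['mol:drug-taut']]
```

Accepting all four relations instead collapses the five records into one group, which is the same data seen from a looser point of view.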
24. What does the Open PHACTS Chemistry Registration System do?
Takes in structures from ChEMBL, ChEBI, DrugBank, PDB and Thomson Reuters.
Normalizes structures according to rules based on FDA guidelines.
Generates counterpart molecules: uncharged forms and fragments.
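The fragment step can be illustrated at the SMILES level, where a "." separates components that are not bonded to each other. This sketch covers only that step; it does not apply the FDA-based normalization rules or generate uncharged forms. The helpers `split_fragments` and `parent_fragment` are hypothetical, and choosing the parent by longest SMILES string is a crude stand-in for a real heavy-atom comparison.

```python
def split_fragments(smiles):
    """Split a SMILES string into its disconnected components.

    Assumes no ring-closure digit is shared across a '.', which SMILES
    allows but which is rare in registry data.
    """
    return [fragment for fragment in smiles.split(".") if fragment]


def parent_fragment(smiles):
    """Pick a parent component using a crude proxy: the longest SMILES string.

    A real normalizer would compare heavy-atom counts or apply curated rules.
    """
    return max(split_fragments(smiles), key=len)


sodium_acetate = "CC(=O)[O-].[Na+]"
print(split_fragments(sodium_acetate))  # ['CC(=O)[O-]', '[Na+]']
print(parent_fragment(sodium_acetate))  # CC(=O)[O-]
```

The chosen parent here is still the charged acetate ion; producing its uncharged counterpart needs real chemistry. For actual structures a cheminformatics toolkit is the safer choice: RDKit's `rdMolStandardize` module, for example, provides `LargestFragmentChooser` and `Uncharger` classes for these two steps.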
39. Handling complex content
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
sameAs != sameAs: whether two records count as the same depends on your point of view.
Links relate individual data instances: source, target, predicate, reason.
Links are grouped into Linksets, each with a VoID header that provides provenance and a justification for its links.
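A minimal data-structure sketch of the two statements above, assuming that a lens selects whole linksets by the justification in their header. The class and field names are hypothetical (the header fields are only loosely modelled on the VoID terms `void:subjectsTarget`, `void:objectsTarget` and `void:linkPredicate`), and all `ex:` identifiers are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    source: str     # subject URI
    target: str     # object URI
    predicate: str  # e.g. "skos:exactMatch"
    reason: str     # why this particular pair was linked


@dataclass
class Linkset:
    # Header fields loosely modelled on VoID, plus provenance and justification.
    subjects_target: str
    objects_target: str
    link_predicate: str
    justification: str
    created_by: str
    links: list = field(default_factory=list)


def links_for_lens(linksets, accepted_justifications):
    """Return the links of every linkset whose header justification is accepted."""
    return [
        link
        for linkset in linksets
        if linkset.justification in accepted_justifications
        for link in linkset.links
    ]


exact = Linkset("ex:datasetA", "ex:datasetB", "skos:exactMatch",
                "ex:sameNormalizedStructure", "ex:curator",
                [Link("ex:A1", "ex:B1", "skos:exactMatch", "identical normalized structure")])
related = Linkset("ex:datasetA", "ex:datasetC", "skos:closeMatch",
                  "ex:sameConnectivity", "ex:curator",
                  [Link("ex:A1", "ex:C7", "skos:closeMatch", "same connectivity, different stereo")])

print(len(links_for_lens([exact, related], {"ex:sameNormalizedStructure"})))  # 1
```

Putting the justification in the linkset header means a lens can accept or reject thousands of links in one decision, while each link still carries its own reason for inspection.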