1. Discussion
Christopher Southan, IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb), Centre for Integrative Physiology, The
University of Edinburgh, EH8 9XD, UK. Slides: http://www.slideshare.net/cdsouthan/connecting-antimalarial-data
As outlined in the introduction to the CINF Symposium, among Jean-Claude
Bradley’s achievements, his work on Open Notebook Science (ONS)
(https://en.wikipedia.org/wiki/Open_notebook_science) has perhaps had
the largest impact, and its ripple effect continues to broaden. This is
particularly the case in Open Source Drug Discovery, OSDD (used here as a
generic term not specific to any group) where ONS forms a core enablement
for the movement (PMID:23985301). This is a radical departure from what
we can call Traditional Closed Drug Discovery (TCDD). While boundaries
between these camps are blurred, the use of ONS is a clear differentiator in
the philosophy of real-time data surfacing (typically via an Electronic
Laboratory Notebook, ELN). This means that teams can intersect with, share
and optimise any chemical space since they are no longer competitively
compelled to IP-protect lead structures. The domain of small-molecule
malaria treatments has become a poster child for OSDD and also spawned
the “Box” concept of physically distributable active compound sets.
Opening up and connecting antimalarial data:
Progress with caveats
An ACS SciMix contribution from the CINF session: The Growing Impact of
Openness in Chemistry: A Symposium in Honour of JC Bradley
Jean-Claude Bradley’s pioneering of ONS has the potential to shorten lead discovery and optimisation by years. Consequently it will bring more new
medicines to more patients faster. This is not restricted to NTDs but is likely to be adopted by rare disease consortia. Notwithstanding, as a proportion of the
current antimalarial chemical estate, the ONS contribution is small. Notably, the majority of lead SAR is still instantiated in patents and papers from the TCDD
modus operandi. This is why curating leads for the PB remained a typically arduous exercise (of the kind we are used to at GtoPdb). It is also important
to note that impediments to findability and connectivity of molecular relationships in the “system” (including target and pathway mapping) remain serious
concerns for malaria and other OSDD domains. In the context of drug discovery, ONS, like any other approach, has its caveats. The main one is that real-time
data (hot off the instruments or just out of the fume hood) tends to be unstructured, with confirmations pending. In this situation of “positive collaborative
anarchy” across different global teams, ONS data can be difficult to find, trace to its provenance, verify, curate, standardise and mine. Of course, a similar constellation of
informatics challenges also arises from TCDD but, on a good day, open SAR (e.g. in PubChem directly, or curated from the literature via ChEMBL and/or GtoPdb)
may surface in a minable form, even if some years after the fact. Notwithstanding, the major acceleration that ONS facilitates will ensure its expansion,
including into new drug discovery areas that are commercial gaps with unmet clinical needs, as a fitting legacy of Jean-Claude Bradley’s innovation.
As context for this invited presentation, while my day-job is working for the Edinburgh GtoPdb team, I have donated a small amount of voluntary support to the
Sydney OSM team since 2012 (https://www.thinkable.org/submission/2136 at 1.54 on the video). This has focused mainly on chemical structure searching,
data organisation and surfacing strategies. In addition, I blog occasionally on the themes of data connectivity in general and for antimalarial leads in particular.
For the record, MMV have thanked me for contributing the 28 structures. By various criteria they will not all go into the PB, but I hope to find out which are included.
Figure 1. Comparing the Gene Ontology function splits between all human proteins (left, 20,198) and the GtoPdb targets with small-molecule
quantitative interactions (right, 978)
Following on from the award-winning success of the Medicines for
Malaria Venture (MMV) Malaria Box of 400 compounds
(http://www.mmv.org/malariabox), a Pathogen Box (PB) is in preparation
for a range of Neglected Tropical Diseases (NTDs) in addition to malaria
(http://pathogenbox.org/). Since I had already highlighted the vicissitudes
of establishing the explicit molecular identities of published malaria leads
in several blog posts, I extended these to 28 structures for possible
inclusion in the PB (http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html),
the first page of which is shown below.
The challenges of curating leads for the PB were similar to those
encountered by the GtoPdb team for human targets and their ligands on a
daily basis (PMID:24234439). They were in fact somewhat worse, as
reflected in the statistics of the 22 PubChem CIDs linked below:
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48358242/public/
Quirks encountered are detailed in the blog post but included:
• The 6 structures not in PubChem are de facto unfindable in open databases,
but some may get Google InChIKey matches via the chemicalize.org cache
• The only systematic identifier encountered was the IUPAC name, which
often had to be dug out of the supplementary data, as in the blog page on
the left (i.e. neither SMILES nor InChI appeared in papers or patents)
• No authors made direct database submissions
• The code name was often not a PubChem synonym
• ChEMBL had picked up 16, with data passed on to PubChem BioAssay
• 13 had patent-extraction matches and 11 chemical vendor matches
• The MeSH annotation had only linked two directly to PMIDs
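The counts in the list above can be put on a common footing as percentages of the 28 curated structures. The following is a minimal sketch only: it assumes every count is out of the full 28 (the poster does not state the denominator for each bullet), and the labels and the `coverage` helper are illustrative names rather than anything from the original analysis.

```python
# Counts quoted in the list above; the shared denominator of 28 is an assumption.
TOTAL_STRUCTURES = 28

counts = {
    "in PubChem (CIDs)": 22,
    "not in PubChem": 6,
    "picked up by ChEMBL": 16,
    "patent-extraction matches": 13,
    "chemical vendor matches": 11,
    "MeSH-linked to PMIDs": 2,
}


def coverage(counts: dict, total: int) -> dict:
    """Return each count as a percentage of `total`, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = coverage(counts, TOTAL_STRUCTURES)
for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{TOTAL_STRUCTURES} ({pct}%)")
```

On these assumptions, roughly four in five structures had a PubChem CID, while fewer than one in ten had a direct MeSH link to a PMID.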
Out of the documents and into the Box
Introduction
RESULTS (3)
Finding structures and linking data from the Sydney University OSM team
and their collaborators (http://opensourcemalaria.org/) is much easier than
for the PB 28. This is primarily because of their adoption of ONS, Google
Docs, other surfacing routes and direct submissions to ChEMBL. This is
illustrated for MMV670437 (a 44 nM OSM lead among the 28) by simply
Googling the inner InChIKey layer (PMID:23399051). Matches (including the one
below left), returned in 0.35 sec, include PubChem, OSM in GitHub and my
blog. The PubChem SID (below right) with the MMV code is my submission.
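As a rough illustration of the search described above, the sketch below pulls the first block out of an InChIKey and wraps it in quotes as a web search term. It assumes that the "inner layer" means the first 14-character (connectivity) block of the key. The function names are hypothetical, and the aspirin InChIKey is used purely as a stand-in because the key for MMV670437 is not given in this text.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a well-formed InChIKey."""
    match = INCHIKEY_PATTERN.match(inchikey.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return match.group(1)


def search_query(inchikey: str) -> str:
    """Quote the first block so a search engine treats it as an exact string."""
    return f'"{connectivity_block(inchikey)}"'


# Stand-in example (aspirin), not a compound from the poster.
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(search_query(ASPIRIN_KEY))
```

Searching on the first block alone also retrieves records that share the same connectivity but differ in stereochemistry or other later-layer details, which suits the kind of cross-source matching described here.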
Further ONS utility is exemplified by the surfacing of 250 project structures
in the first link below. The second link maps 167 of these into PubChem:
https://docs.google.com/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/edit#gid=510297618
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48338932/public/
Connecting up with Open Source Malaria (OSM)