Antimalarial drug discovery data disclosure
1. www.guidetopharmacology.org
Open and Closed Antimalarial Drug Discovery:
Comparing Data Connectivity Gaps
and Disclosure Speed
Dr Christopher Southan, Senior Database Curator, IUPHAR/BPS Guide to
PHARMACOLOGY (GtoPdb), University of Edinburgh
BioIT Boston 2016, Wed 6th April, Track 11, Open Source Innovations 16:30
http://www.slideshare.net/cdsouthan/antimalarial-drug-dscovery-data-disclosure
2. Abstract (will be skipped for presentation)
Antimalarial research is the poster child for Open Source Drug Discovery
(OSDD). However, many lead compounds still have their origins in
Traditional Closed Drug Discovery (TCDD) and uncertainty remains as to the
differences. To provide an assessment, this work examined 32 recent
antimalarial structures in terms of their PubChem connectivity. Of these, 21
had patent matches, only 23 linked to publications and only 21 had BioAssay
records. Major data connectivity problems included 1) leads not findable by
code name, 2) patents not cited in publications, 3) leads not reciprocally
linked to Plasmodium protein targets and pathways, and 4) name-to-structure
mappings only being declared years after patent disclosure. These issues will be
contrasted with the Sydney University Open Source Malaria approach, where
open lab books are used to surface structures (e.g. as Google-findable
InChIKeys) and crowdsourced collaboration data close to real time, thereby
shaving years off the discovery phase.
3. Outline
• Introduction to Open Source Drug Discovery (OSDD)
• Differences to Traditional Closed Drug Discovery (TCDD)
• Extracting antimalarial leads from the literature
• Profiling structures in PubChem
• A look into the MMV Pathogen Box
• Introducing Open Source Malaria (OSM)
• Profiling the OSM structure collection
• Speed sharing
• Google searching InChIKeys
• Conclusions
• Open structure sets
• References and questions please
4. Introduction
• The OSDD concept is not tied to any particular group
• While antimalarials have become a poster child for OSDD, many leads still
come through the TCDD route, so the boundaries between the two are blurred
• OSDD has become a test bed (e.g. open data sets from GSK and others,
the Medicines for Malaria Venture (MMV) “Malaria Box” and WIPO
Re:Search IP sharing)
• Sydney Open Source Malaria project (@O_S_M) adheres to OSDD
principles (see PMID 23985301)
• I have donated voluntary support to the OSM team since 2012 (i.e. in
addition to my Guide to PHARMACOLOGY Senior Database Curator job)
• This has focused on structure searching and data surfacing
• I blog on data connectivity in general, and for antimalarials in particular
• The surfacing speed for structures reflects “shades of openness” that will be
discussed
5. Open vs closed research routes to new medicines
TCDD
• Proprietary data
• Patent filings
• Leads may be blinded by code
numbers
• Papers after patents
• No direct submissions to public
databases
• Predominantly commercial
software and databases
• Typically ~10 years R&D
• Still the dominant model
OSDD
• Open ELNs
• No patent filings
• Data surfaced rapidly for sharing
• Open access papers
• Submissions to public databases
• Anyone can contribute
• Crowdsourcing
• Preference for open source
software and public databases
• Potential to shorten research
• Pure OSDD relatively rare
6. Recent review of leads, but:
• Link-free zone (except
for references)
• PDF “tomb” with
images for structures
• No chemical
specifications
• No database
identifiers
• No target protein
identifiers
• DDD107498 was
blinded at that time
(no structure)
• I mapped to PubChem
CIDs as a community
service
8. Getting name-to-structure out of primary papers: not trivial
• On a good day, MeSH curators will index the lead structures specified in
PubMed and connect them to PubChem
• On a bad day (as in this case), they may record the name but without a link to
a chemical structure
• The code name is still PubChem-negative after a year
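Whether a code name resolves in PubChem can be checked programmatically through the PUG REST name lookup. The following is a minimal sketch rather than the procedure used for these slides: the helper names are hypothetical, and treating an HTTP 404 as "name not known" is an assumption about the service's error behaviour.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_lookup_url(name):
    """Build the PUG REST URL that maps a compound name to PubChem CIDs."""
    quoted = urllib.parse.quote(name, safe="")
    return f"{PUG_REST}/compound/name/{quoted}/cids/JSON"


def parse_cids(payload):
    """Extract the CID list from a PUG REST JSON response body."""
    return json.loads(payload).get("IdentifierList", {}).get("CID", [])


def cids_for_name(name):
    """Return CIDs for a name, or [] if the name is not recognised (network call)."""
    try:
        with urllib.request.urlopen(name_lookup_url(name), timeout=30) as resp:
            return parse_cids(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: unknown names are reported as 404
            return []
        raise
```

An empty list from `cids_for_name("DDD107498")` would correspond to the "name recorded but no structure link" situation described above.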
9. Curatorial ferreting: DDD107498 structure and patent
IUPAC name from supplementary data > chemicalize.org > PubChem > SureChEMBL > SAR table
10. PubChem profile for 32 antimalarial lead structures
22 CIDs collated as Pathogen Box proposals plus 16 structures from the PMID
26000721 review (six in common, see blog posts below)
http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html
http://cdsouthan.blogspot.se/2015/05/entity-resolution-for-antimalarial.html
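The total of 32 structures follows from merging the two source lists and counting the overlap once (22 + 16 - 6 = 32). A small sketch of that arithmetic, using hypothetical placeholder identifiers instead of the real CIDs:

```python
# Placeholder identifiers only; the real CIDs are in the blog posts and
# MyNCBI collections, not reproduced here.
shared = {f"shared-{i}" for i in range(6)}
pathogen_box_proposals = {f"proposal-{i}" for i in range(16)} | shared  # 22 CIDs
review_structures = {f"review-{i}" for i in range(10)} | shared         # 16 structures


def merge_summary(a, b):
    """Report the sizes of the overlap and of the merged, de-duplicated set."""
    return {"common": len(a & b), "total": len(a | b)}


summary = merge_summary(pathogen_box_proposals, review_structures)
```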
11. Profile for 114 antimalarial actives from the Pathogen Box
http://cdsouthan.blogspot.se/2016/03/a-peek-into-mmv-pathogen-box.html
13. The entire portfolio is open, including new designs
https://docs.google.com/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/edit#gid=510297618
http://www.cheminfo.org/flavor/malaria/Display_data.html
411 molecular records in the March 2016 OSM master sheet (Mat Todd et al.)
and custom ELN (Luc Patiny et al.)
14. Rapid triage of the OSM portfolio in PubChem
250 identity matches from 410 InChIs uploaded
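The triage step amounts to splitting the uploaded InChIs into those with an exact PubChem match and those without. A minimal sketch of that split, assuming a caller-supplied `lookup` function (hypothetical; it stands in for whatever identity search is used and is not the tool used for the slide):

```python
def triage(inchis, lookup):
    """Split structures into identity matches and no-hit records.

    `lookup` is any callable that maps an InChI string to a list of CIDs
    (an empty list means no exact match).
    """
    matched = {}
    unmatched = []
    for inchi in inchis:
        cids = lookup(inchi)
        if cids:
            matched[inchi] = cids
        else:
            unmatched.append(inchi)
    return matched, unmatched


# Usage with a stub lookup; the strings are placeholders, not real structures.
known = {"InChI=1S/placeholder-A": [111], "InChI=1S/placeholder-B": [222]}
hits, misses = triage(
    ["InChI=1S/placeholder-A", "InChI=1S/placeholder-B", "InChI=1S/placeholder-C"],
    lambda inchi: known.get(inchi, []),
)
```

Keeping the no-hit list is as useful as the matches, since unmatched records are the ones not yet surfaced in PubChem.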
15. PubChem profile for 250 OSM matches
• Note that 160 of the 410 had no exact matches (these include, e.g., design proposals)
• Patents include matches for reference compounds (i.e. not antimalarial claims)
17. Googling the InChIKey for global findability
• Direct from Open
Lab Books
• Or from a
chemicalize
conversion
• Search in ~0.3 sec
• Works with inner
layer
• Can cross-check
PubChem < > ELN
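A sketch of how such a search could be assembled. It assumes that "inner layer" refers to the first 14-character block of a standard InChIKey (the connectivity hash); the function names are hypothetical, and the aspirin key is used purely as a well-known illustration, not as an OSM compound.

```python
import re
import urllib.parse

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def connectivity_block(inchikey):
    """Return the first 14-character block of a well-formed InChIKey."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return inchikey.split("-")[0]


def google_query_url(term):
    """Build a Google search URL for an exact-phrase (quoted) query."""
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": f'"{term}"'})


aspirin_key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
full_search = google_query_url(aspirin_key)
inner_search = google_query_url(connectivity_block(aspirin_key))
```

Searching on the first block alone is broader: it can also retrieve records that share the same connectivity but differ in the remaining layers.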
18. Getting structures into PubChem is not difficult
• As TW2Informatics I deposited MMV670437 in 2013 as a test case
• The bioactivity data was later submitted by OSM > ChEMBL > PubChem (but did
not include the code name)
• Both SIDs were merged into CID 71819647, thereby linking name > structure > activity
19. Extending connectivity to target and pathway mapping
http://www.wikipathways.org/index.php/WikiPathways
20. Conclusions
• Encouragingly, published output of antimalarial leads is increasing
• However, challenges of curating and mapping are similar to those
encountered by the GtoPdb team for human targets and ligands
• There is a grey zone between TCDD and OSDD and some leads are
patented
• Authors and stakeholders should ensure their SAR is surfaced and
name-to-structure connected in databases (i.e. FAIR principles, see
PMID 26978244)
• Gaps persist in mappings between leads, targets and pathways
• The practice of OSDD by OSM and collaborators accelerates research
• PubChem MyNCBI collections are useful for sharing structure sets
21. PubChem MyNCBI open structure sets
• 16 clinical candidates from PMID 26000721
• 22 leads from various sources
• 114 from the Pathogen Box
• 250 from the OSM PubChem matches
n.b. Those engaged in antimalarial research can contact me if they need
technical details and/or possible generation of new lists (e.g. CID subsets or
patent extractions)
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48460617/public/
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/49901772/public
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/49700347/public/
http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48358242/public/