A talk given at SERMACS on 7 Nov 2015 in Memphis describing CDD Vault, CDD Vision and CDD Models. It also describes how the software is used in large- and smaller-scale collaborations for drug discovery.
CDD: Vault, CDD: Vision and CDD: Models for Drug Discovery Collaborations
1. CDD: VAULT, CDD: VISION AND CDD: MODELS FOR DRUG DISCOVERY COLLABORATIONS
SEAN EKINS1,2, ANNA COULON-SPEKTOR1, KELLAN GREGORY1, CHARLIE WEATHERALL1, KRISHNA DOLE1, ANDREW MCNUTT1, PETER NYBERG1, TOM GILLIGAN1, XIAO BA1, BARBARA HOLTZ1, SYLVIA ERNST1, FRANK COLE1, MARC NAVRE1, ALEX M. CLARK3 AND BARRY A. BUNIN1
1 COLLABORATIVE DRUG DISCOVERY, 1633 BAYSHORE HIGHWAY, SUITE 342, BURLINGAME, CA 94010, USA
2 COLLABORATIONS IN CHEMISTRY, 5616 HILLTOP NEEDMORE ROAD, FUQUAY-VARINA, NC 27526, USA
3 MOLECULAR MATERIALS INFORMATICS, INC., 1900 ST. JACQUES #302, MONTREAL H3J 2S1, QUEBEC, CANADA
3. CDD: Over a decade of drug discovery collaborations
• SaaS
• Easy to use
• Used by academia, industry, biotech
• Private
• Selective collaboration
• 100s of published datasets
4. • Enterprise Capabilities: Web Interface, Management Tools, Integration, Customizable
• Drug Discovery Data Mining: Search, Visualization, Presentation
• Chemical Intelligence: Chemical Drawing, Registration, Property Calculators, Structure Search, SAR Tools
• Collaborative Environment: Controlled Access, Data Privacy, Security, Community
• Free Public Data Access: Screening Data, Compound Data
CDD Vault Features
5. • Online Zendesk
• CDD Models
• CDD Vision
• Integration of CDD Public, ChemSpider, ZINC, and PubChem.
Benefits of CDD Vault
6. Budget Sensitive Startup or Academic Scientists
Won’t lose data
Get better results
Easy to trial, set up, configure, be trained and GO!
ex-Big Pharma Scientist Familiar with Registration/SAR Software
Nimble
Save $$$ with modern cloud solution
Relax – data migration is a snap!
Big Collaborations funded by Pharma, NIH, Foundations (PPP)
Control exactly which data you share with others
Relax – security is built in
Foster interactions between biologists and chemists
Passed Big Pharma & NIH FISMA audits (CDD does not own IP)
The CDD Vault “Value Proposition”
7. • About 7 million to 8 million people are estimated to be infected worldwide
• Vector-borne transmission occurs in the Americas.
• A triatomine bug carries the parasite Trypanosoma cruzi, which causes the disease.
• The disease is curable if treatment is initiated soon after infection.
Hotez et al., PLoS Negl Trop Dis. 2013 Oct 31;7(10):e2300
Chagas Disease
8. • Used public Chagas HTS data from the Broad Institute
• Created and validated machine learning models
• Used the models to screen multiple datasets of drugs and natural products
• Selected compounds for testing
• Testing in vitro
• Testing in vivo
Chagas Disease – Machine Learning
9. Comparing Diversity Screening vs Machine Learning
Each stage lists Historical Data first, then the CDD-UCSD Project:
• Screening: 100,000 cpds (diversity lib) vs 99 cpds
• Hit selection: 2,000 cpds hits (~2%) vs 17 cpds hits (17%)
• Hit confirmation (dose-response): 1,000 cpds hit-conf. (~50%) vs 14 hit-conf. (82%)
• Suitable for in vivo: 20 cpds tested in vivo (~2%) vs 5 cpds tested in vivo (35%)
• Confirmed in vivo efficacy: 1 cpd with >80% efficacy (<5%) vs 2 cpds with >80% efficacy (40%)
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
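The stage-to-stage percentages in this comparison can be recomputed from the compound counts. The sketch below is only a worked check of that arithmetic: the counts are copied from the slide, while the stage labels and function names are illustrative choices. The slide's percentages are rounded, so the computed values agree only approximately.

```python
# Compound counts copied from slide 9; stage names are shorthand labels.
STAGES = ["screened", "hits", "hit-confirmed", "tested in vivo", ">80% efficacy"]
FUNNELS = {
    "Historical data": [100_000, 2_000, 1_000, 20, 1],
    "CDD-UCSD project": [99, 17, 14, 5, 2],
}


def stage_rates(counts):
    """Fraction of compounds that pass from each stage to the next."""
    return [after / before for before, after in zip(counts, counts[1:])]


def overall_yield(counts):
    """Fraction of screened compounds that end with >80% in vivo efficacy."""
    return counts[-1] / counts[0]


for name, counts in FUNNELS.items():
    print(name)
    for stage, rate in zip(STAGES[1:], stage_rates(counts)):
        print(f"  {stage}: {rate:.1%}")
    print(f"  overall: {overall_yield(counts):.4%}")
```

Comparing `overall_yield` for the two funnels gives a single figure for how many screened compounds were needed per efficacious compound in each approach.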
10. 7,569 cpds => 99 cpds => 17 hits (5 in nM range)
In vivo efficacy of the 5 tested compounds
[Figure: timeline from 0 to 7 marking infection, treatment and reading; groups shown: Pyronaridine, Furazolidone, Verapamil, Nitrofural, Tetrandrine, Benznidazole, Vehicle]
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
11. Sharing Chagas in vitro and in vivo data in CDD Vault
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
• CDD and UCSD used Vault to securely share data
• In vitro and in vivo data captured
• Screening and dose response data
• Work provided starting point for a phase II and phase I grant (submitted)
12. TB Project overview
Phase I STTR – Proof of concept of mimic strategy
Phase II STTR – Expand mimic strategy and validation of phase I hits
13. streptomycin (1943)
para-aminosalicylic acid (1949)
isoniazid (1952)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)
Globally ~$500M in R&D/yr
Multidrug resistance in 4.3% of cases
Extensively drug resistant: increasing incidence
One new drug (bedaquiline) in 40 yrs
TB key points
14. Tested >350,000 molecules Tested ~2M 2M >300,000
>1500 active and non toxic Published 177 100s 800
Bigger Open Data: Screening for New Tuberculosis Treatments
How many will become a new drug?
TBDA screened over 1 million, 1 million more to go
TB Alliance + Japanese pharma screens
R43 LM011152-01
15. Over 8 years analyzed in vitro data and built models
• Mtb screening molecule database/s
• High-throughput phenotypic Mtb screening
• Descriptors + Bioactivity (+Cytotoxicity)
• Bayesian Machine Learning classification Mtb Model
• Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
• Top scoring molecules assayed for Mtb growth inhibition
• New bioactivity data may enhance models
• Identify in vitro hits and test models
3 x published prospective tests: ~750 molecules were tested in vitro, 198 actives were identified, >20% hit rate
Multiple retrospective tests: 3-10 fold enrichment
Ekins et al., Pharm Res 31: 414-435, 2014
Ekins et al., Tuberculosis 94: 162-169, 2014
Ekins et al., PLoS ONE 8: e63240, 2013
Ekins et al., Chem Biol 20: 370-378, 2013
Ekins et al., JCIM 53: 3054-3063, 2013
Ekins and Freundlich, Pharm Res 28: 1859-1869, 2011
Ekins et al., Mol BioSyst 6: 840-851, 2010
Ekins et al., Mol BioSyst 6: 2316-2324, 2010
R43 LM011152-01
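The workflow above scores molecules with a Bayesian classification model built on fingerprint descriptors. The following sketch shows one common formulation, a Laplacian-corrected naive Bayesian score over sparse integer features. It is an assumption that this matches the models used in the talk. The function names and the toy feature sets are hypothetical, and cytotoxicity is not modelled here.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Return a per-feature log weight from feature sets and active flags.

    fingerprints: list of sets of integer feature IDs (one set per molecule)
    labels: list of bools, True for active
    """
    total = len(labels)
    if total == 0:
        raise ValueError("training set is empty")
    base_rate = sum(labels) / total  # overall fraction of actives

    seen = Counter()         # molecules containing each feature
    seen_active = Counter()  # active molecules containing each feature
    for features, is_active in zip(fingerprints, labels):
        for feature in features:
            seen[feature] += 1
            if is_active:
                seen_active[feature] += 1

    # Laplacian-corrected estimate: rarely seen features are pulled toward 0.
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(model, features):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(model.get(f, 0.0) for f in features)


# Toy training data (hypothetical feature IDs, not real fingerprints).
train_fps = [{1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 7}, {3, 6}]
train_labels = [True, True, False, False, False]
model = train_laplacian_bayes(train_fps, train_labels)

print(score(model, {1, 2}))  # features seen mostly in actives
print(score(model, {5, 6}))  # features seen only in inactives
```

Higher scores indicate feature sets that resemble the training actives. Ranking a molecule database by this score and assaying the top of the list corresponds to the "virtually scored" and "top scoring molecules assayed" steps above.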
17. Examples of CDD Vault used for STTR
Computationally searched >80,000 molecules and used Bayesian models as a filter; narrowed to 842 hits; tested 23 compounds in vitro (3 picked as inactives), leading to 2 proposed as mimics of D-fructose 1,6-bisphosphate
Sarker et al., Pharm Res 2012, 29: 2115-2127
1R41AI088893-01
18. 5 active compounds vs Mtb in a few months
7 tested, 5 active (70% hit rate)
Ekins et al., Chem Biol 20, 370–378, 2013
1. Virtually screen 13,533-member GSK antimalarial hit library
2. Bayesian Model = SRI TAACF-CB2 dose response + cytotoxicity model
3. Top 46 commercially available compounds visually inspected
4. 7 compounds chosen for Mtb testing based on
- drug-likeness
- chemotype diversity
GSK # | Bayesian Score | Mtb H37Rv MIC (ug/mL) | GSK Reported % Inhibition HepG2 @ 10 mM cmpd
TCMDC-123868 | 5.73 | >32 | 40
TCMDC-125802 | 5.63 | 0.0625 | 5
TCMDC-124192 | 5.27 | 2.0 | 4
TCMDC-124334 | 5.20 | 2.0 | 4
TCMDC-123856 | 5.09 | 1.0 | 83
TCMDC-123640 | 4.66 | >32 | 10
TCMDC-124922 | 4.55 | 1.0 | 9
[Chemical Structure column: images not reproduced]
R43 LM011152-01
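The "7 tested, 5 active" figure can be checked against the MIC values listed for this slide. In the sketch below the rows are copied from the slide. The rule that a compound counts as active when its MIC was measured rather than reported as ">32" is an assumption made for illustration, not a cutoff stated in the talk.

```python
# (GSK #, Bayesian score, Mtb H37Rv MIC as printed, % inhibition HepG2)
ROWS = [
    ("TCMDC-123868", 5.73, ">32", 40),
    ("TCMDC-125802", 5.63, "0.0625", 5),
    ("TCMDC-124192", 5.27, "2.0", 4),
    ("TCMDC-124334", 5.20, "2.0", 4),
    ("TCMDC-123856", 5.09, "1.0", 83),
    ("TCMDC-123640", 4.66, ">32", 10),
    ("TCMDC-124922", 4.55, "1.0", 9),
]


def is_active(mic_text):
    """Assumed rule: a value reported as '>32' means no measurable activity."""
    return not mic_text.startswith(">")


actives = [row[0] for row in ROWS if is_active(row[2])]
hit_rate = len(actives) / len(ROWS)

# Most potent compound among the actives (lowest MIC).
most_potent = min((r for r in ROWS if is_active(r[2])), key=lambda r: float(r[2]))

print(f"{len(actives)} of {len(ROWS)} active ({hit_rate:.0%})")
print("most potent:", most_potent[0])
```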
19. • BAS00521003/TCMDC-125802 reported to be a P. falciparum lactate dehydrogenase inhibitor
• Only one report of antitubercular activity, from 1969
- solid agar MIC = 1 mg/mL (“wild strain”)
- “no activity” in mouse model up to 400 mg/kg
- however, activity was judged solely by extension of survival!
Bruhin, H. et al., J. Pharm. Pharmac. 1969, 21, 423-433.
• MIC of 0.0625 ug/mL
• 64X MIC affords 6 logs of kill
• Resistance and/or drug instability beyond 14 d
• Vero cells: CC50 = 4.0 ug/mL
• Selectivity Index SI = CC50/MICMtb = 16 – 64
• In mouse, no toxicity but also no efficacy in GKO model – probably metabolized.
Ekins et al., Chem Biol 20, 370–378, 2013
R43 LM011152-01
Taking a compound in vivo identifies issues
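The selectivity index on this slide is the ratio of the cytotoxic concentration to the antibacterial MIC. A minimal worked example follows, assuming the Vero CC50 of 4.0 and the MIC of 0.0625 are expressed in the same units (ug/mL); the function name is illustrative.

```python
def selectivity_index(cc50, mic):
    """SI = CC50 / MIC, with both values in the same concentration units."""
    if mic <= 0:
        raise ValueError("MIC must be positive")
    return cc50 / mic


CC50_VERO = 4.0   # from the slide, assumed ug/mL
MIC_MTB = 0.0625  # from the slide, ug/mL

si = selectivity_index(CC50_VERO, MIC_MTB)
print(si)  # upper end of the 16 – 64 range quoted on the slide
```

With these inputs the ratio is 64, the top of the quoted range. The lower bound of 16 would correspond to an MIC of 0.25 in the same units, which is an inference from the formula and is not stated on the slide.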
20. Optimizing the triazine series as part of this project: improve solubility and show in vivo efficacy
1U19AI109713-01
22. MM4TB
• Provide CDD Vault
• Vault Support
• Cheminformatics support to project
• Example using CDD Vault to share docking data for Topo I project
• Dock compounds in homology model of Mtb Topo I then import data into CDD
23. Complete inhibition of Topo I at 100 nM
MIC 60 – 250 uM
MM4TB – Topo I
Godbole et al., Antimicrob Agents Chemother 59:1549-57, 2015.
24. MM4TB – Topo I
• Mtb Topo I docking identified new inhibitors – collaboration with Nagaraja group in India - Amsacrine
Godbole et al., Biochem Biophys Res Comm 446:916-20, 2014.
26. CDD VISION
Data taken from CDD Vault and utilized in CDD Vision
Backend built with immutable and Crossfilter.js; binding layer uses d3.js and jQuery; rendering uses d3.js and Pixi.js
48. MODELS RESIDE IN PAPERS
NOT ACCESSIBLE… THIS IS UNDESIRABLE
How do we share them?
How do we use them?
49. Open Extended Connectivity Fingerprints
ECFP_6 FCFP_6
• Collected, deduplicated, hashed
• Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK
– stable: fingerprints don't change with each new toolkit release
– well defined: easy to document precise steps
– easy to port: already migrated to iOS (Objective-C) for TB Mobile app
• Provides core basis feature for CDD open source model service
Clark et al., J Cheminform 6:38 2014
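The "collected, deduplicated, hashed" and "sparse integers" bullets can be illustrated with a deliberately simplified circular fingerprint. This is a toy sketch, not the open source CDK implementation described above. It uses plain element symbols as starting atom identifiers, ignores bond orders, removes duplicates only by hash value, and the choice of CRC32 is an assumption. The `_6` suffix in ECFP_6 denotes a diameter of 6, i.e. three rounds of neighbour expansion.

```python
import zlib


def circular_fingerprint(atom_labels, bonds, diameter=6):
    """Toy circular fingerprint returned as a sorted list of 32-bit integers.

    atom_labels: list of strings, one per atom (element symbols here)
    bonds: list of (i, j) atom index pairs
    """
    neighbours = {i: [] for i in range(len(atom_labels))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Iteration 0: hash each atom's own label.
    ids = [zlib.crc32(label.encode()) for label in atom_labels]
    collected = set(ids)  # a set deduplicates identical hashes

    # Each round widens every atom environment by one bond.
    for _ in range(diameter // 2):
        new_ids = []
        for i in range(len(ids)):
            # Sort neighbour IDs so the result does not depend on atom order.
            env = [ids[i]] + sorted(ids[n] for n in neighbours[i])
            new_ids.append(zlib.crc32(",".join(map(str, env)).encode()))
        ids = new_ids
        collected.update(ids)

    return sorted(collected)


# Ethanol heavy atoms: C-C-O
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(ethanol)
```

Because every step is a fixed, documented operation, the same molecule always yields the same integers, which is the stability property the slide emphasizes.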
50. Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6 fingerprints; (b) modified Bayesian estimators for active and inactive compounds; (c) structures of selected binders.
For each listed target with at least two binders, it is first assumed that all of the molecules in the collection that do not indicate this as one of their targets are inactive.
In the app we used ECFP_6 fingerprints
Building Bayesian models for each target in TB Mobile
Clark et al., J Cheminform 6:38 2014
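The labelling rule described for the per-target models can be sketched as follows: a target gets a model only if it has at least two binders, and every molecule not annotated with that target is treated as inactive. The data structure, function name, and the molecule and target names other than InhA are hypothetical.

```python
def build_target_training_sets(molecule_targets, min_binders=2):
    """Map each eligible target to {molecule_id: is_active} labels.

    molecule_targets: dict of molecule ID -> set of annotated target names
    """
    # Count how many molecules list each target.
    binder_counts = {}
    for targets in molecule_targets.values():
        for target in targets:
            binder_counts[target] = binder_counts.get(target, 0) + 1

    training = {}
    for target, count in binder_counts.items():
        if count < min_binders:
            continue  # too few binders to build a model
        # Molecules that do not list this target are assumed inactive.
        training[target] = {
            mol: (target in targets)
            for mol, targets in molecule_targets.items()
        }
    return training


# Hypothetical annotations for illustration only.
collection = {
    "mol-1": {"InhA"},
    "mol-2": {"InhA", "TargetB"},
    "mol-3": {"TargetC"},
    "mol-4": set(),
}
training_sets = build_target_training_sets(collection)
print(training_sets)
```

Each resulting label set can then be paired with the molecules' fingerprints to train one Bayesian model per target. The "assumed inactive" labels are a working assumption, since absence of an annotation is not evidence of inactivity.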
51. TB Mobile
Ekins et al., J Cheminform 5:13, 2013
Clark et al., J Cheminform 6:38 2014
Predict targets
Cluster molecules
http://goo.gl/vPOKS
http://goo.gl/iDJFR
52. Single point data > 300K molecules
Uses Bayesian algorithm and FCFP_6 fingerprints
Clark et al., J Cheminform 6:38 2014
59. Summary
• Accessible software
• Used widely in academia and industry
• Leader in collaboration and security
• Grown steadily through sales and grants
• Dedicated sales in Europe, Asia
• Coming soon: ELN
• CDD provides integrated software for drug discovery
60. Jair Lage de Siqueira-Neto
Joel Freundlich
Peter Madrid
Robert Reynolds
Carolyn Talcott
Malabika Sarker
EU FP7 funding MM4TB
NIH NIAID
NIH NLM
NIH NCATS
Bill and Melinda Gates Foundation (Grant#49852)
sean.ekins
ekinssean@yahoo.com
collabchem
Acknowledgments and contact info