Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models

ACS talk 2013

  • CDD Experienced Team Innovates and Executes
    Barry Bunin, PhD (Pres. & Cofounder as first Eli Lilly EIR): Libraria (CEO, Pres.-CSO), Arris Pharmaceuticals (Sr. Scientist), Genentech, UC Berkeley (Ellman), Columbia University, author.
    Moses Hohman, PhD (Director Software Engineering): Northwestern Assoc. Director of Bioinformatics, Thoughtworks, Inc., U of Chicago (PhD), Harvard (magna cum laude, Physics).
    Sylvia Ernst, PhD (Director Community Growth & Sales): left 800-lb gorillas Accelrys-Scitegic, MDL-Elsevier-Beilstein.
    Peter Cohan (BOD & Overall Sales Strategy): Symyx (VP Bus Dev & President-Discovery Tools), MDL (VP Customer Marketing), www.secondderivative.com, author.
    Omidyar Network, Founders Fund, & Lilly (BOD observers).
    WSGR (Corporate Counsel), Rina Accountancy (GAAP compliance).
    Partners: Hub Consortium Members, ChemAxon, DNDi, MMV, Sandler Center…
    CDD SAB: Christopher Lipinski PhD, James McKerrow MD PhD, David Roos PhD, Adam Renslo PhD, Wes Van Voorhis MD PhD.

    1. Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models
        Sean Ekins(1,2,*), Robert C. Reynolds(3,4), Baojie Wan(5), Scott G. Franzblau(5), Joel S. Freundlich(6,7) and Barry A. Bunin(1)
        (1) Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
        (2) Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
        (3) Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
        (4) Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
        (5) Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
        (6) Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
        (7) Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
    2. Applying CDD to Build a disease community for TB
        • Tuberculosis kills 1.6-1.7m/yr (~1 every 19 seconds)
        • 1/3rd of the world's population infected!
        • Multi-drug resistance in 4.3% of cases
        • Increasing incidence of extensively drug-resistant TB
        • One new drug in over 40 yrs
        • Drug-drug interactions and co-morbidity with HIV
        • Collaboration between groups is rare
        • These groups may work on existing or new targets
        • Use of computational methods with TB is rare
    3. ~20 public datasets for TB
        • Including Novartis data on TB hits
        • >300,000 cpds
        • Patents, papers annotated by CDD
        • Open to browse by anyone: http://www.collaborativedrug.com/register
    4. Fitting into the drug discovery process
        Ekins et al., Trends in Microbiology 19: 65-74, 2011
    5. HTS hit rates
        SRI papers: usually less than 1%
    6. UIC hit rates
        Provider | Library | Number of compounds | Inhibitor concentration (ug/ml or uM) | Readout | Hit rate (%) at 90% inhibition
        ChemBridge | Novacore | 50,000 | 30 uM | Luminescence (LuxAB) | 4.55
        Asinex | Diverse | 59,760 | 50 uM | Luminescence (LuxAB) | 1.91
        ASDI | | 6,811 | 30 uM | Luminescence (LuxAB) | 2.73
        Prestwick | | 1,120 | 20 ug/ml | Luminescence (ATP) / Fluorescence (MABA) | 20.6 / 16.07
        MRCT | | 100,000 | 10 uM | Luminescence (LuxABCDE) | 0.67
    7. Wasting data?
        Information from these inefficient and expensive HTS campaigns does not appear to have been used to direct “informed” selection of new libraries in subsequent screens and compound optimization in TB drug discovery.
        How can we continuously learn from all the data?
    8. Bayesian machine learning
        Bayesian classification is a simple probabilistic classification model. It is based on Bayes’ theorem:
        $p(h \mid d) = \frac{p(d \mid h)\,p(h)}{p(d)}$
        where h is the hypothesis or model; d is the observed data; p(h) is the prior belief (probability of hypothesis h before observing any data); p(d) is the data evidence (marginal probability of the data); p(d|h) is the likelihood (probability of data d if hypothesis h is true); and p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d).
        A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features. The weights are summed to provide a probability estimate.
        Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
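The weighting scheme on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the Discovery Studio implementation: it assumes the Laplacian-corrected estimator commonly described for fingerprint Bayesian models, in which a feature seen in T training molecules, A of them active, with an overall active fraction p, gets the weight ln((A + 1)/(T·p + 1)). Molecules are represented as sets of hypothetical feature IDs, and in this sketch the summed weight is used as a ranking score rather than a calibrated probability.

```python
import math
from collections import Counter


def train_laplacian_bayes(molecules, labels):
    """Return a {feature: log weight} dict from feature sets and 0/1 labels.

    Assumed estimator: weight = ln((A + 1) / (T * p + 1)), where T counts the
    training molecules containing the feature, A the actives among them, and
    p is the overall fraction of actives.
    """
    p = sum(labels) / len(labels)
    total = Counter()
    active = Counter()
    for feats, is_active in zip(molecules, labels):
        for f in set(feats):
            total[f] += 1
            if is_active:
                active[f] += 1
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def score(weights, feats):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(feats))


# Toy training set: hypothetical feature IDs, 1 = active, 0 = inactive.
train = [{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}]
labels = [1, 1, 0, 0]
weights = train_laplacian_bayes(train, labels)
print(score(weights, {"a", "b"}), score(weights, {"d"}))
```

The +1 terms pull the weights of rarely seen features toward zero, so a substructure observed in only one or two molecules cannot dominate the score; that is the role of the Laplacian correction.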
    9. Process – Bioactivity only
        High-throughput phenotypic Mtb screening → Mtb screening molecule database (descriptors + bioactivity) → Bayesian machine learning Mtb model → molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models → top-scoring molecules assayed for Mtb growth inhibition → identify in vitro hits → increased hit/lead discovery efficiency
        New bioactivity data may enhance models.
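The scoring step in the slide 9 workflow, where a molecule database is virtually scored and only the top-scoring molecules go on to the Mtb growth-inhibition assay, reduces to ranking by summed feature weights. The sketch below assumes precomputed per-feature weights (here a hypothetical hand-written dict) and a library keyed by hypothetical compound IDs; `select_for_assay` is an illustrative name, not part of any CDD or Discovery Studio API.

```python
import heapq


def select_for_assay(weights, library, k):
    """Return the IDs of the k library molecules with the highest summed weights.

    weights: {feature_id: weight}; library: {compound_id: set of feature_ids}.
    Features missing from `weights` contribute 0 to a molecule's score.
    """
    scored = (
        (sum(weights.get(f, 0.0) for f in feats), compound_id)
        for compound_id, feats in library.items()
    )
    return [compound_id for _, compound_id in heapq.nlargest(k, scored)]


# Hypothetical weights and library, for illustration only.
weights = {"a": 1.2, "b": -0.5, "c": 0.3}
library = {"m1": {"a"}, "m2": {"b", "c"}, "m3": {"a", "c"}, "m4": {"d"}}
print(select_for_assay(weights, library, 2))  # -> ['m3', 'm1']
```

In the loop on the slide, assay results for the selected molecules would then be appended to the training data and the weights recomputed, which is how new bioactivity data may enhance the model.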
    10. Bayesian Classification TB Models
        We can use the public data for machine learning model building, using the Discovery Studio Bayesian model. Validation: leave out 50% x 100.
        Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity
        MLSMR all single point screen (N = 220,463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
        MLSMR dose response set (N = 2,273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
        Ekins et al., Mol BioSyst, 6: 840-851, 2010
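The table on slide 10 reports ROC scores alongside concordance, specificity and sensitivity for a "leave out 50% x 100" protocol; the ± values presumably summarize the spread across the 100 random splits. A minimal sketch of the per-split metrics follows, assuming the standard definitions: ROC AUC as the probability that a randomly chosen active outscores a randomly chosen inactive, and concordance as overall accuracy at a chosen score cutoff. The cutoff and toy scores are hypothetical, and this is not a description of how Discovery Studio computes its statistics internally.

```python
def roc_auc(scores, labels):
    """ROC AUC: probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, cutoff):
    """Concordance (accuracy), specificity and sensitivity, in percent, at a score cutoff."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_active = s >= cutoff
        if y and predicted_active:
            tp += 1
        elif y:
            fn += 1
        elif predicted_active:
            fp += 1
        else:
            tn += 1
    return {
        "concordance": 100 * (tp + tn) / len(labels),
        "specificity": 100 * tn / (tn + fp),
        "sensitivity": 100 * tp / (tp + fn),
    }


# Toy held-out split: model scores and true labels (1 = active).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels), threshold_metrics(scores, labels, cutoff=0.5))
```

The nested loop is quadratic and only suitable for small examples; for a set the size of the 220,463-compound MLSMR screen, a rank-based (Mann-Whitney) computation after sorting would be the practical choice.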
    11. Bayesian Classification Models for TB
        Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and simple descriptors: 2 models, built on 220,000 and >2,000 compounds; active compounds with MIC < 5 uM.
        Good features | Feature ID | Good molecules containing the feature | Bayesian score
        G1 | 1704324327 | 73 out of 165 | 2.885
        G2 | -2092491099 | 57 out of 120 | 2.873
        G3 | -1230843627 | 75 out of 188 | 2.811
        G4 | 940811929 | 35 out of 65 | 2.780
        G5 | 563485513 | 123 out of 357 | 2.769
        Bad features | Feature ID | Good molecules containing the feature | Bayesian score
        B1 | 1444982751 | 0 out of 1158 | -3.135
        B2 | 274564616 | 0 out of 1024 | -3.018
        B3 | -1775057221 | 0 out of 982 | -2.978
        B4 | 48625803 | 0 out of 740 | -2.712
        B5 | 899570811 | 0 out of 738 | -2.709
        Ekins et al., Mol BioSyst, 6: 840-851, 2010
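The Bayesian scores listed on slide 11 are consistent with a Laplacian-corrected log weight of the form ln((A + 1)/(T·p + 1)), where A is the number of good molecules among the T that contain the feature. The slide does not state the base active rate p; the value 0.019 below is an assumption back-calculated from feature B1, so treat this as a consistency check rather than the model's actual parameters.

```python
import math

# Assumption: base active fraction back-calculated from feature B1 (not on the slide).
BASE_RATE = 0.019

# (good molecules containing the feature, total containing it, score on the slide)
SLIDE_FEATURES = {
    "G1": (73, 165, 2.885),
    "G2": (57, 120, 2.873),
    "G3": (75, 188, 2.811),
    "G4": (35, 65, 2.780),
    "G5": (123, 357, 2.769),
    "B1": (0, 1158, -3.135),
    "B2": (0, 1024, -3.018),
    "B3": (0, 982, -2.978),
    "B4": (0, 740, -2.712),
    "B5": (0, 738, -2.709),
}


def laplacian_weight(n_active, n_total, base_rate):
    """Laplacian-corrected log weight for a single fingerprint feature (assumed form)."""
    return math.log((n_active + 1) / (n_total * base_rate + 1))


for name, (a, t, reported) in SLIDE_FEATURES.items():
    computed = laplacian_weight(a, t, BASE_RATE)
    print(f"{name}: computed {computed:.3f}, slide {reported:.3f}")
```

With a single assumed base rate, all ten computed weights land within a few thousandths of the reported scores, which supports reading the slide's "Bayesian Score" as a per-feature log weight.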
    12. Bayesian Classification: dose response (figure panels: Good, Bad)
        Ekins et al., Mol BioSyst, 6: 840-851, 2010
    13. Initial testing of Mtb Bayesian models using NIAID and GVKbio data
        Both models were substantially better than the random hit rate at identifying known active compounds (MIC < 5 uM) in the first 1000 compounds sorted by Bayesian model score.
        The number of active compounds was substantially larger in the NIAID dataset (1871 out of 3748) than in the GVKbio dataset (377 out of 2880).
        Ekins et al., Mol BioSyst, 6: 840-851, 2010
    14. Additional test sets
        • 100K library: 1702 hits in >100K cpds
        • Novartis data: 34 hits in 248 cpds
        • FDA drugs: 21 hits in 2108 cpds
        Suggests models can predict data from the same and independent labs.
        Enrichments 4-10 fold.
        Initial enrichment enables screening few compounds to find actives.
        Ekins et al., Mol BioSyst, 6: 840-851, 2010; Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
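Enrichment, as used on slides 13 and 14, compares the hit rate among the compounds a model ranks highest with the hit rate of the whole test set. A short sketch of that arithmetic follows, taking ">100K" as 100,000 for the first set (an assumption); the top-100 selection in the last line is hypothetical and not a result from the talk.

```python
def hit_rate(hits, tested):
    """Fraction of tested compounds that are hits."""
    return hits / tested


def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in a model-selected subset divided by the hit rate of the whole set."""
    return hit_rate(hits_selected, n_selected) / hit_rate(hits_total, n_total)


# Base (random-selection) hit rates for the test sets on slides 13 and 14.
base_rates = {
    "100K library": hit_rate(1702, 100_000),  # ">100K" taken as 100,000 (assumption)
    "Novartis": hit_rate(34, 248),
    "FDA drugs": hit_rate(21, 2108),
    "NIAID": hit_rate(1871, 3748),
    "GVKbio": hit_rate(377, 2880),
}

for name, rate in base_rates.items():
    # Even a perfect ranking can enrich by at most 1 / base rate.
    print(f"{name}: base hit rate {rate:.2%}, max enrichment {1 / rate:.1f}x")

# Hypothetical example: 10 actives among the top 100 FDA drugs.
print(round(enrichment_factor(10, 100, 21, 2108), 1))  # -> 10.0
```

The high base rate of the NIAID set (about 50%) caps any enrichment there at about 2-fold, whereas sets with hit rates near 1% leave room for the 4-10 fold enrichments reported.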
    15. Dual-Event models
        Become more stringent in what we call an ACTIVE: IC90 < 10 uM and a selectivity index (SI) greater than ten. SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
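The dual-event activity definition on slide 15 is easy to state as code. This sketch assumes IC90 and CC50 are given as plain numbers in the same units (uM); censored values such as a CC50 reported only as ">100" would need explicit handling, which is left out here. The function names and example compounds are hypothetical.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90; both values must use the same concentration units."""
    return cc50 / ic90


def is_dual_event_active(ic90_um, cc50_um, ic90_cutoff_um=10.0, si_cutoff=10.0):
    """Active only if IC90 < 10 uM and SI > 10, per the slide's definition."""
    return ic90_um < ic90_cutoff_um and selectivity_index(cc50_um, ic90_um) > si_cutoff


# Hypothetical compounds: (IC90 in uM, CC50 against Vero cells in uM).
examples = {
    "potent, selective": (2.0, 50.0),
    "potent, cytotoxic": (2.0, 15.0),
    "weak": (12.0, 500.0),
}
for label, (ic90, cc50) in examples.items():
    print(label, is_dual_event_active(ic90, cc50))
```

The second example shows why the dual-event label is stricter: a compound that would count as active on potency alone is labeled inactive because its selectivity window is under ten-fold.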
    16. Dual-Event models
        High-throughput phenotypic Mtb screening → Mtb screening molecule database (descriptors + bioactivity + cytotoxicity) → Bayesian machine learning Mtb model → molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models → top-scoring molecules assayed for Mtb growth inhibition → identify in vitro hits → increased hit/lead discovery efficiency
        New bioactivity data may enhance models.
    17. Bayesian Classification TB Models
        ROC XV AUC: single pt = 0.88; dose resp = 0.78; dose resp + cyto = 0.86
        Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity
        MLSMR all single point screen (N = 220,463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
        MLSMR dose response set (N = 2,273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
        NEW: dose response and cytotoxicity (N = 2,273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47
        Ekins et al., PLOS ONE, in press 2013
    18. MLSMR dual event model (figure panels: Good, Bad)
        Ekins et al., PLOS ONE, in press 2013
    19. A new dataset to model
    20. Models with SRI kinase data
        ROC XV AUC: Model 1 (N = 23,797) = 0.89; Model 2 (N = 1,248) = 0.72; Model 3 (N = 1,248) = 0.77
        Leave out 50% x 100:
        Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance | Specificity | Sensitivity
        Model 1 (N = 23,797) | 0.87 ± 0 | 0.88 ± 0 | 76.77 ± 2.14 | 76.49 ± 2.41 | 81.7 ± 2.96
        Model 2 (N = 1,248) | 0.65 ± 0.01 | 0.70 ± 0.01 | 61.58 ± 1.56 | 61.85 ± 8.45 | 61.30 ± 8.24
        Model 3 (N = 1,248) | 0.74 ± 0.02 | 0.75 ± 0.02 | 68.67 ± 6.88 | 69.28 ± 9.84 | 64.84 ± 12.11
        Ekins et al., PLOS ONE, in press 2013
    21. Testing to date has been retrospective
        • Can we use our models to select compounds and influence design?
        • Prospective prediction
        • Do it enough times to show robustness
    22. Testing prospectively
        The MLSMR dose response with cytotoxicity model and the TAACF kinase dose response with cytotoxicity model were used to screen:
        • Asinex library (N = 25,008)
        • Maybridge library (N = 57,200)
        • Selleck Chemicals kinase library (N = 194)
    23. Results: Asinex library
        94 molecules selected with the MLSMR dose response and cytotoxicity model, and 88 selected with the model based on kinase inhibitor scaffolds with cytotoxicity, were tested at a single concentration.
        8 (MLSMR) and 19 (kinase) hits with > 90% inhibition at 100 ug/ml (8.5% and 21.6% hit rates).
        Results: Maybridge library
        50 molecules had ≥ 90% inhibition at 100 ug/ml (28.7% hit rate); 8 with good SI.
        Ekins et al., PLOS ONE, in press 2013
    24. Asinex and MLSMR actives: PCA
        Ekins et al., PLOS ONE, in press 2013
    25. Examples of selective and active compounds with MIC < 10 ug/ml
    26. An example of the model ranking similar compounds
        Maybridge number | Inhibition % MABA at 100 µg/ml | Inhibition % LORA at 100 µg/ml | MIC MABA (µg/ml) | MIC LORA (µg/ml) | CC50 Vero (µg/ml) | MLSMR model score | Kinase model score
        JFD02381 | 98.9 | 95 | 5.84 | 10.09 | >100 | 25.27 (0.80) | 12.79 (0.5)
        JFD02382 | 91.5 | 90.1 | >100 | 47.99 | >100 | 18.32 (0.69) | 9.78 (0.43)
    27. Analysis of SelleckChem kinase library (N = 194)
        47 molecules showed ≥ 90% inhibition of M. tuberculosis growth at 100 ug/ml, a hit rate of 24.2%.
        Note: the best model was another dual-activity model (Ekins et al., Chem Biol 20: 370-378, 2013).
        Ekins et al., PLOS ONE, in press 2013
    28. Kinase inhibitors active vs Mtb
        • SI not ideal
        • Several other weaker actives are approved drugs
    29. A summary of the numbers involved: filtering for hits
        • 82,403 molecules screened through Bayesian models
        • 550 molecules tested in vitro
        • 124 actives identified (22.5% hit rate)
        • Identified several novel, potent lead series with low cytotoxicity and good selectivity
        • Identified known human kinase inhibitors and FDA-approved drugs as new hits
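The totals on slide 29 can be reconciled with the per-library results on slides 23 and 27. One number is inferred rather than stated: the Maybridge selection size, taken here as 550 - 94 - 88 - 194 = 174, which is consistent with 50 hits at the reported 28.7% hit rate. The labels in the list are descriptive shorthand, not names used in the talk.

```python
# (screen, molecules tested in vitro, actives) from the preceding slides.
# Assumption: the Maybridge count (174) is inferred, not stated on the slides.
rounds = [
    ("Asinex, MLSMR dose response + cytotoxicity model", 94, 8),
    ("Asinex, kinase scaffold + cytotoxicity model", 88, 19),
    ("Maybridge", 174, 50),
    ("SelleckChem kinase library", 194, 47),
]

for name, tested, hits in rounds:
    print(f"{name}: {hits}/{tested} = {100 * hits / tested:.1f}%")

total_tested = sum(tested for _, tested, _ in rounds)
total_hits = sum(hits for _, _, hits in rounds)
overall_rate = 100 * total_hits / total_tested
print(total_tested, total_hits, f"{overall_rate:.1f}%")  # -> 550 124 22.5%
```

Set against the "usually less than 1%" HTS hit rates on slide 5, this 22.5% overall rate is the quantitative basis for the conclusion that models should be applied before screening.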
    30. Conclusions
        • Still difficult to identify molecules with bioactivity and no cytotoxicity
        • Models perform differently on different data sets; need to understand what factors are key
        • Hit rate much higher than HTS while screening only a fraction of the molecules
        • Computational models should be used prior to HTS to focus resources
    31. 31. Acknowledgments The project described was supported by Award Number R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins) Accelrys The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”) Allen Casey (IDRI)
    32. You can find me @...
        • CDD Booth 205
        • Paper ID 13433: “Dispensing processes profoundly impact biological assays and computational and statistical analyses”, April 8th, 8.35am, Room 349
        • Paper ID 14750: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models”, April 9th, 1.30pm, Room 353
        • Paper ID 21524: “Navigating between patents, papers, abstracts and databases using public sources and tools”, April 9th, 3.50pm, Room 350
        • Paper ID 13358: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”, April 10th, 8.30am, Room 357
        • Paper ID 13382: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates”, April 10th, 10.20am, Room 350
        • Paper ID 13438: “Dual-event machine learning models to accelerate drug discovery”, April 10th, 3.05pm, Room 350
