Enhancing High Throughput Screening For Mycobacterium
  tuberculosis Drug Discovery Using Bayesian Models



    Sean Ekins1,2*, Robert C. Reynolds3,4, Baojie Wan5, Scott G. Franzblau5,
                     Joel S. Freundlich6,7 and Barry A. Bunin1

1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
6 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
Applying CDD to Build a disease community for TB

   Tuberculosis kills 1.6-1.7 million people per year (~1 every 8 seconds)
   One third of the world's population is infected

   Multi-drug resistance in 4.3% of cases
   Extensively drug-resistant TB is increasing in incidence
   One new drug in over 40 years
   Drug-drug interactions and co-morbidity with HIV

   Collaboration between groups is rare
   These groups may work on existing or new targets
   Use of computational methods with TB is rare
~ 20 public datasets for TB
Including Novartis data on TB hits
>300,000 cpds

Patents, Papers Annotated by CDD

Open to browse by anyone

 http://www.collaborativedrug.com/register
Fitting into the drug discovery process

Ekins et al., Trends in Microbiology 19: 65-74, 2011
HTS Hit rates

 SRI papers

 Usually less than 1%
UIC hit rates

Provider     Compound library   Number of compounds   Inhibitor concentration   Readout                   Hit rate (%) at 90% inhibition
ChemBridge   Novacore           50,000                30 uM                     Luminescence (LuxAB)      4.55
Asinex       Diverse            59,760                50 uM                     Luminescence (LuxAB)      1.91
ASDI                            6,811                 30 uM                     Luminescence (LuxAB)      2.73
Prestwick                       1,120                 20 ug/ml                  Luminescence (ATP)        20.6
                                                                                Fluorescence (MABA)       16.07
MRCT                            100,000               10 uM                     Luminescence (LuxABCDE)   0.67
Wasting data?


Information from these inefficient and expensive
HTS campaigns does not appear to have been
used to direct “informed” selection of new
libraries in subsequent screens and compound
optimization in TB drug discovery




How can we continuously learn from all the
data?
Bayesian machine learning

Bayesian classification is a simple probabilistic classification model. It is based on
Bayes’ theorem:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{p(d)}$$

h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the
observed data d)

A weight is calculated for each feature using a Laplacian-adjusted probability
estimate to account for the different sampling frequencies of different features.

The weights are summed to provide a probability estimate

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
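
The weighting step above can be sketched in a few lines of Python. This is a minimal illustration and not the Discovery Studio implementation: it assumes the commonly described Laplacian-corrected form, weight = ln((A_F + 1) / (N_F * P + 1)), where N_F is the number of training molecules containing feature F, A_F is how many of those are active, and P is the overall fraction of actives. The function names `fit_weights` and `score` and the toy feature labels are hypothetical.

```python
import math
from collections import Counter


def fit_weights(samples):
    """Return a Laplacian-corrected log weight for every feature.

    samples: iterable of (feature_set, is_active) pairs.
    ASSUMPTION: weight = ln((A_F + 1) / (N_F * P + 1)), the commonly
    described Laplacian-corrected estimate; not taken from the slides.
    """
    samples = list(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / len(samples)  # P: overall fraction of actives

    seen = Counter()         # N_F: molecules containing each feature
    seen_active = Counter()  # A_F: actives containing each feature
    for features, active in samples:
        for f in set(features):
            seen[f] += 1
            if active:
                seen_active[f] += 1

    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(features, weights):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in set(features))


# Toy training set: letters stand in for fingerprint features.
training = [
    ({"a", "b"}, True),
    ({"a", "c"}, True),
    ({"b", "c"}, False),
    ({"c", "d"}, False),
    ({"d"}, False),
    ({"c"}, False),
]
weights = fit_weights(training)

# Rank a few hypothetical molecules from most to least promising.
library = [{"d"}, {"a"}, {"a", "d"}]
ranked = sorted(library, key=lambda fs: score(fs, weights), reverse=True)
print(ranked)
```

Rarely seen features are pulled toward a weight of zero by the "+ 1" terms, which is how the correction accounts for different sampling frequencies.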
Process – Bioactivity only

   High-throughput phenotypic Mtb screening feeds the Mtb screening molecule database
   Bayesian machine learning Mtb model built from descriptors + bioactivity
   Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models
   Top scoring molecules assayed for Mtb growth inhibition
   Identify in vitro hits
   New bioactivity data may enhance models

   Increased hit/lead discovery efficiency
Bayesian Classification TB Models

     We can use the public data for machine learning model building
     Using Discovery Studio Bayesian model
     Leave out 50% x 100

Dataset (number of molecules)                 External ROC score   Internal ROC score   Concordance     Specificity     Sensitivity
MLSMR all single point screen (N = 220463)    0.86 ± 0             0.86 ± 0             78.56 ± 1.86    78.59 ± 1.94    77.13 ± 2.26
MLSMR dose response set (N = 2273)            0.73 ± 0.01          0.75 ± 0.01          66.85 ± 4.06    67.21 ± 7.05    65.47 ± 7.96


                                                Ekins et al., Mol BioSyst, 6: 840-851, 2010
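
A repeated "leave out 50%" validation of the kind summarized in the table can be sketched as follows. This is a simplified stand-in under stated assumptions, not the Discovery Studio protocol: the helper names (`roc_auc`, `confusion_stats`, `leave_out_half`), the toy one-descriptor data, and the toy midpoint model are all hypothetical, and the split is a plain random halving repeated 100 times.

```python
import random
import statistics


def roc_auc(scores, labels):
    """AUC as the chance that a random active outranks a random inactive."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_stats(scores, labels, cutoff):
    """Concordance, sensitivity and specificity (in %) at a score cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return {
        "concordance": 100.0 * (tp + tn) / len(labels),
        "sensitivity": 100.0 * tp / n_pos,
        "specificity": 100.0 * tn / n_neg,
    }


def leave_out_half(data, fit, predict, repeats=100, seed=0):
    """Train on a random half, test on the other half, repeat; return mean and SD of AUC."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        # Skip degenerate splits that lack actives or inactives.
        if len({y for _, y in train}) < 2 or len({y for _, y in test}) < 2:
            continue
        model = fit(train)
        scores = [predict(model, x) for x, _ in test]
        labels = [y for _, y in test]
        aucs.append(roc_auc(scores, labels))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Toy data: one numeric descriptor, actives shifted towards higher values.
toy = [(float(i + 5), True) for i in range(20)] + \
      [(float(i), False) for i in range(20)]


def fit(train):
    """Toy model: midpoint between the mean active and mean inactive value."""
    act = [x for x, y in train if y]
    inact = [x for x, y in train if not y]
    return (statistics.mean(act) + statistics.mean(inact)) / 2


def predict(model, x):
    """Toy score: distance above the midpoint."""
    return x - model


mean_auc, sd_auc = leave_out_half(toy, fit, predict)
print(f"ROC AUC: {mean_auc:.2f} ± {sd_auc:.2f}")
```

Reporting the mean and standard deviation over the repeats gives figures in the same "value ± spread" form as the table.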
Bayesian Classification Models for TB

      Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and
      simple descriptors. Two models: 220,000 and >2000 compounds
      Active compounds: MIC < 5 uM

Good features

Feature            Occurrence             Bayesian score
G1: 1704324327     73 out of 165 good     2.885
G2: -2092491099    57 out of 120 good     2.873
G3: -1230843627    75 out of 188 good     2.811
G4: 940811929      35 out of 65 good      2.780
G5: 563485513      123 out of 357 good    2.769

Bad features

Feature            Occurrence             Bayesian score
B1: 1444982751     0 out of 1158 good     -3.135
B2: 274564616      0 out of 1024 good     -3.018
B3: -1775057221    0 out of 982 good      -2.978
B4: 48625803       0 out of 740 good      -2.712
B5: 899570811      0 out of 738 good      -2.709




                                                                        Ekins et al., Mol BioSyst, 6: 840-851, 2010
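
As a worked check, the feature scores on this slide can be approximately reproduced from the "good out of total" counts. The sketch assumes the Laplacian-corrected form ln((A_F + 1) / (N_F * P + 1)) and a base rate of actives P of about 0.019. Both are assumptions: the slide states neither, and the base rate was chosen here only because it matches the reported scores.

```python
import math

# ASSUMPTION: overall fraction of actives; inferred, not stated on the slide.
BASE_RATE = 0.019


def feature_score(n_good, n_total, base_rate=BASE_RATE):
    """Laplacian-corrected log weight for one fingerprint feature."""
    return math.log((n_good + 1) / (n_total * base_rate + 1))


# (good count, total count, score reported on the slide)
slide_features = {
    "G1": (73, 165, 2.885),
    "G2": (57, 120, 2.873),
    "G3": (75, 188, 2.811),
    "G4": (35, 65, 2.780),
    "G5": (123, 357, 2.769),
    "B1": (0, 1158, -3.135),
    "B2": (0, 1024, -3.018),
    "B3": (0, 982, -2.978),
    "B4": (0, 740, -2.712),
    "B5": (0, 738, -2.709),
}

for name, (good, total, reported) in slide_features.items():
    computed = feature_score(good, total)
    print(f"{name}: computed {computed:.3f}, slide {reported:.3f}")
```

A feature seen 1158 times without a single active gets a strongly negative score, while a feature seen only 65 times with 35 actives still scores highly, which is the intended effect of the correction.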
Bayesian Classification Dose response

[Figure: good and bad features for the dose response model]




                      Ekins et al., Mol BioSyst, 6: 840-851, 2010
Initial testing of Mtb Bayesian models using NIAID and GVKbio data

Both models were substantially better than the random hit rate at identifying
known active compounds (MIC cutoff of 5 uM) in the first 1000 compounds
sorted by the Bayesian model scores.

The number of active compounds was substantially larger in the NIAID dataset
(1871 out of 3748) than in the GVKbio dataset (377 out of 2880).

                            Ekins et al., Mol BioSyst, 6: 840-851, 2010
Additional test sets

 100K library: 1702 hits in >100K cpds
 Novartis data: 34 hits in 248 cpds
 FDA drugs: 21 hits in 2108 cpds

Suggests models can predict data from the same and independent labs
Enrichments 4-10 fold
Initial enrichment enables screening few compounds to find actives
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Ekins and Freundlich, Pharm Res, 28: 1859-1869, 2011
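
Fold enrichment can be illustrated with a short calculation: the hit rate among the top-ranked compounds divided by the hit rate of the whole set, which is what random selection would give. The function name `enrichment_factor` and the toy ranking are hypothetical; only the two baseline counts (NIAID and GVKbio) come from the slides.

```python
def enrichment_factor(ranked_labels, top_n):
    """Hit rate in the top_n ranked compounds relative to the overall hit rate.

    ranked_labels: 1 for active, 0 for inactive, ordered best score first.
    """
    top = ranked_labels[:top_n]
    hit_rate_top = sum(top) / len(top)
    hit_rate_all = sum(ranked_labels) / len(ranked_labels)
    return hit_rate_top / hit_rate_all


# Random hit rates implied by the counts reported for the two test sets.
niaid_random = 1871 / 3748   # about half of the NIAID compounds are active
gvkbio_random = 377 / 2880   # a much smaller share in GVKbio
print(f"NIAID random hit rate:  {100 * niaid_random:.1f}%")
print(f"GVKbio random hit rate: {100 * gvkbio_random:.1f}%")

# Toy ranking: 1000 compounds, 50 actives, 20 of them in the top 100.
toy_ranking = [1] * 20 + [0] * 80 + [1] * 30 + [0] * 870
print(f"Toy enrichment in top 100: {enrichment_factor(toy_ranking, 100):.1f}-fold")
```

Because the maximum possible enrichment is the inverse of the random hit rate, a set with many actives such as NIAID leaves less room for enrichment than a sparser set.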
Dual-Event models


Become more stringent in what we call an
ACTIVE


IC90 < 10 uM and a selectivity index (SI)
greater than ten. SI was calculated as SI =
CC50/IC90, where CC50 is the concentration
that resulted in 50% inhibition of Vero cells.
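
The dual-event definition of an active translates directly into a labelling rule. The sketch below uses the two cutoffs given on the slide; the function names and the example concentrations are hypothetical, and both values are assumed to be in the same units (uM).

```python
def selectivity_index(cc50_um, ic90_um):
    """SI = CC50 / IC90 (Vero cell cytotoxicity over Mtb potency)."""
    if ic90_um <= 0:
        raise ValueError("IC90 must be positive")
    return cc50_um / ic90_um


def is_dual_event_active(ic90_um, cc50_um, ic90_cutoff=10.0, si_cutoff=10.0):
    """Active only if potent (IC90 < 10 uM) AND selective (SI > 10)."""
    return (ic90_um < ic90_cutoff
            and selectivity_index(cc50_um, ic90_um) > si_cutoff)


# Hypothetical compounds: (IC90 in uM, CC50 in uM)
examples = {
    "potent and selective": (2.0, 50.0),     # SI = 25
    "potent but cytotoxic": (2.0, 10.0),     # SI = 5
    "selective but weak":   (20.0, 1000.0),  # IC90 too high
}
for name, (ic90, cc50) in examples.items():
    print(name, is_dual_event_active(ic90, cc50))
```

Compounds that pass the potency cutoff but fail on selectivity are labelled inactive, so the training set carries both events in a single class label.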
Dual-Event models

   High-throughput phenotypic Mtb screening feeds the Mtb screening molecule database
   Bayesian machine learning Mtb model built from descriptors + bioactivity (+ cytotoxicity)
   Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models
   Top scoring molecules assayed for Mtb growth inhibition
   Identify in vitro hits
   New bioactivity data may enhance models

   Increased hit/lead discovery efficiency
Bayesian Classification TB Models

    Single point ROC XV AUC        = 0.88
    Dose response                  = 0.78
    Dose response + cytotoxicity   = 0.86

Dataset (number of molecules)                         External ROC score   Internal ROC score   Concordance     Specificity     Sensitivity
MLSMR all single point screen (N = 220463)            0.86 ± 0             0.86 ± 0             78.56 ± 1.86    78.59 ± 1.94    77.13 ± 2.26
MLSMR dose response set (N = 2273)                    0.73 ± 0.01          0.75 ± 0.01          66.85 ± 4.06    67.21 ± 7.05    65.47 ± 7.96
NEW: dose response and cytotoxicity (N = 2273)        0.82 ± 0.02          0.84 ± 0.02          82.61 ± 4.68    83.91 ± 5.48    65.99 ± 7.47


                                                    Ekins et al., PLOSONE, in press 2013
MLSMR dual event model




[Figure: good and bad features for the MLSMR dual event model]




                     Ekins et al., PLOSONE, in press 2013
A new dataset to model
Models with SRI kinase data

    Model 1 ROC XV AUC (N = 23797) = 0.89
    Model 2            (N = 1248)  = 0.72
    Model 3            (N = 1248)  = 0.77

    Leave out 50% x 100

Dataset (number of molecules)   External ROC score   Internal ROC score   Concordance     Specificity     Sensitivity
Model 1 (N = 23797)             0.87 ± 0             0.88 ± 0             76.77 ± 2.14    76.49 ± 2.41    81.7 ± 2.96
Model 2 (N = 1248)              0.65 ± 0.01          0.70 ± 0.01          61.58 ± 1.56    61.85 ± 8.45    61.30 ± 8.24
Model 3 (N = 1248)              0.74 ± 0.02          0.75 ± 0.02          68.67 ± 6.88    69.28 ± 9.84    64.84 ± 12.11

                                              Ekins et al., PLOSONE, in press 2013
Testing to date has been retrospective

      Can we use our models to select compounds
      and influence design?

      Prospective prediction

      Do it enough times to show robustness
Testing prospectively


The MLSMR dose response with cytotoxicity and the
TAACF kinase dose response with cytotoxicity
models were used to screen:

Asinex library (N = 25,008)

Maybridge library (N = 57,200)

Selleck Chemicals kinase library (N = 194)
Results - Asinex library

94 molecules were selected with the MLSMR dose response and
cytotoxicity model

88 were selected with the model built on the library based on
kinase inhibitor scaffolds, with cytotoxicity. All were tested
at a single concentration.

8 (MLSMR) and 19 (kinase) hits with > 90% inhibition at
100 ug/ml (8.5% and 21.5% hit rates)

Results - Maybridge library

50 molecules had greater than or equal to 90% inhibition at
100 ug/ml (28.7% hit rate), 8 with good SI
                                 Ekins et al., PLOSONE, in press 2013
Asinex and MLSMR actives PCA




Ekins et al., PLOSONE, in press 2013
Examples of selective and active compounds with MIC <10 ug/ml
An example of the model ranking similar compounds

[Structure column: images not reproduced]

Maybridge   Inhibition % MABA   Inhibition % LORA   MIC MABA   MIC LORA   CC50 Vero   MLSMR model    Kinase model
number      at 100 µg/ml        at 100 µg/ml        (µg/ml)    (µg/ml)    (µg/ml)     score          score
JFD02381    98.9                95                  5.84       10.09      >100        25.27 (0.80)   12.79 (0.5)
JFD02382    91.5                90.1                >100       47.99      >100        18.32 (0.69)   9.78 (0.43)
Analysis of SelleckChem Kinase library N=194


47 molecules showed greater than or equal to 90% inhibition of
M. tuberculosis at 100 ug/ml, a hit rate of 24.2%.

Note: the best model was another dual activity model
(Ekins et al., Chem Biol 20: 370-378, 2013)
Ekins et al., PLOSONE, in press 2013
Kinase inhibitors active vs Mtb




            SI not ideal
            – several other weaker actives are approved drugs
A summary of the numbers involved – filtering for hits.

82,403 molecules screened through Bayesian models


550 molecules were tested in vitro


124 actives were identified


22.5 % hit rate

Identified several novel potent lead series with low cytotoxicity
and good selectivity

Identified known human kinase inhibitors and FDA approved
drugs as new hits
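
The hit rates quoted in the results can be rechecked with simple arithmetic. The sketch below uses only counts that appear on the slides; the dictionary layout and names are hypothetical, and recomputed values may differ from the slide figures in the last digit because of rounding.

```python
def hit_rate(actives, tested):
    """Hit rate as a percentage of the compounds actually tested."""
    return 100.0 * actives / tested


# (actives, compounds tested) as reported on the slides
campaigns = {
    "Asinex, MLSMR model": (8, 94),
    "Asinex, kinase model": (19, 88),
    "Selleck kinase library": (47, 194),
    "All libraries combined": (124, 550),
}
for name, (actives, tested) in campaigns.items():
    print(f"{name}: {hit_rate(actives, tested):.1f}%")

# Share of the virtually screened molecules that went into the assay.
fraction_tested = 100.0 * 550 / 82403
print(f"Tested in vitro: {fraction_tested:.2f}% of screened molecules")
```

Testing well under 1% of the virtually screened molecules while finding actives at roughly 22% is the efficiency argument made in the conclusions.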
Conclusions


Still difficult to identify molecules with bioactivity and no
cytotoxicity

Models perform differently on different data sets

Need to understand what factors are key

Hit rate much higher than HTS / screen a fraction of
molecules

Computational models should be used prior to HTS

Focus resources
Acknowledgments

   The project described was supported by Award Number R43 LM011152-01
    “Biocomputation across distributed private datasets to enhance drug
    discovery” from the National Library of Medicine (PI: S. Ekins)

   Accelrys

   The CDD TB has been developed thanks to funding from the Bill and
    Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for
    TB through a novel database of SAR data optimized to promote data
    archiving and sharing”)

   Allen Casey (IDRI)
You can find me @...                                               CDD Booth 205
PAPER ID: 13433
PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and
statistical analyses”
April 8th 8.35am Room 349

PAPER ID: 14750
PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery
Using Bayesian Models”
April 9th 1.30pm Room 353

PAPER ID: 21524
PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and
tools”
April 9th 3.50pm Room 350

PAPER ID: 13358
PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”
April 10th 8.30am Room 357

PAPER ID: 13382
PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided
repurposing candidates”
April 10th 10.20am Room 350

PAPER ID: 13438
PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”
April 10th 3.05 pm Room 350


Editor's Notes

  • #11, #12, #13, #18 CDD Experienced Team Innovates and Executes
    Barry Bunin, PhD (Pres. & Cofounder as first Eli Lilly EIR): Libraria (CEO, Pres.-CSO), Arris Pharmaceuticals (Sr. Scientist), Genentech, UC Berkeley (Ellman), Columbia University, author.
    Moses Hohman, PhD (Director Software Engineering): Northwestern Assoc. Director of Bioinformatics, Thoughtworks, Inc., U of Chicago (PhD), Harvard (magna cum laude, Physics)
    Sylvia Ernst, PhD (Director Community Growth & Sales): Left 800-lb Gorillas: Accelrys-Scitegic, MDL-Elsevier-Beilstein
    Peter Cohan (BOD & Overall Sales Strategy): Symyx (VP Bus Dev & President-Discovery Tools), MDL (VP Customer Marketing), www.secondderivative.com, author.
    Omidyar Network, Founders Fund, & Lilly (BOD observers)
    WSGR (Corporate Counsel), Rina Accountancy (GAAP compliance)
    Partners: Hub Consortium Members, ChemAxon, DNDi, MMV, Sandler Center…
    CDD SAB: Christopher Lipinski PhD, James McKerrow, MD PhD, David Roos PhD, Adam Renslo PhD, Wes Van Voorhis, MD PhD