Donovan N. Chin & R. Aldrin Denny
   Traditional Drug Discovery (insert graph)
   In Silico Prediction of ADME (insert graph)
    ◦   Potency
    ◦   Absorption
    ◦   Lead
    ◦   Drug
    ◦   Toxicity
    ◦   Excretion
    ◦   Metabolism
    ◦   Distribution
   Target IVY(Brute force virtual screening of
    very large compound libraries) Lead
    Discovery IVY(Utilize predictive models
    from Biogen data for more efficient virtual
    screening) Lead Optimization candidate
   (insert graph)
    ◦   Potency
    ◦   Lead
    ◦   Drug
    ◦   Toxicity
    ◦   Excretion
    ◦   Metabolism
    ◦   Distribution
    ◦   Absorption
   Goal: Identify the crystallographic binding mode and
    rank-order ligands with respect to binding to the protein

   (insert graph)

   Receptor Docking

   Ligand Shape

   Generate plausible trial binding modes using a
    docking function, then re-rank the modes with a
    scoring function
   (insert graph)
   341 Active
   47 Non-Active
   (insert graph)

   After filtering by Pharmacophore Feature
   (insert graph)
   (insert functions for)
    ◦   F_Score*
    ◦   D_Score
    ◦   G_Score
    ◦   PMF_Score
    ◦   Chem_Score
    ◦   ICM_Score*
   Cell Adhesion Assay (50% Serum)
    ◦ (insert graph)

   Biochemical Adhesion Assay
    ◦ (insert graph)

   Scoring Functions Are Poor More Often Than
    Not
   Receptor Site View Library Design FlexX
    Score Consensus Score>=3 e.g. Contact
    Map, CLogP MW, HBOND Rotatable bonds
    Consensus=5? if yes, substructure exists?
    if yes, Pharmacophore<4.2Å? if yes, Publish
    Hit Report
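
The yes/no checks in this workflow can be sketched as a chain of filters. This is a minimal illustration, not the authors' pipeline: the `Candidate` record, its field names, and the reading of the consensus score as a count of agreeing criteria are assumptions. Only the thresholds (consensus >= 3, consensus = 5, pharmacophore < 4.2 Å) come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one docked compound (field names are assumptions)."""
    name: str
    consensus_score: int            # assumed: number of criteria that agree (0-5)
    has_substructure: bool          # required substructure is present
    pharmacophore_distance: float   # pharmacophore distance in angstroms


def triage(c: Candidate) -> str:
    """Return 'hit' or the name of the first check the candidate fails."""
    if c.consensus_score < 3:
        return "failed: consensus score < 3"
    if c.consensus_score != 5:
        return "failed: consensus != 5"
    if not c.has_substructure:
        return "failed: substructure missing"
    if not c.pharmacophore_distance < 4.2:
        return "failed: pharmacophore >= 4.2 A"
    return "hit"


if __name__ == "__main__":
    examples = [
        Candidate("cmpd-1", 5, True, 3.8),
        Candidate("cmpd-2", 4, True, 3.8),
        Candidate("cmpd-3", 5, True, 5.0),
    ]
    for c in examples:
        print(c.name, "->", triage(c))
```

Returning the name of the failed check, rather than a bare boolean, makes it easy to see at which stage compounds drop out of the hit report.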
   (insert graph)
   Goal: Predict hit/miss class based on presence of features
    (fingerprints)
   Method
    ◦ Given a set of N samples
    ◦ Given that some subset A of them are good ("active")
       Then we estimate for a new compound: P(good) ≈ A/N
    ◦ Given a set of binary features F
       For a given feature F:
            It appears in N_F samples
            It appears in A_F good samples
       Can we estimate: P(good | F) ≈ A_F/N_F?
            (Problem: error gets worse as N_F becomes small)
    ◦ P'(good | F) = (A_F + P(good)·k) / (N_F + k)
       P'(good | F) → P(good) as N_F → 0
       P'(good | F) → A_F/N_F as N_F becomes large
    ◦ (If k = 1/P(good), this is the Laplacian correction)
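
The corrected estimate can be checked numerically with a short sketch. The function and argument names are illustrative assumptions: `n_f` is the number of samples containing the feature, `a_f` is the number of those that are good, and `p_good` is the baseline A/N. The example counts are made up.

```python
from typing import Optional


def corrected_estimate(a_f: int, n_f: int, p_good: float,
                       k: Optional[float] = None) -> float:
    """Corrected estimate of P(good | F) = (a_f + p_good * k) / (n_f + k).

    If k is not given, use k = 1 / p_good (the Laplacian correction).
    """
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    if k is None:
        k = 1.0 / p_good
    return (a_f + p_good * k) / (n_f + k)


if __name__ == "__main__":
    p_good = 100 / 1000  # made-up baseline: 100 actives among 1000 samples

    # Feature never seen: the estimate falls back to the baseline (about 0.1).
    print(corrected_estimate(0, 0, p_good))

    # Feature seen twice, both times in actives: the raw ratio is 1.0,
    # but the corrected estimate is only about 0.25.
    print(corrected_estimate(2, 2, p_good))

    # Feature seen often: the estimate approaches the raw ratio 250/500.
    print(corrected_estimate(250, 500, p_good))
```

The second case shows why the correction matters: a feature seen only twice should not be treated as a perfect predictor of activity.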
   Descriptors (insert)
   Advantages
    ◦ Can describe a huge number of features (up to 4 billion; MDL: 1024;
      Leadscope: 27,000)
    ◦ Contains tertiary and stereochemistry information
    ◦ Fast
   Classification Analysis

    ◦ Developing Non-Linear Scoring Functions to classify
      actives and non-actives

    ◦ (insert graphs)

    ◦ Cost function to minimize: Gini impurity at node N,
      i(N) = 1 - Σ_j P^2(ω_j), where P(ω_j) is the fraction of
      samples at node N that belong to class ω_j
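
As a small worked example, the Gini impurity of a node is one minus the sum of squared class fractions. The sketch below is a generic implementation, not the CART software used in the talk. The 341/47 counts are borrowed from the active/non-active slide purely as convenient numbers.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity of one node: 1 - sum over classes of (class fraction)^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    print(gini_impurity(["active"] * 10))                     # pure node: 0.0
    print(gini_impurity(["active"] * 5 + ["inactive"] * 5))   # even split: 0.5
    # Illustrative node with 341 actives and 47 non-actives: about 0.213
    print(gini_impurity(["active"] * 341 + ["inactive"] * 47))
```

A tree-growing procedure would pick, at each node, the split whose child nodes have the lowest size-weighted impurity.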
   Training Set Prediction Success

   (insert table)

   10-fold cross validation

   Randomly split training and test sets

   Significant Improvement in Separating Actives
    from Non-Actives
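
The 10-fold cross-validation with random splits mentioned on this slide can be sketched as follows. This is a generic illustration in plain Python, not the authors' tooling. The function name, the seed, and the sample count of 388 (341 + 47 from the earlier slide) are assumptions chosen for the example.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold is used
    exactly once as the test set while the other folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_no, (train, test) in enumerate(k_fold_indices(388, k=10), start=1):
        print(f"fold {fold_no}: {len(train)} training, {len(test)} test samples")
```

Each sample lands in exactly one test fold, so every compound is predicted once by a model that never saw it during training.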
   (insert graph)

   Significant Improvement in Finding Hits Using
    New SF
   Optimal tree identified (insert graph)

   No random effects (insert graph)
   (insert cluster)

   Able to identify different molecular property
    criteria that lead to hits
   (insert graph)
   (insert graph)

   Size= magnitude of OBA

   OBA values cover range of descriptor space
   (insert graph)

   Choose 1D and 2D descriptors for ease of
    interpretation and lower “noise”
   Build Model (insert graphs) Apply Model
   Features found in high OBA

   Features found in low OBA

   It would be nice if CART offered a similar view
   Improved scoring functions for separating
    hits from non-hits in structure-based drug
    design developed with CART and Bayesian
    models

   Identified key differences in molecular
    physical properties that led to hits

   Built reasonably predictive OBA model
    (cannot expect method to extend to other
    systems given complexity of OBA, however)
   Biogen IDEC

   Modeling
    ◦   Rajiah Denny
    ◦   Claudio Chuaqui
    ◦   Juswinder Singh
    ◦   Herman van Vlijmen
    ◦   Norman Wang
    ◦   Anuj Patel
    ◦   Zhan Deng

   Chemistry
    ◦   Kevin Guckian
    ◦   Dan Scott
    ◦   Thomas Durand-Reville
    ◦   Pat Conlon
    ◦   Charlie Hammond
    ◦   Chuck Jewell

   Pharmacology
    ◦ Tonika Bonhert

Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models
