Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models

Donovan N. Chin & R. Aldrin Denny

 Traditional Drug Discovery (insert graph)
 In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution

 Target IVY(Brute force virtual screening of
very large compound libraries) Lead
Discovery IVY(Utilize predictive models
from Biogen data for more efficient virtual
screening) Lead Optimization candidate

 (insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption

 Goal: Identify the crystallographic binding mode and
rank-order ligands with respect to binding to the protein

 (insert graph)

 Receptor Docking

 Ligand Shape

 Generate plausible trial binding modes using a
docking function, then re-rank the modes with a
scoring function

 (insert graph)
 341 Active
 47 Non-Active

 (insert graph)

 After filtering by Pharmacophore Feature

 (insert functions for)
◦ F_Score*
◦ D_Score
◦ G_Score
◦ PMF_Score
◦ Chem_Score
◦ ICM_Score*

 Cell Adhesion Assay (50% Serum)
◦ (insert graph)

 Biochemical Adhesion Assay
◦ (insert graph)

 Scoring Functions Are Poor More Often Than
Not

 Receptor Site View → Library Design → FlexX
Score → Consensus Score ≥ 3 (e.g. Contact
Map, CLogP, MW, HBOND, Rotatable Bonds) →
Consensus = 5? If yes, substructure exists?
If yes, Pharmacophore < 4.2 Å? If yes, Publish
Hit Report
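
The yes/no checks at the end of this workflow can be sketched as a simple filter cascade. This is only an illustration: the class name, field names, and example compounds below are hypothetical, the slide does not say how the consensus score is computed or what happens on a "no" branch, and the sketch assumes a compound is simply not reported when any check fails.

```python
from dataclasses import dataclass


@dataclass
class DockedCompound:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    consensus_score: int             # assumed to range from 0 to 5
    has_substructure: bool           # required substructure present?
    pharmacophore_distance: float    # assumed to be a distance in angstroms


def passes_hit_filters(compound, distance_cutoff=4.2):
    """Return True if the compound clears every gate in the cascade."""
    # Gate 1: pre-filter on consensus score >= 3.
    if compound.consensus_score < 3:
        return False
    # Gate 2: full consensus (score of exactly 5).
    if compound.consensus_score != 5:
        return False
    # Gate 3: the substructure must exist.
    if not compound.has_substructure:
        return False
    # Gate 4: pharmacophore measure below the 4.2 angstrom cutoff.
    return compound.pharmacophore_distance < distance_cutoff


# Made-up compounds, for illustration only.
library = [
    DockedCompound("cmpd_a", 5, True, 3.8),
    DockedCompound("cmpd_b", 5, True, 4.9),
    DockedCompound("cmpd_c", 4, True, 3.1),
    DockedCompound("cmpd_d", 5, False, 2.0),
]
hit_report = [c.name for c in library if passes_hit_filters(c)]
```

Ordering the gates from cheapest to most specific mirrors the slide's flow and keeps each rejection reason easy to trace.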

 Goal: Predict hit/miss class based on the presence of features
(fingerprints)
 Method
◦ Given a set of N samples
◦ Given that some subset A of them are good ("active")
 Then we estimate for a new compound: P(good) ≈ A/N
◦ Given a set of binary features F
 For a given feature F:
 It appears in N_F samples
 It appears in A_F good samples
 Can we estimate P(good | F) ≈ A_F/N_F?
 (Problem: the error gets worse as N_F becomes small)
◦ P'(good | F) = (A_F + P(good)·K)/(N_F + K)
 P'(good | F) → P(good) as N_F → 0
 P'(good | F) → A_F/N_F as N_F becomes large
◦ (If K = 1/P(good), this is the Laplacian correction)
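
As a worked illustration of the corrected estimate above, the following minimal sketch evaluates it for one rare and one common feature. The function name and the counts are hypothetical and chosen only to show the two limiting behaviours; they are not data from the study.

```python
def laplacian_corrected(a_f, n_f, p_good, k=None):
    """Smoothed estimate of P(good | F).

    a_f:    number of good samples that contain the feature
    n_f:    number of all samples that contain the feature
    p_good: baseline P(good) over the whole training set
    k:      weight given to the baseline; defaults to 1 / P(good),
            which is the Laplacian correction
    """
    if k is None:
        k = 1.0 / p_good
    return (a_f + p_good * k) / (n_f + k)


# Hypothetical training set: 50 good samples out of 1000.
p_good = 50 / 1000

# A feature seen once, in a good sample: the raw estimate would be 1.0,
# but the corrected value stays close to the baseline.
rare = laplacian_corrected(a_f=1, n_f=1, p_good=p_good)

# A feature seen 400 times, 300 of them good: the corrected value
# stays close to the raw ratio 300/400.
common = laplacian_corrected(a_f=300, n_f=400, p_good=p_good)
```

With few observations the estimate is pulled toward the baseline, and with many observations it approaches the raw ratio, matching the two limits stated above.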
 Descriptors (insert)
 Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL: 1,024;
Leadscope: 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast

 Classification Analysis

◦ Developing Non-Linear Scoring Functions to classify
actives and non-actives

◦ (insert graphs)

◦ Cost function to minimize: Gini impurity at node N,
i(N) = 1 − Σ_j P^2(ω_j), where P(ω_j) is the fraction of samples at node N in class ω_j
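
A minimal sketch of the Gini impurity, and of how a candidate split could be scored with it, is shown below. The function names and labels are hypothetical; weighting the child impurities by child size is the usual CART convention and is assumed here, since the slide gives only the per-node formula.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Made-up labels: a parent node with an even mix of both classes.
parent = ["active"] * 4 + ["inactive"] * 4
parent_gini = gini_impurity(parent)

# A candidate split that isolates most of the actives on the left.
left = ["active"] * 3 + ["inactive"]
right = ["active"] + ["inactive"] * 3
child_gini = split_impurity(left, right)
```

A split is worth making when the weighted child impurity is lower than the parent impurity; a pure node has impurity 0.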

 Training Set Prediction Success

 (insert table)

 10-fold cross validation

 Randomly split training and test sets
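
The random split used for 10-fold cross-validation can be sketched as follows. This is a generic illustration using only the standard library, not the procedure of any particular modeling package; the function name and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once so that fold membership is random but reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample lands in the test set exactly once across the 10 folds.
splits = list(k_fold_indices(n_samples=25, k=10))
```

Because every sample is held out exactly once, the averaged test performance uses all of the data without ever scoring a sample with a model that was trained on it.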

 Significant Improvement in Separating Actives
from Non-Actives

 (insert graph)

 Significant Improvement in Finding Hits Using
the New Scoring Function (SF)

 Optimal tree identified (insert graph)

 No random effects (insert graph)

 (insert cluster)

 Able to identify different molecular property
criteria that lead to hits

 (insert graph)

 Size = magnitude of OBA

 OBA values cover range of descriptor space

 (insert graph)

 Choose 1D and 2D descriptors for ease of
interpretation and lower “noise”

 Build Model (insert graphs) Apply Model

 Features found in high OBA

 Features found in low OBA

 It would be nice if CART offered a similar view

 Improved scoring functions for separating
hits from non-hits in structure-based drug
design developed with CART and Bayesian
models

 Identified key differences in molecular
physical properties that led to hits

 Built a reasonably predictive OBA model
(however, given the complexity of OBA, the method
cannot be expected to extend to other systems)

 Biogen IDEC

 Modeling
◦ Rajiah Denny
◦ Claudio Chuaqui
◦ Juswinder Singh
◦ Herman van Vlijmen
◦ Norman Wang
◦ Anuj Patel
◦ Zhan Deng

 Chemistry
◦ Kevin Guckian
◦ Dan Scott
◦ Thomas Durand-Reville
◦ Pat Conlon
◦ Charlie Hammond
◦ Chuck Jewell

 Pharmacology
◦ Tonika Bonhert
