Structure-Based Computer-Aided Drug Design

Prof. Thanh N. Truong
University of Utah
Astonis LLC
Institute of Computational Science and Technology

The Drug Discovery Process
Drug Target Identification → Target Validation → Lead Identification → Lead Optimization → Pre-clinical & Clinical Development → FDA Review

It takes about 15 years, around 880 million USD, and ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004) to develop a new drug.

Genomics Facts
Around 99% of our genes have counterparts in mice
Our genetic overlap with chimpanzees is about 97.5%
The genetic difference between one person and another is less than 0.1%
But because only a few regions of DNA actively encode life functions, the real difference between one person and another is only 0.0003%

It is becoming increasingly evident that the complexity of
biological systems lies at the level of the proteins, and that
genomics alone will not suffice to understand these systems.

Structure-based Computer-Aided Drug Design
Drug Target Identification → Target Validation → Lead Identification → Lead Optimization → Pre-clinical & Clinical Development → FDA Review
Known 3D structure of the target protein: PDB databank
Unknown 3D structure for the target protein: homology modeling or protein structure prediction; MD simulations
Benefits: shorten development time to lead identification; reduce cost

Past Successes
1. HIV protease inhibitor amprenavir (Agenerase) from Vertex & GSK (Kim et al. 1995)
2. HIV protease inhibitor nelfinavir (Viracept) by Pfizer (& Agouron) (Greer et al. 1994)
3. Influenza neuraminidase inhibitor zanamivir (Relenza) by GSK (Schindler 2000)

[Workflow figure: Target Model, Docking Simulation, Trajectories, Cluster Analysis, Scoring, Analysis]

Homology Modeling
Prediction of protein tertiary structure from the known sequence

Docking
The Problem:
Determine the optimal binding structure of a ligand (a drug
candidate, a small molecule) to a receptor (a drug target, a protein or
DNA) and quantify the strength of the ligand-receptor interaction.

1. Where will the ligand bind?
2. How will it bind?
3. How strong is the binding?
4. Why?
5. What makes one ligand bind to the receptor better than others?
6. ????

The Challenge

• Ligand and receptor are conformationally flexible.
• The receptor may have more than one possible binding site.
• Weak short-range interactions: hydrogen bonds, salt bridges, hydrophobic contacts, electrostatics, van der Waals repulsions.
• Surface complementarity.
• Binding affinity is measured relative to the uncomplexed state, so solvation and desolvation play an important role.
• Binding affinity describes an ensemble of complexes, not a single one.

[Figure labels: orientation of ligand; bound waters; flexibility of residues in the binding site; large protein conformation change]

Binding Affinity
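Association equilibrium constant:
$$K_a = \frac{[RL]_s}{[R]_s\,[L]_s}$$

Free energy of binding (enthalpy and entropy contributions):
$$\Delta G^b_s = \Delta H - T\,\Delta S = -RT \ln K_a$$

From the thermodynamic cycle:
$$\Delta G^b_s = \Delta G^b_g + \Delta G_{solv}(RL) - \{\Delta G_{solv}(R) + \Delta G_{solv}(L)\}$$

[Thermodynamic cycle figure: R + L → RL in the gas phase (∆G^b_g) and in solution (∆G^b_s), linked by the solvation free energies ∆Gsolv(R), ∆Gsolv(L) and ∆Gsolv(RL)]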
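As a quick numerical check of these relations, the Python sketch below converts an association constant into a binding free energy with ∆G = -RT ln Ka and assembles the solution-phase value from the thermodynamic cycle. The function names, the 298.15 K default, the 1 M standard state and the solvation values in the example are illustrative assumptions, not values from the lecture.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def binding_free_energy(ka, temperature=298.15):
    """Delta G_s^b = -RT ln K_a (K_a in M^-1, assuming a 1 M standard state)."""
    return -R_KCAL * temperature * math.log(ka)


def solution_binding_free_energy(dg_gas, dg_solv_rl, dg_solv_r, dg_solv_l):
    """Thermodynamic cycle: dG_s^b = dG_g^b + dG_solv(RL) - (dG_solv(R) + dG_solv(L))."""
    return dg_gas + dg_solv_rl - (dg_solv_r + dg_solv_l)


if __name__ == "__main__":
    # A nanomolar binder (K_a = 1e9 M^-1) at room temperature
    print(round(binding_free_energy(1e9), 2))  # -12.28 (kcal/mol)
    # Hypothetical gas-phase and solvation terms in kcal/mol
    print(solution_binding_free_energy(-20.0, -30.0, -25.0, -10.0))  # -15.0
```

Each factor of 10 in Ka shifts ∆G by RT ln 10, about 1.36 kcal/mol at 298 K, which is why small errors in a scoring function translate into large errors in predicted affinity.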

Docking Process
Descriptions of the receptor 3D structure, binding site and ligand
→ Sampling of the configuration space of the binding complex
→ Evaluating free energy of binding for scoring
Output: multiple binding configurations (local/global minimum) for a single protein structure and a ligand, or for an ensemble of protein structures and/or multiple ligands

Description of Receptor 3D Structure
Known 3D protein structures from Protein Data Bank (PDB)
(http://www.rcsb.org/pdb)

Locations of hydrogen atoms, bound water molecules, and metal ions are either not
known or highly uncertain.
Identities and locations of some heavy atoms are uncertain (e.g., ~1/6 of N/O of Asn & Gln and N/C of His are incorrectly assigned in the PDB; up to 0.5 Å uncertainty in position)
Conformational flexibility of proteins is not known

Homology models from highly similar sequences with known structures
Critical analysis of the receptor structure before docking is needed:
resolution, missing residues, bound waters and ions, protonation states, etc.
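Part of this pre-docking analysis can be automated. The Python sketch below reads a few of these items from the fixed columns of a PDB-format file: the REMARK 2 resolution, water oxygens (residue name HOH), explicit hydrogens, and metal ions in HETATM records. The function name and the metal list are assumptions for illustration; missing residues (REMARK 465) and protonation states are not checked here.

```python
# Illustrative subset of metal ions (an assumption, not an exhaustive list)
METALS = {"ZN", "MG", "CA", "MN", "FE", "CU", "CO", "NI", "NA", "K"}


def summarize_pdb(text):
    """Collect a few pre-docking checks from fixed-column PDB records."""
    summary = {"resolution": None, "waters": 0, "hydrogens": 0, "metals": set()}
    for line in text.splitlines():
        record = line[0:6].strip()
        if record == "REMARK" and line[6:10].strip() == "2" and "RESOLUTION." in line:
            fields = line.split()
            try:
                summary["resolution"] = float(fields[3])  # e.g. "1.90 ANGSTROMS."
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE." for NMR entries
        elif record in ("ATOM", "HETATM"):
            res_name = line[17:20].strip()          # columns 18-20
            element = line[76:78].strip().upper()   # columns 77-78
            if res_name == "HOH" and element == "O":
                summary["waters"] += 1
            elif element == "H":
                summary["hydrogens"] += 1
            elif record == "HETATM" and element in METALS:
                summary["metals"].add(element)
    return summary
```

A structure with no resolution, zero hydrogens and several crystallographic waters or metals is common; the point is to decide explicitly which waters and ions to keep and to add hydrogens before docking.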

Descriptions of Binding Site
Known binding site: the PDB database has about 6000 protein-ligand complexes
Atomistic based
o Receptor atomic coordinates and location of a binding box

Descriptor based
o Surface
o Volume
o Points & distances, bond vectors
o Grid and various properties such as electrostatic potential, hydrophobic moment, polar, nonpolar, atom types, etc.

Unknown binding site
Blind docking, with the binding box covering the entire receptor, is computationally expensive.
A better method for finding potential binding sites is needed.
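The atomistic binding-box description can be sketched in Python: an axis-aligned box around known ligand (or pocket) coordinates, and the number of grid points needed to cover it. The 5 Å padding and the 0.375 Å spacing (a value commonly used for AutoDock grid maps) are assumptions, and the function names are hypothetical. Comparing a pocket-sized box with one that spans a whole receptor shows why blind docking is expensive.

```python
import math


def binding_box(coords, padding=5.0):
    """Axis-aligned box (center, size) around coordinates in Angstrom, padded on every side."""
    xs, ys, zs = zip(*coords)
    lower = [min(xs) - padding, min(ys) - padding, min(zs) - padding]
    upper = [max(xs) + padding, max(ys) + padding, max(zs) + padding]
    center = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    size = [hi - lo for lo, hi in zip(lower, upper)]
    return center, size


def grid_points(size, spacing=0.375):
    """Grid points per axis (and in total) needed to cover a box of the given size."""
    counts = [math.ceil(s / spacing) + 1 for s in size]
    return counts, counts[0] * counts[1] * counts[2]


if __name__ == "__main__":
    # Focused docking: a 22.5 Angstrom cube around a known pocket
    print(grid_points([22.5, 22.5, 22.5]))  # ([61, 61, 61], 226981)
    # Blind docking: a 60 Angstrom cube enclosing a whole receptor
    print(grid_points([60.0, 60.0, 60.0]))  # ([161, 161, 161], 4173281)
```

Because the point count grows with the cube of the box edge, a ~2.7× longer edge needs ~18× more grid points, and the sampler must also search a much larger volume.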

Ligand Chemical Space

National Cancer Institute (NCI) public database
(http://129.43.27.140/ncidb2/)
About 250 K 3D structures

ZINC public database (http://zinc.docking.org/)
About 8 million 3D structures

PubChem public database (http://pubchem.ncbi.nlm.nih.gov/)
About 19 million entries (but no 3D structures)

Cambridge Structural Database (CSD)
About 3 million crystal structures

Chemical Abstracts Service (CAS) and SciFinder
Several other smaller databases …

Atomic partial charges from MM force fields or MO calculations must be added to each molecule to evaluate the scoring function.

Different Approaches in Docking
The complete conformation and configuration space is too large to search exhaustively, so different approaches have been developed for effective sampling of the receptor-ligand configuration space.

Automated

Descriptor Matching
• Uses pattern-recognizing geometric methods to match ligand and receptor site descriptors
• Ligand flexibility is limited
• Receptor is rigid
• Accuracy is not very good: not discriminative
• Fast

Simulation-based (Focus)
• Uses simulation methods to sample the local configuration space: MC-simulated annealing, genetic algorithm. An ensemble of starting orientations must be run for accurate statistics
• Ligand and protein flexibility can be considered
• Free energy of binding is evaluated
• Accuracy is good
• Time consuming
• A grid map is often used to speed up energy evaluations

Manual
• User-interactive force feedback through haptic devices

MC-Simulated Annealing Method
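1. Randomly change the receptor flexible residues, ligand position, orientation, and/or conformation.
2. Evaluate the new energy, Enew.
3. If Enew < Eold, accept the new move; otherwise accept it with probability P = exp{-∆E/kBT}, where ∆E = Enew - Eold.
4. If the move is accepted, set Eold = Enew.
5. Reduce the temperature.
6. If the number of accepted or rejected moves exceeds Nlimit, stop (Done); otherwise return to step 1.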
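The flowchart maps onto a Metropolis Monte Carlo loop with a cooling schedule. The Python sketch below applies it to a generic list of degrees of freedom; the toy energy function, step size, starting temperature, geometric cooling factor and move budget are assumptions standing in for a real docking energy and its settings, k_B is absorbed into T, and the temperature is lowered after each batch of moves rather than after every move.

```python
import math
import random


def simulated_annealing(energy, x0, step=0.5, t0=10.0, cooling=0.95,
                        moves_per_temp=50, n_limit=5000, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule.

    energy: callable mapping a list of degrees of freedom (e.g. ligand position,
            orientation, torsions) to an energy; k_B is absorbed into T.
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    x_old = list(x0)
    e_old = energy(x_old)
    best_x, best_e = list(x_old), e_old
    temperature = t0
    n_moves = 0
    while n_moves < n_limit:  # stop once accepted + rejected moves reach N_limit
        for _ in range(moves_per_temp):
            # Randomly perturb every degree of freedom
            x_new = [xi + rng.uniform(-step, step) for xi in x_old]
            e_new = energy(x_new)
            delta = e_new - e_old
            # Downhill moves are always accepted; uphill with P = exp(-dE/T)
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                x_old, e_old = x_new, e_new  # E_old <- E_new
                if e_old < best_e:
                    best_x, best_e = list(x_old), e_old
            n_moves += 1
        temperature *= cooling  # reduce the temperature
    return best_x, best_e


if __name__ == "__main__":
    # Toy two-variable "binding energy" with its minimum at (2, -1)
    toy_energy = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2
    print(simulated_annealing(toy_energy, [10.0, 10.0]))
```

High early temperatures let the search climb out of local minima; as T falls, uphill moves become rare and the walk settles into a low-energy basin. Running several seeds corresponds to the ensemble of starting orientations mentioned earlier.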

Genetic Algorithm
Darwin's Theory of Evolution

Living organisms are made up of cells, and every cell has the same set of chromosomes (DNA).

Genome: the set of all chromosomes
A chromosome consists of genes.

Genotype: a particular set of genes
Each gene encodes a protein (a trait).
Each gene has a location in the chromosome (locus).

Reproduction by cross-over and mutation

Genetic Algorithm for Docking
Chromosome 1: x1 y1 z1 | φ1 ψ1 ω1 | τ1 τ2 τ3 τ4
Chromosome 2: x2 y2 z2 | φ2 ψ2 ω2 | τ1’ τ2’ τ3’ τ4’
(Gene 1: position; Gene 2: orientation; Gene 3: torsional angles)
A chromosome is a possible solution: binding position,
orientation, and values of all rotatable torsional angles
Fitness test: translates genotypes into phenotypes (receptor-ligand complex structures) for binding free energy evaluation.

A cell is a set of possible solutions, i.e. chromosomes. Typical population = 100-200.
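The chromosome layout above can be sketched as a small Python data structure that groups the three genes and flattens them into the linear string that cross-over and mutation operate on. Encoding orientation as three angles follows the slide (some docking programs use quaternions instead); the class, the helper names and the random-initialization ranges are assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Chromosome:
    """One candidate binding pose encoded as three genes."""
    position: List[float]     # Gene 1: x, y, z of the ligand (Angstrom)
    orientation: List[float]  # Gene 2: phi, psi, omega orientation angles (radians)
    torsions: List[float]     # Gene 3: one angle per rotatable bond (radians)

    def genes(self) -> List[float]:
        """Flatten to the linear gene string used by cross-over and mutation."""
        return self.position + self.orientation + self.torsions

    @classmethod
    def from_genes(cls, genes: List[float]) -> "Chromosome":
        """Rebuild the genes: first 3 = position, next 3 = orientation, rest = torsions."""
        if len(genes) < 6:
            raise ValueError("need 3 position and 3 orientation values")
        return cls(list(genes[:3]), list(genes[3:6]), list(genes[6:]))


def random_population(size, n_torsions, center, half_width, seed=0):
    """Random initial population inside a box around `center` (hypothetical helper)."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        position = [c + rng.uniform(-half_width, half_width) for c in center]
        orientation = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
        torsions = [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
        population.append(Chromosome(position, orientation, torsions))
    return population


if __name__ == "__main__":
    population = random_population(150, n_torsions=4, center=[0.0, 0.0, 0.0], half_width=5.0)
    print(len(population), len(population[0].genes()))
```

The fitness test would decode each chromosome into ligand coordinates and score the resulting complex; that step depends on the ligand geometry and is left out here.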

Select best parents: those with large negative ∆G binding

Generate new generation
Migration: move the best genes to the next generation.
Cross-over: exchange a set of genes between two parent chromosomes. Typical cross-over rate = 80-90%.
Mutation: randomly change the value of a gene, i.e. position, orientation, or torsional values. Typical mutation rate = 0.5-1%.
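One generation of this scheme might look like the following Python sketch, operating on flat gene lists (position, orientation, torsions). Tournament selection, Gaussian mutation, the elite count and the default rates are assumptions chosen to fall within the typical values quoted above; "migration" is interpreted here as copying the best chromosomes unchanged (elitism).

```python
import random


def next_generation(population, energy, rng, n_elite=2, crossover_rate=0.8,
                    mutation_rate=0.01, mutation_scale=1.0, tournament=3):
    """Produce the next generation of flat gene lists.

    Lower energy (more negative binding free energy) means fitter.
    """
    size = len(population)
    # Migration (interpreted as elitism): copy the best chromosomes unchanged
    new_population = [list(c) for c in sorted(population, key=energy)[:n_elite]]

    def select_parent():
        # Tournament selection: the fittest of a few random individuals
        return min(rng.sample(population, tournament), key=energy)

    while len(new_population) < size:
        parent1, parent2 = select_parent(), select_parent()
        child1, child2 = list(parent1), list(parent2)
        if rng.random() < crossover_rate:
            # Two-point cross-over: swap the genes between cut points i and j
            i, j = sorted(rng.sample(range(len(parent1) + 1), 2))
            child1[i:j], child2[i:j] = parent2[i:j], parent1[i:j]
        for child in (child1, child2):
            # Mutation: perturb each gene with a small probability
            for k in range(len(child)):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, mutation_scale)
            if len(new_population) < size:
                new_population.append(child)
    return new_population


if __name__ == "__main__":
    rng = random.Random(0)
    energy = lambda genes: sum(g * g for g in genes)  # toy energy, minimum at 0
    population = [[rng.uniform(-5, 5) for _ in range(10)] for _ in range(100)]
    for _ in range(30):
        population = next_generation(population, energy, rng)
    print(min(energy(c) for c in population))
```

Keeping the elite unchanged guarantees that the best energy never gets worse from one generation to the next.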

Two-point Cross-over Operator

Parent 1: x1 y1 z1 φ1 ψ1 ω1 τ1 τ2 τ3 τ4
Parent 2: x2 y2 z2 φ2 ψ2 ω2 τ1’ τ2’ τ3’ τ4’

Swap positions

Child 1: x2 y2 z2 φ1 ψ1 ω1 τ1 τ2 τ3 τ4
Child 2: x1 y1 z1 φ2 ψ2 ω2 τ1’ τ2’ τ3’ τ4’
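The operator swaps the segment between two cut points. In this Python sketch (the function name is hypothetical), choosing the cut points 0 and 3 reproduces the example, where only the position gene is exchanged.

```python
def two_point_crossover(parent1, parent2, i, j):
    """Swap the genes parent[i:j] between two parents (cut points 0 <= i < j <= len)."""
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2


if __name__ == "__main__":
    p1 = ["x1", "y1", "z1", "phi1", "psi1", "omega1", "t1", "t2", "t3", "t4"]
    p2 = ["x2", "y2", "z2", "phi2", "psi2", "omega2", "t1'", "t2'", "t3'", "t4'"]
    c1, c2 = two_point_crossover(p1, p2, 0, 3)  # swap the position gene only
    print(c1)
    print(c2)
```

In a GA run the cut points are drawn at random for each mating, so any contiguous block of genes (for example, orientation plus the first torsions) can be exchanged.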

Lamarckian Genetic Algorithm (AutoDock)
Environmental adaptations of an individual's phenotypic characteristics acquired during its lifetime can become heritable traits. Survival of the fittest.
1. Mutation and cross-breeding generate a new genotype: new possible ligand binding configurations.
2. Transfer to phenotype to evaluate fitness: form the receptor-ligand configuration.
3. Environmental adaptation: adapt to the local environment to improve fitness (local minimization).
4. Transfer back to genotype for future generations: save the optimized ligand binding configuration for future generations.

Genetic Algorithm – Local Search
GA and LS alternate; the locally optimized result is transferred back to the genotype for future generations, i.e. it becomes a heritable trait.
Morris et al., J. Comput. Chem. 1998, 19, 1639
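The Lamarckian step can be sketched as a local search whose result is written back into the genotype. In this Python sketch a simple stochastic hill-climber stands in for the local search (AutoDock itself uses a Solis-Wets local search); the step size, the iteration count and the 6% fraction of individuals minimized per generation are assumptions. Calling it after each GA generation gives a GA-LS hybrid.

```python
import random


def local_search(genes, energy, rng, step=0.1, n_steps=50):
    """Stochastic hill-climbing: keep a random perturbation only if it lowers the energy.
    (A simple stand-in for the Solis-Wets search used in AutoDock.)"""
    best, best_energy = list(genes), energy(genes)
    for _ in range(n_steps):
        trial = [g + rng.gauss(0.0, step) for g in best]
        trial_energy = energy(trial)
        if trial_energy < best_energy:
            best, best_energy = trial, trial_energy
    return best


def lamarckian_step(population, energy, rng, fraction=0.06):
    """Locally minimize a random subset of individuals and write the optimized
    genes back into the population, so the improvement is inherited."""
    n = max(1, int(fraction * len(population)))
    for idx in rng.sample(range(len(population)), n):
        population[idx] = local_search(population[idx], energy, rng)
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    energy = lambda genes: sum(g * g for g in genes)  # toy energy, minimum at 0
    population = [[rng.uniform(-5, 5) for _ in range(10)] for _ in range(150)]
    before = sum(energy(c) for c in population)
    lamarckian_step(population, energy, rng)
    print(before, sum(energy(c) for c in population))
```

The Darwinian alternative would score the minimized pose but keep the original genes; writing the minimized genes back is what makes the acquired improvement heritable.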

Scoring Functions
GOAL: Fast & Accurate

Force Field based function (Focus)
• Score = -∆Gbinding
• Has physical basis
• Fast with pre-computed grid

Empirical function
• Multivariate regression fit of physically motivated structural functions to experimentally known complexes with measured binding affinity

Knowledge-based function
• Statistical pair potential derived from known complex structures

Descriptor based function
• Based on chemical properties, pharmacophore, contact, shape complementarity

[Figure: -Score versus complex configurations, with the experimentally observed complex marked]
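Knowledge-based pair potentials are usually obtained by an inverse-Boltzmann relation, w(r) = -kT ln[p_obs(r) / p_ref(r)], where p_obs is the frequency of an atom-type pair at distance r in known complexes and p_ref is a reference-state frequency. The Python sketch below turns binned counts for one atom-type pair into such a potential; the binning, the reference state and kT ≈ 0.593 kcal/mol (about 298 K) are assumptions, and real methods also smooth sparse bins.

```python
import math


def pair_potential(observed_counts, reference_counts, kT=0.593):
    """Inverse-Boltzmann potential per distance bin for one atom-type pair.

    observed_counts: pair counts per distance bin in known complexes.
    reference_counts: counts per bin for the chosen reference state.
    Returns w(r) in the units of kT; None where a bin has no data.
    """
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potential.append(None)  # undefined without data (pseudocounts are common)
        else:
            potential.append(-kT * math.log((obs / n_obs) / (ref / n_ref)))
    return potential


if __name__ == "__main__":
    # Hypothetical counts in three distance bins
    print(pair_potential([10, 20, 10], [10, 10, 20]))
```

Distances that occur more often than in the reference state get a favorable (negative) energy, and under-represented distances get a penalty.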

Force Field Based Scoring Function
$$\text{Score} = -\Delta G^b_s$$
$$\Delta G^b_s = C_{vdw}\,\Delta G_{vdw} + C_{ele}\,\Delta G_{ele} + C_{hb}\,\Delta G_{hb} + C_{tor}\,\Delta G_{tor} + C_{solv}\,\Delta G_{solv}$$

Coefficients are empirically determined using linear regression analysis from a set of
protein-ligand complexes from LPDB with known experimental binding constants.
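The weighted sum and the regression step can be sketched with NumPy: each training complex contributes one row of five computed energy terms, and least squares finds the coefficients that best reproduce the measured binding free energies. The training data below is synthetic and the "true" weights are made up for illustration; they are not LPDB data or published coefficients.

```python
import numpy as np


def score(coefficients, terms):
    """Score = -dG_s^b, where dG_s^b = sum_k C_k * dG_k over (vdw, ele, hb, tor, solv)."""
    return -float(np.dot(coefficients, terms))


def fit_coefficients(term_matrix, measured_dg):
    """Least-squares estimate of C_vdw, C_ele, C_hb, C_tor, C_solv.

    term_matrix: one row per complex with its five computed energy terms.
    measured_dg: experimental binding free energies (e.g. -RT ln K_a) for the same complexes.
    """
    coeffs, _, _, _ = np.linalg.lstsq(term_matrix, measured_dg, rcond=None)
    return coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic training set (hypothetical, NOT real LPDB data): 30 complexes x 5 terms
    terms = rng.normal(size=(30, 5))
    true_c = np.array([0.16, 0.14, 0.12, 0.30, 0.12])  # made-up weights
    dg = terms @ true_c
    fitted = fit_coefficients(terms, dg)
    print(np.round(fitted, 3))
    print(score(fitted, terms[0]))
```

With real data the measured affinities are noisy and the terms are correlated, so the fitted weights should be validated on complexes that were not used in the fit.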

Analyses
Energy histogram: distribution of binding energies; average binding energy

Clustering analysis: distribution of binding modes, i.e. different binding sites and ligand binding orientations
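Both analyses can be sketched in Python on a list of (energy, coordinates) docking results: a histogram with the average binding energy, and a greedy RMSD clustering that seeds clusters from the lowest-energy poses. The 2.0 Å tolerance and the 1 kcal/mol bin width are illustrative choices, and the sketch assumes every pose lists the same ligand atoms in the same order in the receptor frame, so no superposition is done.

```python
import math
from collections import Counter


def rmsd(coords_a, coords_b):
    """RMSD between two poses listed atom-by-atom in the same receptor frame."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def cluster_poses(poses, tolerance=2.0):
    """Greedy clustering of (energy, coords) poses.

    Poses are visited from lowest to highest energy; each joins the first cluster
    whose lowest-energy representative lies within `tolerance` Angstrom RMSD,
    otherwise it starts a new cluster.
    """
    clusters = []
    for energy, coords in sorted(poses, key=lambda pose: pose[0]):
        for cluster in clusters:
            if rmsd(coords, cluster[0][1]) <= tolerance:
                cluster.append((energy, coords))
                break
        else:
            clusters.append([(energy, coords)])
    return clusters


def energy_histogram(energies, bin_width=1.0):
    """Counts per energy bin (keyed by the bin's lower edge) and the average energy."""
    counts = Counter(math.floor(e / bin_width) * bin_width for e in energies)
    return dict(sorted(counts.items())), sum(energies) / len(energies)
```

A large, low-energy cluster suggests a well-defined binding mode; several clusters with similar energies point to alternative binding sites or orientations.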

Docking with Science Community Laboratory
Identify a target → dock millions of molecules from the ZINC database with AutoDock Vina → rank according to binding energy
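For the final ranking step, keeping only the best-scoring hits avoids sorting millions of results in memory. This Python sketch streams (ligand ID, binding energy) pairs through heapq.nsmallest, treating more negative energies (kcal/mol) as better; the function name is hypothetical and the IDs are placeholders, not real ZINC entries.

```python
import heapq


def top_hits(results, n=10):
    """Return the n entries with the most negative binding energy.

    results: any iterable of (ligand_id, binding_energy) pairs, e.g. streamed
    from docking output files; only about n entries are held in memory.
    """
    return heapq.nsmallest(n, results, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder IDs and energies (kcal/mol), not real screening results
    docked = [("LIG_A", -7.2), ("LIG_B", -9.1), ("LIG_C", -8.0)]
    for ligand_id, energy in top_hits(docked, n=2):
        print(ligand_id, energy)
```

Because docking scores are approximate, the top of such a list is usually treated as a shortlist for visual inspection and experimental testing rather than as a final answer.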
