autoDock.ppt
1. AUTODOCK
An Automated Docking Software for
Predicting Optimal Protein-Ligand Interaction
By
Susan McClatchy, Milind Misra,
Chandreyee Mukherjee, Indu Shrivastava
3. Interactions between biomolecules lie at the core of all metabolic
processes and life activities
The number of solved protein structures available in the
databases is expanding exponentially
To understand their functions it is essential to elucidate the
interaction mechanisms between the different molecules
Primary importance lies in rational drug design
Depending upon the success of the docked molecules the
docking ligand may be redesigned or its structure further
refined.
Also important in the area of immunology to study antigen-
antibody interaction.
Automated Docking: Importance
4. Inhibitor bound to active site of HIVPR
Surface structure of HIVPR with bound
inhibitor
5. Prediction of the optimal physical configuration and
energy between two molecules
The docking problem optimizes:
Binding between two molecules such that their
orientation maximizes the interaction
Evaluates the total energy of interaction such that for
the best binding configuration the binding energy is
the minimum
The resultant structural changes brought about by
the interaction
What is docking?
6. 1. Protein-Protein Docking:
Both molecules are rigid
Interaction produces no change in conformation
Similar to lock-and key model
2. Protein-Ligand Docking:
Ligand is flexible but the receptor protein is rigid
Interaction produces conformational changes in
ligand
Categories of docking
8. It involves:
Finding useful ways of representing the molecules and molecular
properties.
Exploration of the configuration spaces available for interaction
between ligand and receptor.
Evaluate and rank configurations using a scoring system, in this
case the binding energy
However, since it is difficult to evaluate the binding energy because the
binding sites may not be easily accessible, the binding energy is modeled
as follows:
∆G_bind = ∆G_vdw + ∆G_hbond + ∆G_elect + ∆G_conform + ∆G_tor + ∆G_sol
Docking uses a “search and
score” method
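As a minimal sketch of the "score" half of search and score, the snippet below sums the six terms of the binding energy model and ranks candidate configurations by the total. The class name, the function name and all numbers are made up for illustration; they are not AutoDock's data structures or real energies.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """One value per term of the binding energy model (hypothetical container)."""
    vdw: float
    hbond: float
    elect: float
    conform: float
    tor: float
    sol: float

    def total(self) -> float:
        # dG_bind is modeled as the plain sum of the six contributions
        return (self.vdw + self.hbond + self.elect
                + self.conform + self.tor + self.sol)


def rank_configurations(configs):
    """Sort (name, EnergyTerms) pairs from lowest to highest total energy."""
    return sorted(configs, key=lambda item: item[1].total())


# Illustrative values only, not taken from any docking run
poses = [
    ("pose_a", EnergyTerms(-4.0, -1.5, -0.5, 0.8, 1.2, -0.6)),
    ("pose_b", EnergyTerms(-6.0, -2.0, -1.0, 0.5, 1.2, -0.9)),
]
best_name, best_terms = rank_configurations(poses)[0]
```

The lowest total is treated as the best binding configuration, matching the definition of docking given earlier.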
9. Developed by AJ Olson’s group in 1990.
AutoDock evaluates the free energy of the docking molecules
using 3D potential grids
Uses heuristic search to minimize the energy.
Search Algorithms used:
Simulated Annealing
Genetic Algorithm
Lamarckian GA (GA+LS hybrid)
The AutoDock Software
10. Algorithms Overview
Simulated Annealing
Based on temperature effects
Start with high temperature and global search
Lower temperature local search
Genetic Algorithm
Charles Darwin’s Theory of Evolution
Genotype → Phenotype
Lamarckian Algorithm (Jean-Baptiste de Lamarck)
Phenotype → Genotype
11. Study algorithms used to perform the
searches and to calculate minimum energy
Discuss why the GA+LS hybrid is better than SA
Look at an example, i.e., dock a ligand to a
protein molecule using the latest AutoDock
version
Project Goal
13. Simulated Annealing
Algorithm modeled after the cooling of a solution to form glass,
though it’s better explained by crystal formation
Given a long enough cooling time, molecules will relax into
their lowest energy state to form the largest crystals
Quick cooling - highly disordered system
Slow cooling - highly ordered crystal, with each molecule in
its lowest energy state
Algorithm simulates either linear or proportional slow
cooling
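The two cooling styles can be sketched as follows. This is only an illustration: the function names and the linear decrement of 20 are assumptions, while the start temperature of 1,000, the factor 0.95 and the 50 cycles echo the simulated annealing run reported later in these slides.

```python
def linear_cooling(t_start, decrement, cycles):
    """Subtract a fixed amount each cycle (never going below zero)."""
    return [max(t_start - decrement * i, 0.0) for i in range(cycles)]


def proportional_cooling(t_start, factor, cycles):
    """Multiply by a fixed reduction factor each cycle."""
    return [t_start * factor ** i for i in range(cycles)]


linear = linear_cooling(1000.0, 20.0, 50)
proportional = proportional_cooling(1000.0, 0.95, 50)
```

Proportional cooling drops quickly at first and then flattens out, so more of the cycles are spent at low temperature than with a linear schedule of the same length.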
14. The SA Algorithm
Uses neighborhood operator N(s) to generate a set of solutions
according to a fixed distribution
New solution compared to preceding solution, and is accepted
if its energy is lower than that of previous solution
If new solution has higher energy, it is accepted probabilistically
according to Boltzmann distribution (see figure above)
At high temperatures, many higher energy solutions will be
accepted; at low temps., majority of probabilistic moves rejected
Boltzmann acceptance probability = exp(-delta E/T), where
delta E = energy of new solution minus energy of previous solution,
T = temperature
The Boltzmann distribution gives the probability of finding a system with energy E at temperature T
15. Pseudocode for SA
Compute a random initial state s
n = 0, x*_n = s                       // initialize state counter n to 0 and best solution to s
Repeat i = 1, 2, …                    // specify number of temperatures to try
  Repeat j = 1, 2, …, m_i             // no. of steps to perform for each temp. T_i
    Compute a neighbor s’ = N(s)      // s’ = new solution from N(s)
    if (f(s’) <= f(s)) then           // if energy of s’ <= energy of s
      s = s’                          // accept new solution s’
      if (f(s) < f(x*_n)) then        // if energy of new solution < energy of best
        x*_n = s                      // solution of state n, replace best with new
        n = n + 1
      endif
    else                              // otherwise replace s with s’ probabilistically
      s = s’ with probability e^((f(s) - f(s’))/T_i)   // Boltzmann dist.
    endif
  EndRepeat
EndRepeat
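The pseudocode above can be turned into a small runnable sketch. The toy energy function, the step size and the schedule below are assumptions chosen for illustration; this is not AutoDock's implementation.

```python
import math
import random


def simulated_annealing(f, neighbor, s0, temperatures, steps_per_temp, rng):
    """Minimize f following the SA pseudocode; returns the best state seen."""
    s = best = s0
    for t in temperatures:                   # outer loop over temperatures T_i
        for _ in range(steps_per_temp):      # m_i steps at each temperature
            candidate = neighbor(s, rng)     # s' = N(s)
            delta = f(candidate) - f(s)      # energy change of the move
            if delta <= 0:
                s = candidate                # downhill or equal: always accept
                if f(s) < f(best):
                    best = s
            elif rng.random() < math.exp(-delta / t):
                s = candidate                # uphill: Boltzmann acceptance
    return best


# Toy problem (hypothetical): a 1-D "energy" with its minimum at x = 3
def energy(x):
    return (x - 3.0) ** 2


def random_step(x, rng):
    return x + rng.uniform(-1.0, 1.0)


schedule = [10.0 * 0.9 ** i for i in range(50)]   # proportional cooling
start = -10.0
best = simulated_annealing(energy, random_step, start, schedule,
                           100, random.Random(0))
```

Passing the random generator in explicitly keeps a run reproducible, which helps when comparing schedules.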
16. How Genetic Algorithms Work
- A Simple Example
1 1 1 1 0 0
0 0 0 0 0 1
1 0 0 0 0 1
0 0 0 0 0 0
Initial population of
binary creatures
having 6 “genes”
Each gene has two
different alleles, either
a 0 or a 1
Three operators:
crossover, mutation
and selection
17. Selection
1 1 1 1 0 0
0 0 0 0 0 1
1 0 0 0 0 1
0 0 0 0 0 0
Selection based on a
fitness function f(x)
This operator chooses
those individuals with
the lowest values
Those with higher
values chosen with a
very low probability
Fitness values f(x) shown for the four individuals: 20, 13, 48, 52
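One way to favor low values while still giving high values a small chance is weighted random selection. The sketch below is an assumption about how such an operator could look, not the operator used on the slide; it also assumes the four numbers are the fitness values of the four individuals in the order listed, and the `floor` parameter is hypothetical.

```python
import random


def selection_weights(fitness_values, floor=1.0):
    """Weight each individual by its distance below the worst (highest) value.

    The floor keeps the worst individual selectable with low probability.
    """
    worst = max(fitness_values)
    return [(worst - f) + floor for f in fitness_values]


def select(population, fitness_values, k, rng):
    """Draw k parents with replacement, biased toward low fitness values."""
    weights = selection_weights(fitness_values)
    return rng.choices(population, weights=weights, k=k)


population = ["111100", "000001", "100001", "000000"]
fitness_values = [20, 13, 48, 52]
weights = selection_weights(fitness_values)
parents = select(population, fitness_values, 4, random.Random(0))
```

With these numbers the individual scoring 13 carries the largest weight and the one scoring 52 the smallest.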
20. Replacement
Lower scoring individuals create
more offspring, higher scoring
ones create fewer or none at all
Offspring replace parental
generation
“Elitism” function allows best
individual from parent
generation to persist, if it is a
better solution than new
individuals created
Cycle of selection, mutation,
crossover and replacement
repeated
Individual    f(x)  Offspring
0 0 1 1 0 0   15    1
1 1 1 0 1 1    9    1
1 1 1 1 0 1   22    0
0 0 1 0 1 0    1    2
21. Pseudocode for GA
Select an initial population set {x_i^0} = {x_1^0, x_2^0, …, x_M^0}
Determine fitness values f(x_i^0) for each individual
Repeat for g = 1, 2, …, # of generations
  Perform selection
  Perform crossover with probability given by the crossover rate
  Perform mutation with probability given by the mutation rate
  Determine fitness f(x_i^g) for new individuals
  x_g* = argmin_{i=1,…,M} f(x_i^g) and y_g* = f(x_g*)
  Perform replacement
Until stopping criterion (# of generations) is reached
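A compact sketch of this loop on the 6-gene binary creatures is given below. The tournament selection, the one-point crossover and the toy fitness function (number of 1s, to be minimized) are assumptions made for illustration; the slides do not say which variants were used.

```python
import random


def run_ga(fitness, n_genes, pop_size, generations, p_cross, p_mut, rng):
    """Minimize fitness over bit lists; pop_size is assumed to be even."""
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        # Selection: binary tournament, the lower fitness value wins
        parents = [min(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:               # one-point crossover
                cut = rng.randint(1, n_genes - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        # Mutation: flip each gene independently with probability p_mut
        children = [[1 - g if rng.random() < p_mut else g for g in child]
                    for child in children]
        gen_best = min(children, key=fitness)        # x_g* for this generation
        if fitness(gen_best) < fitness(best):
            best = gen_best
        pop = children                               # replacement
    return best


def count_ones(genes):
    return sum(genes)


best = run_ga(count_ones, n_genes=6, pop_size=20, generations=30,
              p_cross=0.8, p_mut=0.02, rng=random.Random(1))
```

The crossover and mutation rates reuse the 0.8 and 0.02 settings listed for the docking runs later in the slides.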
22. How GA works in AutoDock
Ligand’s “genes” are its x, y
and z coordinates
These form a unit vector,
which is given a random
rotation angle between 0°
and 360° to form a
quaternion
Additional genes may
represent torsion angles
between bonds of the ligand
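Reading this slide as an axis-angle description of orientation, the construction of the quaternion can be sketched as below. The function names are hypothetical and the code only illustrates the standard axis-angle to unit quaternion conversion, not AutoDock's internal routines.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a random direction by normalizing a Gaussian 3-vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def axis_angle_to_quaternion(axis, angle_deg):
    """Convert a unit axis and an angle in degrees to (w, x, y, z)."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


rng = random.Random(0)
axis = random_unit_vector(rng)
angle = rng.uniform(0.0, 360.0)          # random rotation angle in degrees
q = axis_angle_to_quaternion(axis, angle)
q_norm = math.sqrt(sum(c * c for c in q))
```

Because the axis has unit length, the resulting quaternion also has unit length, which is what makes it a valid rotation.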
23. Mapping
In standard GA, the genotype
(x,y,z coordinates plus
rotation and any torsion
angles) is mapped to the
fitness function f(x)
The fitness function value
corresponds to each
individual’s phenotype
According to the right hand
side of the figure, genotypes
of parents with high f(x)
values are mutated to form
genotypes of children with
lower f(x) values
24. Selection, Crossover & Mutation
Selection chooses ligands
with the lowest fitness
(energy) values
Crossover exchanges x, y, z
coordinates, or rotations or
torsions between these
ligands
Example: Two ligands with
xyz coordinates Abc and aBc
Crossover results in new
individuals with coordinates
abc and ABc
Mutation operator mutates
coordinate or other angle
values by adding a random
real number according to a
Cauchy distribution, which
is similar to a Gaussian but
has thicker tails
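The crossover example and the Cauchy mutation can be sketched as follows. The one-point crossover and the inverse-CDF sampling of the Cauchy deviate are assumptions chosen for illustration; `alpha` and `beta` stand for the location and scale of the distribution, as in the "Cauchy alpha" and "Cauchy beta" settings listed for the docking runs.

```python
import math
import random


def crossover(parent_a, parent_b, point):
    """One-point crossover: swap everything after position `point`."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def cauchy_deviate(alpha, beta, rng):
    """Sample a Cauchy deviate with location alpha and scale beta."""
    return alpha + beta * math.tan(math.pi * (rng.random() - 0.5))


def mutate(genes, rate, alpha, beta, rng):
    """Add a Cauchy deviate to each gene with probability `rate`."""
    return [g + cauchy_deviate(alpha, beta, rng) if rng.random() < rate else g
            for g in genes]


# The slide's example: parents Abc and aBc, crossing over after the first gene
children = crossover("Abc", "aBc", 1)

genes = [1.5, -0.3, 4.0, 90.0]            # hypothetical x, y, z and one angle
mutated = mutate(genes, 0.02, 0.0, 1.0, random.Random(0))
```

The thicker tails mean that most mutations are small but an occasional one is very large, which helps the search escape local minima.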
25. Replacement
Individuals with better-
than-average fitness
receive proportionally
more offspring
n_o = (f_w - f_i) / (f_w - <f>), f_w ≠ <f>
where
n_o = number of offspring
f_i = fitness of individual (energy of ligand)
f_w = fitness of worst individual in last g generations (typically 10)
<f> = mean fitness of population
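A short worked example of the offspring formula is given below. The energies are made up, and for simplicity the worst fitness is taken from the current population rather than from a window of the last generations.

```python
def offspring_counts(fitness, f_worst=None):
    """Apply n_o = (f_w - f_i) / (f_w - <f>) to every individual."""
    mean = sum(fitness) / len(fitness)
    worst = max(fitness) if f_worst is None else f_worst
    if worst == mean:
        raise ValueError("f_w must differ from <f>")
    return [(worst - f) / (worst - mean) for f in fitness]


energies = [-5.0, -2.0, 0.0, 3.0]        # illustrative values only
counts = offspring_counts(energies)
```

Here the mean is -1 and the worst value is 3, so the counts come out as 2, 1.25, 0.75 and 0. When the worst value is taken from the current population the counts sum to the population size, and an individual at the mean receives exactly one offspring.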
26. Lamarckian Genetic Algorithm
According to left hand side
of figure, LGA finds lowest
fitness function (energy)
values first, then maps these
values to their respective
genotypes
Genetic algorithm plus Solis
and Wets local search
Better performance than
either simulated annealing
or genetic algorithm alone
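The local search step and the Lamarckian write-back can be sketched as below. This is a simplified version of the Solis and Wets random search with the commonly quoted bias update constants (0.2, 0.4, 0.5) and a doubling or halving of the step size rho; all of these details are assumptions rather than AutoDock's code. The default parameter values mirror the settings listed for the docking runs.

```python
import random


def solis_wets(f, x, rng, max_iters=300, rho=1.0, rho_min=0.01,
               max_succ=4, max_fail=4):
    """Adaptive random local search; only accepts moves that lower f."""
    x, fx = list(x), f(x)
    bias = [0.0] * len(x)
    succ = fail = 0
    for _ in range(max_iters):
        if rho < rho_min:
            break
        step = [b + rng.gauss(0.0, rho) for b in bias]
        trial = [xi + d for xi, d in zip(x, step)]
        if f(trial) < fx:                          # forward step helps
            x, fx = trial, f(trial)
            bias = [0.2 * b + 0.4 * d for b, d in zip(bias, step)]
            succ, fail = succ + 1, 0
        else:
            trial = [xi - d for xi, d in zip(x, step)]
            if f(trial) < fx:                      # reverse step helps
                x, fx = trial, f(trial)
                bias = [b - 0.4 * d for b, d in zip(bias, step)]
                succ, fail = succ + 1, 0
            else:                                  # neither direction helps
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        if succ >= max_succ:
            rho, succ = rho * 2.0, 0               # expand after successes
        elif fail >= max_fail:
            rho, fail = rho * 0.5, 0               # contract after failures
    return x, fx


def lamarckian_step(population, f, rng, ls_frequency=0.06):
    """Locally optimize some individuals and write the result back to the genes."""
    result = []
    for genes in population:
        if rng.random() < ls_frequency:
            genes, _ = solis_wets(f, genes, rng)
        result.append(list(genes))
    return result


def sphere(v):                                     # toy energy, minimum at origin
    return sum(c * c for c in v)


x0 = [3.0, -2.0]
x_opt, f_opt = solis_wets(sphere, x0, random.Random(0))
improved = lamarckian_step([[3.0, -2.0], [1.0, 4.0]], sphere,
                           random.Random(1), ls_frequency=1.0)
```

Overwriting the genes with the locally optimized values is the Lamarckian part: the improvement found in the phenotype is inherited by the next generation.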
28. HIV-1 Protease and AHA006
HIV-1 Protease in complex with the cyclic
sulfamide inhibitor, AHA006
Source: Protein Data Bank
Authors: K. Backbro, T. Unge
Exp. Method: X-ray Diffraction (2 Å res.)
Primary Citation: Backbro et al, J Med Chem
40 pp. 898 (1997)
Polymer Chains: A, B; Residues: 198; Atoms:
1632
34. AutoDock uses grid-
based docking
Ligand-protein
interaction energies
are pre-calculated
and then used as a
look-up table during
simulation
Grid maps are
constructed based on
atoms of interest in
ligand (here C, A, N, O, S, H)
Docking Preparation – Grid
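The look-up idea can be sketched with a small cubic grid and trilinear interpolation between the eight surrounding grid points. The function names, the toy energy function and the grid dimensions are hypothetical; in a real run there would be one such map per ligand atom type.

```python
def build_grid(energy_fn, origin, spacing, n):
    """Pre-calculate energy_fn on an n x n x n grid of points."""
    return [[[energy_fn(origin[0] + i * spacing,
                        origin[1] + j * spacing,
                        origin[2] + k * spacing)
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def lookup(grid, origin, spacing, point):
    """Trilinear interpolation; the point is assumed to lie inside the grid."""
    idx, frac = [], []
    for p, o in zip(point, origin):
        u = (p - o) / spacing
        i = min(max(int(u), 0), len(grid) - 2)    # lower corner of the cell
        idx.append(i)
        frac.append(u - i)
    (i, j, k), (fx, fy, fz) = idx, frac
    energy = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx)
                          * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                energy += weight * grid[i + di][j + dj][k + dk]
    return energy


def toy_energy(x, y, z):                           # stand-in for a real potential
    return x + 2.0 * y + 3.0 * z


origin = (0.0, 0.0, 0.0)
grid = build_grid(toy_energy, origin, 0.5, 5)
value = lookup(grid, origin, 0.5, (0.7, 1.1, 0.3))
```

The expensive energy function is called only while building the grid; during the search each ligand atom costs a constant-time lookup.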
36. Docking – Simulated Annealing
Runs = 100
Cycles = 50
Initial Temp (RT) = 1,000
Temp reduction factor = 0.95
Linear temperature reduction
Translation reduction factor = 1
Quaternion reduction factor = 1
Torsional reduction factor = 1
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0 Å
Quaternion step = 50 deg
Torsion step = 50 deg
Results:
100 different clusters
Energy range: -0.63 to +64,000
Conformation #81: -0.63
Conformation #67: +20.02
Conformation #68: +10.74
Lowest energy conf not close to original
position but similar in conformation
Conf #67 closest to position and
conformation of original ligand;
higher energy
Conf #68 close to position but not
conformation of original ligand;
not as high energy
41. Docking – Genetic Algorithm
Runs = 50
# Evaluations = 250,000
Population size = 50
Elitism count = 1
Mutation rate = 0.02
Crossover rate = 0.8
Window size = 10
Cauchy alpha = 0
Cauchy beta = 1
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0 Å
Quaternion step = 50 deg
Torsion step = 50 deg
Results:
50 different clusters
Energy range: -18.66 to +86.28
Conformation #39: -18.66
Conformation #9: -10.60
Lowest energy conformation
overall closest to original ligand
conformation
If only 10 runs had been used
instead of 50, then conf #9 would
have been the lowest energy
conformation.
42. Docking – Local Search
Results:
18 different clusters
Energy range: +35.92 to +215,200
Confs #20, 21, 22, 23: +35.92
Lowest energy conformation was
most dissimilar to original ligand
conformation
Better results could have been
obtained by reducing the step sizes
Runs = 50
Solis-Wets iterations = 300
Consecutive successes = 4
Consecutive failures = 4
Rho = 1
Lower bound on rho = 0.01
LS frequency = 0.06
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0 Å
Quaternion step = 50 deg
Torsion step = 50 deg
43. Docking – Lamarckian GA
Results:
10 different clusters
Energy range: -18.10 to -8.38
Conformation #7: -18.10
Lowest energy conformation fairly
similar to original ligand
conformation
If the number of runs was
restricted to 10 for both GA and
LGA, LGA would have generated
the best structure
Runs = 10
Max # Evaluations = 250,000
Max # Generations = 27,000
Population size = 50
Elitism count = 1
Mutation rate = 0.02
Crossover rate = 0.8
Window size = 10
Cauchy alpha = 0
Cauchy beta = 1
Solis-Wets iterations = 300
Consecutive successes = 4
Consecutive failures = 4
Rho = 1
Lower bound on rho = 0.01
LS frequency = 0.06
* Gray options *