Molecular Docking
G. Schaftenaar
Docking Challenge
• Identification of the ligand’s correct
binding geometry in the binding site
(Binding Mode)
• Observation:
– Similar ligands can bind at quite
different orientations in the active
site.
Two main tasks of Docking Tools
• Sampling of conformational (Ligand)
space
• Scoring protein-ligand complexes
Rigid-body docking algorithms
• Historically the first approaches.
• Protein and ligand fixed.
• Search for the relative orientation
of the two molecules with lowest
energy.
• FLOG (Flexible Ligands Oriented on
Grid): each ligand represented by up
to 25 low energy conformations.
Introducing flexibility:
Whole molecule docking
• Monte Carlo methods (MC)
• Molecular Dynamics (MD)
• Simulated Annealing (SA)
• Genetic Algorithms (GA)
Available in packages:
AutoDock (MC,GA,SA)
GOLD (GA)
Sybyl (MD)
Monte Carlo
• Start with configuration A (energy EA)
• Make random move to configuration B
(energy EB)
• Accept move when:
EB < EA, or if
EB > EA, accept with probability P:
$$P = \exp\left(-\frac{E_B - E_A}{kT}\right)$$
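The acceptance rule can be sketched in a few lines of Python. This is a minimal illustration, not code from any docking package: the names `accept_move` and `metropolis_walk` are hypothetical, the "configuration" is a single number rather than a ligand pose, and k is assumed to be the gas constant in kcal/(mol K).

```python
import math
import random

K_B = 0.0019872041  # assumed units: kcal/(mol*K)

def accept_move(e_a, e_b, temperature, rng=random):
    """Metropolis test for a move from A (energy e_a) to B (energy e_b)."""
    if e_b <= e_a:
        return True  # downhill moves are always accepted
    p = math.exp(-(e_b - e_a) / (K_B * temperature))
    return rng.random() < p  # uphill moves are accepted with probability P

def metropolis_walk(energy, x0, n_steps, temperature, step=0.5, seed=0):
    """Random walk on a 1D toy energy function at fixed temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # random move to configuration B
        e_new = energy(x_new)
        if accept_move(e, e_new, temperature, rng):
            x, e = x_new, e_new
    return x, e
```

In a docking program the random move would change ligand translation, rotation and torsion angles, and `energy` would be the scoring function.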
Molecular Dynamics
• force-field is used to calculate forces on
each atom of the simulated system
• following Newton mechanics, calculate
accelerations and velocities from the forces.
(Force = mass times acceleration)
• The atoms are moved slightly with respect
to a given time step
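The force, acceleration, velocity and position updates above can be sketched with the velocity Verlet scheme, one common integrator. The example is an assumption-laden toy: a single particle in one dimension, with a harmonic spring (`harmonic_force`, a hypothetical helper) standing in for a real force field.

```python
def velocity_verlet(position, velocity, force, mass, dt, n_steps):
    """Integrate Newton's equations (F = m * a) for one particle in 1D."""
    acc = force(position) / mass
    trajectory = [(position, velocity)]
    for _ in range(n_steps):
        # move the atom slightly, according to the time step dt
        position += velocity * dt + 0.5 * acc * dt * dt
        new_acc = force(position) / mass  # forces at the new position
        velocity += 0.5 * (acc + new_acc) * dt
        acc = new_acc
        trajectory.append((position, velocity))
    return trajectory

def harmonic_force(x, k=1.0):
    """Toy stand-in for a force field: a spring pulling towards x = 0."""
    return -k * x

# Usage: 1000 steps of 0.01 time units, starting at rest at x = 1
traj = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
```

With a small enough time step the total energy (kinetic plus potential) stays nearly constant along the trajectory, which is a useful sanity check for any integrator.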
Simulated Annealing
Finding a global minimum
by lowering the temperature
during the Monte Carlo/MD simulation
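A minimal sketch of simulated annealing on top of Monte Carlo moves is given below. The geometric cooling schedule, the parameter defaults and the 1D test landscape `rugged` are all assumptions chosen for illustration; the Boltzmann constant is folded into the temperature.

```python
import math
import random

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01, cooling=0.95,
                        moves_per_temp=100, step=1.0, seed=1):
    """Monte Carlo search in which the temperature is lowered step by step."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            x_new = x + rng.uniform(-step, step)
            e_new = energy(x_new)
            # Metropolis test at the current temperature
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # lower the temperature
    return best_x, best_e

def rugged(x):
    """Hypothetical 1D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

best_x, best_e = simulated_annealing(rugged, x0=4.0)
```

At high temperature the walker crosses barriers between minima; as the temperature drops it settles into a low-energy basin.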
Genetic Algorithms
• Ligand translation, rotation and
configuration variables constitute the
genes
• Crossovers mix ligand variables from
parent configurations
• Mutations randomly change variables
• Natural selection of current generation
based on fitness
• Energy scoring function determines fitness
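The steps above can be sketched as a small genetic algorithm. This is a toy under stated assumptions: each individual is a flat list of numbers (standing in for translation, rotation and torsion genes), `toy_energy` and `TARGET` are hypothetical, and selection simply keeps the better half of the population.

```python
import random

def genetic_search(energy, n_genes, pop_size=40, generations=60,
                   mutation_rate=0.2, mutation_scale=0.3, seed=2):
    """Minimise `energy` over gene vectors; lower energy means higher fitness."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)              # rank by fitness
        parents = pop[: pop_size // 2]    # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)   # one-point crossover
            child = mum[:cut] + dad[cut:]
            # mutation: randomly perturb some genes
            child = [g + rng.gauss(0, mutation_scale)
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)

TARGET = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0]  # hypothetical "best pose"

def toy_energy(genes):
    """Squared distance to TARGET, standing in for a scoring function."""
    return sum((g - t) ** 2 for g, t in zip(genes, TARGET))

best = genetic_search(toy_energy, n_genes=len(TARGET))
```

Because the fitter half survives unchanged, the best energy in the population never gets worse from one generation to the next.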
Introducing flexibility:
Fragment Based Methods
• build small molecules inside defined
binding sites while maximizing
favorable contacts.
• De Novo methods construct new
molecules in the site.
• division into two major groups:
– Incremental construction (FlexX, Dock)
– Place & join.
Placing Fragments and Rigid
Molecules
• All rigid-body docking methods have in
common that superposition of point sets is
a fundamental sub-problem that has to be
solved efficiently:
– Geometric hashing
– Pose clustering
– Clique detection
Geometric hashing
• originates from computer vision
• Given a picture of a scene and a set
of objects within the picture, both
represented by points in 2d space,
the goal is to recognize some of the
models in the scene
Pose-Clustering
• For each triangle of receptor compute
the transformation to each ligand
matching triangle.
• Cluster transformations.
• Score the results.
Clique-Detection
• Nodes are matches between protein and ligand
• Edges connect distance-compatible pairs of nodes
• In a clique, all pairs of nodes are connected
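A minimal sketch of clique detection for point matching follows. It assumes that protein site points and ligand atoms are given as 3D coordinates, that two matches are compatible when the corresponding distances agree within a tolerance, and it uses the basic Bron-Kerbosch recursion, which is one possible clique search and not necessarily what any given docking program uses. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def build_match_graph(prot_pts, lig_pts, tol=0.5):
    """Nodes are (protein index, ligand index) matches; edges join compatible ones."""
    nodes = [(i, j) for i in range(len(prot_pts)) for j in range(len(lig_pts))]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a point may be used only once
        d_prot = math.dist(prot_pts[i1], prot_pts[i2])
        d_lig = math.dist(lig_pts[j1], lig_pts[j2])
        if abs(d_prot - d_lig) <= tol:  # distance-compatible pair
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj

def max_clique(adj):
    """Largest set of mutually connected nodes (Bron-Kerbosch, no pivoting)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Toy data: the ligand is a translated copy of the first three protein points
prot = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 7)]
lig = [(10, 10, 10), (13, 10, 10), (10, 14, 10)]
matches = sorted(max_clique(build_match_graph(prot, lig)))
```

The matches in the largest clique define which ligand points to superimpose on which protein points to obtain a candidate orientation.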
Scoring Functions
• Shape & Chemical Complementarity
Scores
• Empirical Scoring
• Force Field Scoring
• Knowledge-based Scoring
• Consensus Scoring
Shape & Chemical Complementarity
Scores
• Divide accessible protein surface into
zones:
– Hydrophobic
– Hydrogen-bond donating
– Hydrogen-bond accepting
• Do the same for the ligand surface
• Find ligand orientation with best
complementarity score
Empirical Scoring
Scoring parameters fit to reproduce
measured binding affinities
(FlexX, LUDI, Hammerhead)
Force Field Scoring (Dock)
Nonbonding interactions (ligand-protein):
-van der Waals
-electrostatics
Amber force field
$$E_{nonbond} = \sum_{i}^{lig} \sum_{j}^{prot} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + c\,\frac{q_i q_j}{r_{ij}} \right]$$
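The nonbonded ligand-protein sum (a 12-6 van der Waals term plus a Coulomb term for every atom pair) can be sketched as below. The function name, the atom tuples and the Lennard-Jones parameters in the usage example are hypothetical; the constant 332.0 is the usual conversion factor that gives kcal/mol for charges in electron units and distances in Å, used here as an assumed value for c.

```python
import math

COULOMB_C = 332.0  # assumed value of c: kcal*Angstrom/(mol*e^2)

def nonbonded_energy(ligand, protein, lj_params, c=COULOMB_C):
    """Sum A/r^12 - B/r^6 + c*qi*qj/r over all ligand-protein atom pairs.

    ligand, protein: lists of (atom_type, charge, (x, y, z)).
    lj_params: {(ligand_type, protein_type): (A_ij, B_ij)}.
    """
    total = 0.0
    for type_i, q_i, xyz_i in ligand:
        for type_j, q_j, xyz_j in protein:
            r = math.dist(xyz_i, xyz_j)
            a_ij, b_ij = lj_params[(type_i, type_j)]
            total += a_ij / r**12 - b_ij / r**6 + c * q_i * q_j / r
    return total

# Usage with made-up parameters: one ligand atom, one protein atom, 2 Angstrom apart
lig_atoms = [("C", 1.0, (0.0, 0.0, 0.0))]
prot_atoms = [("O", -1.0, (2.0, 0.0, 0.0))]
params = {("C", "O"): (4096.0, 64.0)}
energy = nonbonded_energy(lig_atoms, prot_atoms, params)  # 1 - 1 - 166 = -166.0
```

Docking programs typically precompute such terms on a grid around the binding site so that each pose can be scored by interpolation instead of a full double loop.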
Knowledge-based Scoring
Function
Free energies of molecular interactions
derived from structural information on
protein-ligand complexes contained in the PDB
$$P(\sigma_p, \sigma_l) = P_{ref} \exp\left[-\beta F(\sigma_p, \sigma_l)\right]$$
Boltzmann-Like Statistics of Interatomic
Contacts.
Distribution of interatomic distances is converted
into energy functions by inverting Boltzmann’s law.
Potential of Mean Force (PMF)
$$A_{ij}(r) = -k_B T \ln\left[ f_{Vol\_corr}^{i}(r)\, \frac{\rho_{seg}^{ij}(r)}{\rho_{bulk}^{ij}} \right]$$
$\rho_{seg}^{ij}(r)$: Number density of atom pairs of type ij
at atom pair distance r
$\rho_{bulk}^{ij}$: Number density of atom pairs of type ij
in reference sphere with radius R
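The inverse-Boltzmann step can be sketched as follows: observed pair distances are binned into spherical shells, each shell count is turned into a number density, and the ratio to the bulk density inside the reference sphere gives the energy. This is a simplified illustration; the volume correction factor is taken as 1, the function names are hypothetical, and k_B is assumed in kcal/(mol K).

```python
import math

K_B = 0.0019872041  # assumed units: kcal/(mol*K)

def pmf(rho_seg, rho_bulk, temperature=298.15, f_vol_corr=1.0):
    """A(r) = -k_B * T * ln(f * rho_seg / rho_bulk)."""
    return -K_B * temperature * math.log(f_vol_corr * rho_seg / rho_bulk)

def pmf_profile(distances, r_max, bin_width, temperature=298.15):
    """Distance-dependent PMF for one atom-pair type from observed distances."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            counts[min(int(d / bin_width), n_bins - 1)] += 1
    sphere_volume = 4.0 / 3.0 * math.pi * r_max**3
    rho_bulk = sum(counts) / sphere_volume  # density in the reference sphere
    profile = []
    for k, n in enumerate(counts):
        r_lo, r_hi = k * bin_width, (k + 1) * bin_width
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)
        if n == 0:
            profile.append(None)  # distance never observed: no estimate
        else:
            profile.append(pmf(n / shell_volume, rho_bulk, temperature))
    return profile
```

Distances that occur more often than expected from the bulk density get a negative (favourable) energy, and rarer ones a positive energy.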
Consensus Scoring
Cscore:
Integrate multiple scoring functions to
produce a consensus score that is
more accurate than any single function
for predicting binding affinity.
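One simple way to combine several scoring functions is rank-by-vote, sketched below. This illustrates the general idea only and is not claimed to be the exact CScore procedure; the function name, the 50% cutoff and the example scores are hypothetical, and lower scores are assumed to be better.

```python
def consensus_votes(scores, top_fraction=0.5):
    """Count, per compound, how many scoring functions rank it in their top fraction.

    scores: {function_name: {compound: score}}, lower score = better.
    """
    votes = {}
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=per_compound.get)
        n_top = max(1, int(len(ranked) * top_fraction))
        for compound in per_compound:
            votes.setdefault(compound, 0)
        for compound in ranked[:n_top]:
            votes[compound] += 1  # one vote from this scoring function
    return votes

# Made-up scores from three hypothetical scoring functions
example = {
    "force_field": {"a": -10, "b": -5, "c": -1, "d": 0},
    "empirical": {"a": -8, "b": -1, "c": -6, "d": -2},
    "knowledge": {"a": -3, "b": -9, "c": -1, "d": -2},
}
votes = consensus_votes(example)  # {'a': 3, 'b': 2, 'c': 1, 'd': 0}
```

Compounds that only one function likes collect few votes, which is how a consensus filter removes part of the false positives.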
Virtual screening by Docking
• Find weak binders in a pool of
non-binders
• Many false positives (96-100%)
• Consensus Scoring reduces rate of
false positives
Concluding remarks
Although the reliability of docking methods is
limited, they can provide new suggestions for
protein-ligand interactions that otherwise
may be overlooked.
Scoring functions are the Achilles’ heel
of docking programs.
False-positive rates can be reduced by using several
scoring functions in a consensus-scoring strategy.
Docking programs
• DOCK
• FlexX
• GOLD
• AutoDock
• Hammerhead
• FLOG
FlexX
• Receptor is treated as rigid
• Incremental construction algorithm:
– Break ligand up into rigid fragments
– Dock fragments into pocket of receptor
– Reassemble ligand from fragments in low
energy conformations
How DOCK works
• Generate molecular surface of protein
• Cavities in the receptor are used to
define spheres (blue); the centres
are potential locations for ligand atoms.
Figure: thioketal in the HIV-1 protease active site
• Sphere centres are matched to ligand
atoms, to determine possible orientations
for the ligand. $10^4$ orientations generated
GOLD
(Genetic Optimisation
for Ligand Docking)
Performs automated docking with
full acyclic ligand flexibility, partial
cyclic ligand flexibility and partial
protein flexibility in and around
active site.
Scoring: includes H-bonding term,
pairwise dispersion potential
(hydrophobic interactions), and
molecular mechanics term for
internal energy.
Analysis shows algorithm more likely to fail if ligand is large or highly flexible,
and more likely to succeed if ligand is polar
• The GA is encoded to search for H-bonding networks first;
• Fitness function contains a term for dispersive interactions but takes no account
of desolvation, thus underestimates the hydrophobic effect
