Protdock - Aatu Kaapro



Published in: Education, Technology

  1. 1. Protein docking
Aatu Kaapro, Janne Ojanen
November 27, 2002
  2. 2. Abstract

Molecular docking is the study of how two or more molecular structures, for example a drug and an enzyme or protein receptor, fit together. In other words, the problem is like solving a three-dimensional puzzle. For example, the action of a harmful protein in the human body may be prevented by finding an inhibitor that binds to that particular protein. Molecular docking software is mainly used in the drug research industry.

The most important application of docking software is virtual screening, in which the most interesting and promising molecules are selected from an existing database for further research. This places demands on the computational method: it must be fast and reliable. Another application is the study of molecular complexes.
  3. 3. Contents

1 Introduction
  1.1 Biological background
  1.2 Physical background
    1.2.1 General
    1.2.2 Forces
    1.2.3 Other physical factors
2 Molecular docking
  2.1 Docking methods
    2.1.1 Molecular dynamics
    2.1.2 Monte Carlo methods
    2.1.3 Genetic algorithms
    2.1.4 Fragment-based methods
    2.1.5 Point complementary methods
    2.1.6 Distance geometry methods
    2.1.7 Tabu searches
    2.1.8 Systematic searches
  2.2 Force field models
    2.2.1 Classical force field models
    2.2.2 Second generation force field models
    2.2.3 Generalized force field models
3 Software
  3.1 AutoDock
  3.2 DOCK
  3.3 FlexX
  3.4 Gold
  3.5 Summary
4 References
  4. 4. Chapter 1 Introduction

1.1 Biological background

Molecular docking is used to predict the structure of the intermolecular complex formed between two or more molecules. The most interesting case is the protein-ligand interaction, because of its applications in medicine. A ligand is a small molecule that interacts with a protein's binding sites. Binding sites are areas of the protein known to be active in forming compounds. There are several possible mutual conformations in which binding may occur. These are commonly called binding modes.

1.2 Physical background

1.2.1 General

When studying the structure of matter, the most thorough approach is to apply quantum mechanics. In this case, the interaction between the two molecules, the ligand and the protein, could in principle be found by solving the combined Schrödinger equation of both systems, which would yield the possible states of the combined system. However, the quantum mechanical approach reaches a dead end almost immediately, since it is impossible to find an explicit solution to this difficult problem. A numerical solution is possible, but it soon turns out that even numerically solvable quantum mechanical models are computationally too heavy to produce practically useful results.

It is far more productive to apply a more primitive, mechanical model. This means that we need to study the nature and magnitude of the forces between the interacting particles. Depending on the computational method, we may assign different weights to different kinds of forces. It is quite common to resort to simplifications and to leave some of the interacting forces out of the model.
  5. 5. 1.2.2 Forces

The interactions between particles are commonly described as the consequence of forces between the molecules that the particles contain. These forces are often divided into four categories:

  • Forces of electrostatic origin
  • Forces of electrodynamic origin
  • Steric forces
  • Solvent-related forces

Forces of electrostatic origin are due to the charges residing in the matter. The most common interactions are charge-charge, charge-dipole and dipole-dipole. They can be computed from Coulomb's law. The interaction energies depend on the distance r as follows:

  • charge-charge: 1/r
  • charge-dipole: 1/r^2
  • dipole-dipole: 1/r^3

In addition to purely electrostatic forces, there are forces with an electrodynamic background. The most widely known is probably the van der Waals interaction. Atoms that are normally electrically neutral may develop an induced dipole moment when an external electric field is applied. The van der Waals interaction is the force between two induced dipoles, and it has a very short range. There are also forces between existing charges and induced dipoles. The distance dependences are the following:

  • charge-induced dipole: 1/r^4
  • van der Waals: 1/r^6

Steric forces are caused by entropy. For example, when the entropy of the system is restricted, entropic forces may arise that act to minimize its free energy.

Solvent-related forces are due to structural changes in the solvent. These changes occur when ions, colloids, proteins, etc. are added to the solvent. For example, when water acts as the solvent, the polar nature of water molecules must be taken into account. Water molecules form hydrogen bonds, and the water around the studied protein may, for example, turn into a highly organized structure. Solvent-related interactions are very hard to determine, and their modeling depends strongly on how the solvent itself is modeled.

What all these forces have in common is their electromagnetic origin.
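To get a feeling for what the different distance dependences mean in practice, the short sketch below compares how quickly each 1/r^n term fades when the separation grows. It is only an illustration: the function name `attenuation` and the table `EXPONENTS` are made up for this example, and prefactors (charges, dipole moments, polarizabilities) are ignored entirely.

```python
# Exponents n of the 1/r^n distance dependences listed in the text.
# Prefactors (charges, dipole moments, polarizabilities) are ignored.
EXPONENTS = {
    "charge-charge": 1,
    "charge-dipole": 2,
    "dipole-dipole": 3,
    "charge-induced dipole": 4,
    "van der Waals": 6,
}


def attenuation(n: int, factor: float) -> float:
    """Factor by which a 1/r^n term shrinks when r is multiplied by `factor`."""
    return factor ** (-n)


if __name__ == "__main__":
    for name, n in EXPONENTS.items():
        remaining = attenuation(n, 2.0)
        print(f"{name:22s} 1/r^{n}: {remaining:.4f} of its value when r doubles")
```

Doubling the distance halves a charge-charge term but leaves only 1/64 of a van der Waals term, which is why the latter is described as very short-ranged.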
  6. 6. The quantum mechanical background must be kept in mind while developing the computational model, since some quantum phenomena must be taken into account in an otherwise classical treatment. For example, a covalent bond between atoms, in which two atoms share electrons, is a purely quantum mechanical phenomenon. Another quantum mechanical phenomenon that needs to be addressed is the Pauli exclusion principle. Stated as simply as possible, it says that two nearby electrons may not be in exactly the same quantum state. The exclusion principle manifests itself as a strong repulsive force between two particles when the distance between them is very small.

1.2.3 Other physical factors

Generic protein-protein interactions differ from protein-ligand interactions because of the small size of the ligand. Because of their large size, proteins are usually treated as rigid bodies. However, conformational changes in the protein and the ligand are often necessary for a successful docking process. It must therefore be clearly understood how drastic a simplification the rigid-body approach is. One of the goals of current research is to be able to use flexible protein structure models.
  7. 7. Chapter 2 Molecular docking

Molecular docking can be divided into two separate problems: searching and scoring. The search algorithm should create an optimum number of configurations that include the experimentally determined binding modes. These configurations are then evaluated using scoring functions to distinguish the experimental binding modes from all the other modes explored by the search algorithm.

A rigorous search algorithm would go through all possible binding modes between the two molecules. However, this is impractical due to the size of the search space. Consider a simple system consisting of a ligand with four rotatable bonds and six rigid-body alignment parameters, and a cubic active site measuring 10^3 Å^3. The translational and rotational parameters add up to six degrees of freedom. If the angles are sampled in 10 degree increments and the translational parameters on a 0.5 Å grid, there are approximately 4 × 10^8 rigid-body poses to sample, corresponding to 6 × 10^14 configurations once the four rotatable bonds are included. Searching these would require approximately 2 000 000 years of computation at a rate of 10 configurations per second. As a consequence, only a small fraction of the total conformational space can be sampled, and a balance must be struck between the computational expense and the amount of the search space examined.

Some common search algorithms are:

  • Molecular dynamics
  • Monte Carlo methods
  • Genetic algorithms
  • Fragment-based methods
  • Point complementary methods
  • Distance geometry methods
  • Tabu searches
  • Systematic searches
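The search-space estimate given earlier on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes that the cubic site has a 10 Å edge (the cube root of 10^3 Å^3), that this gives 20 grid points per axis, and that every angle takes 36 values; the function names are chosen for this example only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_space(box_edge=10.0, grid=0.5, angle_step=10.0, rotatable_bonds=4):
    """Return (rigid-body poses, total configurations) for the example system."""
    translations = round(box_edge / grid) ** 3       # 20 grid points per axis
    steps = round(360.0 / angle_step)                # 36 values per angle
    rigid_poses = translations * steps ** 3          # three rotation angles
    total = rigid_poses * steps ** rotatable_bonds   # four torsion angles
    return rigid_poses, total


def years_needed(configurations, rate_per_second=10.0):
    """Time to evaluate every configuration at the given rate, in years."""
    return configurations / rate_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rigid, total = search_space()
    print(f"rigid-body poses: {rigid:.2e}")
    print(f"configurations:   {total:.2e}")
    print(f"years at 10/s:    {years_needed(total):.2e}")
```

Under these assumptions the counts come out at about 3.7 × 10^8 poses and 6.3 × 10^14 configurations, in line with the rounded figures quoted in the text.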
  8. 8. Current docking methods utilize the scoring functions in one of two ways. The first approach uses the full scoring function to rank a protein-ligand conformation. The system is then modified by the search algorithm, and the same scoring function is applied again to rank the new structure. The alternative approach uses a two-stage scoring function: a reduced function directs the search, and a more rigorous one then ranks the resulting structures.

Some common scoring functions are:

  • Force-field methods
  • Empirical free energy scoring functions
  • Knowledge-based potentials of mean force

Only force-field based methods are considered in this article.

2.1 Docking methods

2.1.1 Molecular dynamics

These methods involve calculating solutions to Newton's equations of motion. Finding the global energy minimum of a docked complex is difficult, because the energy hypersurface of a biological system is rugged and hard to traverse. The problem is approached using standard optimization algorithms, including:

  • direct searches (e.g. simplex): use only the potential function; impractical for large molecules and suitable only for crude optimization of small molecules far away from the minimum
  • gradient methods (e.g. steepest descent): use the first derivative of the potential function; converge slowly near the minimum; recommended for initial optimization
  • conjugate-gradient methods (e.g. Fletcher-Reeves): the history of the search influences the search direction; higher computational effort but better convergence
  • second derivative methods (e.g. Newton-Raphson): very efficient convergence
  • least squares methods (e.g. Marquardt): good convergence but often computationally too expensive

Often a combination of these methods is used, for example a gradient method for the initial optimization and a conjugate-gradient method when nearing the minimum.
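As an illustration of the gradient methods mentioned in the list above, the following minimal sketch runs steepest descent on a one-dimensional 12-6 Lennard-Jones potential. The potential, the reduced units, the fixed step size and all function names are assumptions made for this example; real minimizers work on thousands of coordinates and usually choose the step adaptively.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy in reduced units (toy potential)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Analytic first derivative dE/dr of lj_energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def steepest_descent(r0, step=0.01, tol=1e-8, max_iter=100_000):
    """Move against the gradient until its magnitude drops below tol."""
    r = r0
    for iteration in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            return r, iteration
        r -= step * g          # fixed step along the negative gradient
    return r, max_iter


if __name__ == "__main__":
    r_min, n_iter = steepest_descent(1.5)
    print(f"minimum near r = {r_min:.6f} after {n_iter} iterations")
    print(f"energy there   = {lj_energy(r_min):.6f}")
```

For this potential the analytic minimum lies at r = 2^(1/6) σ with energy -ε, which gives a simple check on the result.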
  9. 9. 2.1.2 Monte Carlo methods

The Monte Carlo simulation method occupies a special place in the history of molecular modeling, as it was the technique used to perform the first computer simulation of a molecular system. The expression Monte Carlo simulation is extremely general, and many algorithms are given that name whenever they contain a stochastic process or some kind of random sampling. In molecular docking, the expression usually means importance sampling or the Metropolis method. The Metropolis method, which is actually a Markov chain Monte Carlo method, generates random moves of the system and then accepts or rejects each move based on a Boltzmann probability. Monte Carlo methods play an important role in molecular docking, but the variety of algorithms is too large to be considered here in detail. Programs using MC methods include AutoDock, ProDock, ICM, MCDOCK, DockVision, QXP and Affinity.

2.1.3 Genetic algorithms

Genetic algorithms and evolutionary programming are well suited to docking problems because of their usefulness in solving complex optimization problems. The essential idea of genetic algorithms is the evolution of a population of possible solutions via genetic operators (mutations, crossovers and migrations) to a final population, optimizing a predefined fitness function.

The process starts with encoding the variables, in this case the degrees of freedom, into a "genetic code", e.g. binary strings. Then a random initial population of solutions is created. Genetic operators are applied to this population, leading to a new population. The new population is scored and ranked, and, following "the survival of the fittest", the probability that a solution survives to the next iteration depends on its score. If the size of the population is kept constant, good solutions will come to dominate the population.
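The loop described above can be sketched in a few lines. This is a toy version under several assumptions, not the scheme of any particular docking program: genes are real-valued torsion angles rather than binary strings, the score is a hypothetical squared distance to an arbitrary target, simple truncation selection (keep the better half) stands in for score-dependent survival probabilities, and migration is left out.

```python
import random


def toy_score(genes, target=(60.0, -120.0, 180.0, 30.0)):
    """Hypothetical score, lower is better: squared distance to a target."""
    return sum((g - t) ** 2 for g, t in zip(genes, target))


def crossover(a, b, rng):
    """One-point crossover of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genes, rng, rate=0.2, sigma=15.0):
    """Perturb each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in genes]


def genetic_search(pop_size=40, n_genes=4, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_score)               # rank, best first
        history.append(toy_score(population[0]))
        survivors = population[: pop_size // 2]      # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children            # constant population size
    population.sort(key=toy_score)
    history.append(toy_score(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = genetic_search()
    print("best genes:", [round(g, 1) for g in best])
    print("score went from", round(history[0], 1), "to", round(history[-1], 1))
```

Because the best individuals always survive unchanged, the best score in `history` can never get worse from one generation to the next.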
It should be noted that genetic algorithms are well suited to parallel computing. Some programs using GAs are GOLD, AutoDock, DIVALI and DARWIN.

2.1.4 Fragment-based methods

Fragment-based methods divide the ligand into separate portions or fragments, dock the fragments, and finally link the fragments together. These methods require subjective decisions on the importance of the various functional groups in the ligand, because a good choice of base fragment is essential. A poor choice can significantly degrade the quality of the results. The base fragment must contain the predominant interactions with the receptor. Early algorithms required manual selection of the base fragment, but this has been automated in newer implementations.

Some well-known programs using fragment-based methods are FlexX and DOCK. These programs are covered in more detail in a later chapter.
  10. 10. 2.1.5 Point complementary methods

These methods are based on evaluating the shape and/or chemical complementarity between the interacting molecules. The molecules are usually modeled in a simple way, for example using spheres or cubes as atoms. The ligand description is then rotated and translated to obtain the maximum number of matches between the ligand and protein surfaces, minus the number of volume overlaps. Additional constraints may be present, for example a requirement that the normals of the interacting surfaces point in approximately opposite directions.

Some algorithms use a 3D grid, which is placed over the protein and over the ligand. Each grid point is labeled as open space, inside the ligand, or inside the protein. A correlation function is then constructed and optimized using rigid-body translation and rotation. This often involves traditional shape recognition techniques such as the Fast Fourier Transform (FFT) with Fourier correlation theory. A high correlation score denotes good surface complementarity between the molecules. Because many of the methods were originally created for protein-protein docking, the rigid-body assumption is usually made. This is a limitation in ligand-protein docking. However, some algorithms are aimed specifically at ligand-protein docking, and these allow some flexibility. Examples of programs using point complementary methods are FTDOCK, SANDOCK, FLOG and the Soft Docking algorithm.

2.1.6 Distance geometry methods

Many types of structural information can be expressed as intra- or intermolecular distances. The distance geometry formalism allows these distances to be assembled and three-dimensional structures consistent with them to be calculated. The crucial feature is that it is not possible to assign arbitrary values to the inter-atomic distances in a molecule and always obtain a low-energy conformation. Rather, the inter-atomic distances are closely interrelated, and many combinations of distances are geometrically impossible.
This enables fast sampling of the conformational space, though it does not always give good results. An example of a program using distance geometry for the docking problem is DockIt.

2.1.7 Tabu searches

These methods are based on stochastic processes in which new states are randomly generated from an initial state (referred to as the current solution). The new solutions are scored and ranked in ascending order. The best new solution is chosen as the new current solution, and the process is repeated. To avoid loops and ensure diversity of the current solution, a tabu list is used. This list acts as a memory: it contains information about previous current solutions, and a new solution is rejected if it resembles a previous solution too closely. An example of a docking algorithm using tabu search is PRO_LEADS.
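A minimal sketch of the tabu idea follows, assuming a hypothetical two-dimensional score with a single minimum and a simple distance threshold as the measure of "resembling a previous solution". The names and parameter values are illustrative only and are not taken from PRO_LEADS.

```python
import math
import random
from collections import deque


def toy_score(p):
    """Hypothetical score, lower is better; minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def tabu_search(start=(5.0, 5.0), steps=200, n_moves=20, move_size=0.5,
                tabu_size=25, tabu_radius=0.1, seed=0):
    rng = random.Random(seed)
    current = start
    best = current
    tabu = deque([current], maxlen=tabu_size)   # memory of recent current solutions
    for _ in range(steps):
        # Randomly generate new states from the current solution.
        moves = [(current[0] + rng.uniform(-move_size, move_size),
                  current[1] + rng.uniform(-move_size, move_size))
                 for _ in range(n_moves)]
        moves.sort(key=toy_score)               # rank in ascending order of score
        # Reject moves that lie too close to a remembered solution.
        allowed = [m for m in moves
                   if all(math.dist(m, t) > tabu_radius for t in tabu)]
        if not allowed:
            continue                            # every move was tabu, draw again
        current = allowed[0]                    # best allowed move, even if worse
        tabu.append(current)
        if toy_score(current) < toy_score(best):
            best = current
    return best


if __name__ == "__main__":
    best = tabu_search()
    print("best point:", tuple(round(c, 3) for c in best))
    print("best score:", round(toy_score(best), 5))
```

Accepting the best allowed move even when it is worse than the current solution is what lets the search leave a local minimum, while the tabu list keeps it from stepping straight back.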
  11. 11. 2.1.8 Systematic searches

These methods systematically go through all possible conformations and represent the brute-force solution to the docking problem. All molecules are usually assumed to be rigid, and the interaction energy is evaluated from a force field model. Constraints and restraints can be used to reduce the dimensionality of the problem.

2.2 Force field models

Molecular mechanics stems from the idea that the electrons of an atom can be treated as fixed. The geometry of a molecule can then be approximated effectively by taking all the interacting forces into account. Bonded interactions are described by spring forces, and non-bonded interactions are usually approximated by potentials resembling the van der Waals interaction. The required parameters are determined from experimental observations. The geometry is further optimized by finding the energy minimum.

The total energy is represented by a set of potential energy functions. In addition to these functions, a set of parameters is needed to compute the total energy. It is worth noting that force field parameters have no meaning unless they are considered together with the potential energy functions. A comparison between force field models is therefore very difficult. In addition to these two parts, information about atom types and atom charges is required. A set of rules is usually also needed to type atoms, to generate parameters that are not explicitly defined, and to assign functional forms and parameters. Together these components form a force field:

  • Potential energy functions
  • Parameters for function terms
  • List of atoms and atom charges
  • Rules for atom typing, parameter generation and functional form assignment

Force fields are usually employed to generate accurate predictions for complex problems by interpolating and extrapolating from a relatively simple experimental set of molecules. There are generally two approaches to force fields. They are either very accurate for a small set of molecules and compounds, or more general, in which case accuracy is often compromised.

Let us take a more detailed look at some of the existing force field models.

2.2.1 Classical force field models

Examples of classical force field models include AMBER, CHARMM and CVFF. They are used mainly in biochemistry.
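The ingredients described in section 2.2 (spring-like bonded terms, van der Waals-like non-bonded terms, atom charges and a parameter set) can be combined into a very small energy function. The sketch below is a generic illustration in reduced units, not the functional form of AMBER, CHARMM or CVFF; every parameter value in it is hypothetical.

```python
import math


def bond_energy(r, r0, k):
    """Harmonic spring term for a bonded pair."""
    return 0.5 * k * (r - r0) ** 2


def nonbonded_energy(r, epsilon, sigma, q1, q2, coulomb_k=1.0):
    """12-6 Lennard-Jones plus Coulomb term for a non-bonded pair."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6) + coulomb_k * q1 * q2 / r


def total_energy(coords, bonds, nonbonded):
    """Sum all terms; both dicts map atom index pairs to their parameters."""
    energy = 0.0
    for (i, j), (r0, k) in bonds.items():
        energy += bond_energy(math.dist(coords[i], coords[j]), r0, k)
    for (i, j), (eps, sigma, qi, qj) in nonbonded.items():
        r = math.dist(coords[i], coords[j])
        energy += nonbonded_energy(r, eps, sigma, qi, qj)
    return energy


if __name__ == "__main__":
    # Three atoms on a line; all parameter values are hypothetical.
    coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (3.0, 0.0, 0.0)]
    bonds = {(0, 1): (1.0, 100.0)}                 # (r0, k)
    nonbonded = {(0, 2): (0.2, 1.5, 0.3, -0.3)}    # (epsilon, sigma, q_i, q_j)
    print(f"total energy: {total_energy(coords, bonds, nonbonded):.4f}")
```

The example also shows why parameters cannot be compared across force fields: the numbers only have meaning together with the functional forms they are plugged into.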
  12. 12. AMBER (Assisted Model Building with Energy Refinement)

AMBER refers to two things: a set of molecular mechanics force fields used for the simulation of biomolecules, and a package of molecular simulation programs. AMBER's set of parameters is experimentally derived. AMBER force fields are probably the most widespread ones. AMBER is designed especially for biological macromolecules.

CHARMM (Chemistry at HARvard Macromolecular Mechanics)

CHARMM is a program for macromolecular dynamics. In addition to performing MD using algorithms for time-stepping, long-range force calculation and periodic images, it can be used for energy minimization, normal modes and crystal optimizations. There are several potential energy functions parameterized for protein, lipid and nucleic acid simulations. CHARMM also incorporates free energy methods for chemical and conformational free energy calculations.

CVFF (Consistent Valence Force Field)

CVFF has parameters that are obtained by fitting to crystal and gas-phase structures of small organic molecules. CVFF is designed mainly for organic materials, and it is commonly used to predict structures and compute binding energies.

2.2.2 Second generation force field models

Examples of second generation force fields include CFF and COMPASS.

CFF (Consistent Force Field)

CFF is somewhat more complex than AMBER. The potential energy functions in CFF are expanded in order to avoid problems concerning the complexity of potential energy surfaces. CFF also uses quantum calculations to determine the parameters for the energy functions.
This approach gives a great advantage over classical models, since the parameters can be determined much more accurately. Other advantages include the possibility of covering a larger number of compounds in the force field model, and the fact that all parameters are determined in the same way, which makes the model more consistent.

COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies)

COMPASS is another ab initio (from the beginning) force field model. Like CFF, it has parameters defined by quantum mechanical calculations and validated against empirical data.
  13. 13. 2.2.3 Generalized force field models

Generalized force fields are not as accurate as the ones presented above, but they have their uses. They can be applied to systems that are not covered by more accurate force field models. Generalized force field models are based on atomic parameters and rules for determining the explicit form of the parameters. Examples include ESFF and UFF.

ESFF (Extensible Systematic Force Field)

ESFF covers all elements up to Rn. ESFF can be used for both organic and inorganic systems.

UFF (Universal Force Field)

UFF covers the whole periodic table. However, it is not very accurate, and thus its main application is systems that are not covered by other force fields.
  14. 14. Chapter 3 Software

In addition to the large number of existing docking programs, there are also many molecular mechanics programs applicable to these problems. Despite the huge variety of available programs, no single program has become recognized as a standard, although some programs are very widely used. The programs do not seem to be easy to use, and they require some understanding of the underlying computational principles. This leads to situations where people keep using the program they already know, even though better options could be available. Some of the existing programs also seem to be reaching a more mature state, since an increasing number of commercial solutions is available. Docking programs are usually sold in a package with other molecular design software.

It should also be noted that the division made earlier is not very strict, and many programs would fit into more than one category of methods. Tests have shown that there is no significant difference in hit rates between different programs and that they all produce false alarms. Because of this, combining different search methods and scoring functions produces more reliable results. This has led to the most successful docking programs usually being a collection of the methods described. It is also worth remembering that molecular docking software is only as good as its scoring function. It does not help if we are able to create the right conformation but not able to recognize it.

Probably the best known example of rational drug design is the HIV-1 protease inhibitor. Starting with X-ray structures of HIV-1 protease, a group of scientists at DuPont Merck used docking and molecular design software to successfully design an inhibitor.

3.1 AutoDock

AutoDock uses Monte Carlo simulated annealing and a Lamarckian genetic algorithm (LGA) to create a set of possible conformations.
LGA is used as a global optimizer and energy minimization as a local search method. Possible orientations are evaluated with the AMBER force field model in conjunction with free energy scoring functions and a large set of protein-ligand complexes with known protein-ligand constants.
  15. 15. The newest, as yet unreleased, version 4 should include side chain flexibility. AutoDock has more informative web pages than its competitors, and because of its free academic license it is a good starting point when wandering into the world of molecular docking software.

3.2 DOCK

DOCK is one of the oldest and best known ligand-protein docking programs. The initial version used rigid ligands; flexibility was later incorporated via incremental construction of the ligand in the binding pocket. As mentioned, DOCK is a fragment-based method that uses shape and chemical complementarity to create possible orientations for the ligand. These orientations can be scored using three different scoring functions. However, none of them contains explicit hydrogen-bonding terms, solvation/desolvation terms, or hydrophobicity terms, which limits serious use. DOCK seems to handle apolar binding sites well and is useful for fast docking, but it is not the most accurate software available.

3.3 FlexX

FlexX is another fragment-based method using flexible ligands and rigid proteins. It uses the MIMUMBA torsion angle database for the creation of conformers. MIMUMBA is an interaction geometry database used to describe intermolecular interaction patterns exactly. For scoring, the Boehm function (with minor adaptations necessary for docking) is applied. FlexX is introduced here to emphasize the importance of scoring functions. Although FlexX and DOCK are both fragment-based methods, they produce quite different results. In contrast to DOCK, which performs well with apolar binding sites, FlexX shows the opposite behavior. It has a slightly lower hit rate than DOCK but provides better estimates of root mean square distance for compounds with a correctly predicted binding mode.
There is an extension of FlexX called FlexE, with flexible receptors, which has been shown to produce better results with significantly lower running times.

3.4 Gold

GOLD has won many new users during the last few years because of its good results in impartial tests. It has a good hit rate overall, although it suffers somewhat when dealing with hydrophobic binding pockets. GOLD uses a genetic algorithm to dock a flexible ligand to a protein with flexible hydroxyl groups. Otherwise the protein is considered to be rigid. This makes it a good choice when the binding pocket contains amino acids that form hydrogen bonds with the ligand. GOLD uses a scoring function that is based on favorable conformations found in the Cambridge Structural Database and on empirical results on weak chemical interactions. The development of GOLD is currently focused on improving the computational algorithm and adding support for parallel processing.
  16. 16. GOLD has one of the most comprehensive validation test sets and is also available for use at CSC.

3.5 Summary
  17. 17.
# | Name | License terms | Platforms | Keywords
1 | Affinity | Commercial | SGI | Monte Carlo methods
2 | AutoDock | Free for non-profit use | Unix, MacOS, Linux, SGI | GA/LGA, MC
3 | DOCK | Free for academic use | Unix, Linux | GA, FB
4 | DockIt | Commercial | SGI, Sun, Linux | Distance geometry
5 | DockVision | Commercial | IRIX, Linux | MC, GA
6 | DOT | Free | Supercomputers and clusters, Unix | (Daughter of turnip)
7 | FADE and PADRE | Free for academic use | Unix, Macintosh | Point complementary
8 | FlexiDock | Commercial | SGI, IRIX 6.5 | GA
9 | FlexX | Commercial | Unix | Fragment based
10 | FTDOCK | Free, registration | Unix | Point complementary, Fourier correlation
11 | GLIDE | Commercial | Unix, Linux, SGI, Sun | MC
12 | GOLD | Free evaluation | Unix | GA
13 | GRAMM | Free, registration | SGI, Sun, Alpha, Windows, Linux | Global minimums of intermolecular energies
14 | HINT | Commercial | SGI, Linux, Sun, Win2000, Macintosh | Hydropathic interactions
15 | ICM | ICM Lite free for academic use, otherwise commercial | SGI, Alpha, Sun, Linux, WinNT |
16 | LEAPFROG | Commercial | SGI | De novo ligand design tool
17 | Liaison | Commercial | Unix, Linux, SGI, Sun | Fast calculations of free energy of binding
18 | LIGPLOT | Free for academic use | Unix | Schematic diagrams of protein-ligand complexes
19 | QSite | Commercial | Unix, Linux, SGI, Sun | Mixed quantum and molecular mechanics, hydrogen bonds, hydrophobic interactions
20 | Shape | E-mail request | Unix | Structure and chemistry of molecular surfaces
21 | Situs | E-mail request | Unix | Both rigid and flexible proteins
  18. 18. Chapter 4 References

C. Bissantz, G. Folkers, D. Rognan: Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations, Journal of Medicinal Chemistry, 43: 4759-4767, 2000

Leach: Molecular modeling, principles and applications 2e, Prentice Hall, 2001

P. Lehtovuori, T. Nyrönen: Nykyaikaisen telakkatyöläisen vasara: molekyylien telakointiohjelma GOLD Cedarissa [A modern dock worker's hammer: the molecular docking program GOLD on Cedar], @CSC, 2/2002

Roberto Millini: Introduction to molecular mechanics, 2002

Taylor, P.J. Jewsbury & J.W. Essex: A review of protein-small molecule docking methods, Journal of Computer-Aided Molecular Design, 16: 151-166, 2002

ESF Training course on Molecular Interactions: New Frontiers for Computational Methods