MACROMOLECULAR INTERACTIONS
By: Chartha Gaglani
1. ELECTROSTATIC INTERACTIONS
INTRODUCTION
• An understanding of electrostatic interactions is essential for
the full development of structural bioinformatics.
• The structures of proteins and other biopolymers are being
determined at an increasing rate through structural genomics
and other efforts.
OVERVIEW OF FUNCTIONAL ROLES OF ELECTROSTATICS
• Electrostatic interactions help to determine the structure and
flexibility of biopolymers, and the strength and kinetics of
their associations with small molecules and other biopolymers.
BRIEF HISTORY
• The importance of electrostatic interactions in protein behavior
was recognized early in the 20th century by Linderstrøm-Lang,
who introduced a simple spherical model for protein titrations
in 1924.
• A new era of electrostatic models was ushered in by a 1982
study by Warwicker and Watson.
THE NEED FOR MORE EFFICIENT AND SCALABLE ELECTROSTATIC METHODS
• The era of structural bioinformatics has created an urgent need
for faster methods to solve problems in biomolecular
electrostatics.
• As the structures of more proteins and other biopolymers
become available through structural genomics and other
initiatives, there will be a corresponding need to calculate the
physical properties of these molecules to help assign them to
families and functions.
POISSON-BOLTZMANN THEORY
• The Poisson-Boltzmann equation (PBE) provides a relatively
rigorous framework for including biomolecular topology and
ionic strength effects.
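Ionic strength effects in this framework are often summarized by the Debye screening length, which follows from linearizing the Poisson-Boltzmann equation in a bulk electrolyte. The Python sketch below evaluates the standard expression; the solvent dielectric (78.5) and temperature (298.15 K) are assumed defaults, not values given in this document.

```python
import math

# Physical constants (SI units)
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
QE = 1.602176634e-19     # elementary charge, C


def debye_length_nm(ionic_strength_molar, eps_r=78.5, temperature=298.15):
    """Debye screening length in nm for a dilute electrolyte.

    kappa^-1 = sqrt(eps0 * eps_r * kB * T / (2 * NA * e^2 * I)),
    with the ionic strength I converted from mol/L to mol/m^3.
    """
    ionic_si = ionic_strength_molar * 1000.0
    length_m = math.sqrt(
        EPS0 * eps_r * KB * temperature / (2.0 * NA * QE**2 * ionic_si)
    )
    return length_m * 1e9


def screening_factor(r_nm, ionic_strength_molar):
    """exp(-kappa * r): attenuation of a screened (Debye-Hueckel) potential
    relative to the unscreened Coulomb potential at distance r."""
    return math.exp(-r_nm / debye_length_nm(ionic_strength_molar))


for salt in (0.01, 0.1, 0.15):
    print(f"I = {salt:.2f} M -> Debye length = {debye_length_nm(salt):.2f} nm")
```

Under these assumptions the length is roughly 1 nm at 0.1 M and scales as 1/√I, so added salt shortens the range over which charges influence each other.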
INTRODUCTION TO THE EQUATION
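• $-\nabla \cdot \left[\epsilon(x)\,\nabla\Phi(x)\right] = \frac{1}{\epsilon_0}\,\rho(x)$
where $\epsilon(x)$ is a spatially varying dielectric coefficient,
$\Phi(x)$ is the electrostatic potential,
$\epsilon_0 = 8.854\times10^{-12}$ F/m is the permittivity of vacuum, and
$\rho(x)$ is the charge density.
In Poisson-Boltzmann theory, $\rho(x)$ contains both the fixed
charges of the solute and the mobile ions of the solvent, whose
concentrations follow a Boltzmann distribution in the potential.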
• The dielectric coefficient ε(x) typically takes different values
inside the solute and in the bulk solvent to reflect the relative
polarizabilities of the two media. For biomolecules in an aqueous
environment, ε is generally given a value of 2 to 20 inside the
solute and a value of 80 in the solvent.
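To make the equation concrete, here is a minimal finite-difference sketch in Python, reduced to one dimension (a slab geometry in which everything varies only along x) so that the linear system is tridiagonal. It solves only the Poisson part, without mobile ions, and the grid spacing, the dielectric profile (2 inside a "solute" slab, 80 outside) and the charge density are illustrative assumptions; practical solvers work on 3D grids.

```python
# Minimal 1D finite-difference sketch of
#     -d/dx [eps(x) dphi/dx] = rho(x) / EPS0,   phi = 0 at both ends.
# The slab geometry, grid spacing and charge density are illustrative.

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm; a = sub-, b = main, c = super-diagonal, d = right-hand side."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_poisson_1d(eps, rho, h):
    """Potential at all grid nodes; eps (relative) and rho (C/m^3) given per node."""
    n = len(eps)
    # Dielectric at the half-points between nodes (arithmetic mean, one simple choice).
    eps_half = [0.5 * (eps[i] + eps[i + 1]) for i in range(n - 1)]
    m = n - 2  # number of interior unknowns
    a, b, c, d = [0.0] * m, [0.0] * m, [0.0] * m, [0.0] * m
    for k in range(m):
        i = k + 1  # grid index of this unknown
        if k > 0:
            a[k] = -eps_half[i - 1] / h**2
        b[k] = (eps_half[i - 1] + eps_half[i]) / h**2
        if k < m - 1:
            c[k] = -eps_half[i] / h**2
        d[k] = rho[i] / EPS0  # boundary values are 0, so they add nothing here
    return [0.0] + solve_tridiagonal(a, b, c, d) + [0.0]


# Low-dielectric "solute" slab (eps = 2) in "water" (eps = 80), one charged node.
n, h = 101, 1e-10
eps = [2.0 if 40 <= i <= 60 else 80.0 for i in range(n)]
rho = [0.0] * n
rho[50] = 1e6
phi = solve_poisson_1d(eps, rho, h)
print(f"potential at the charge: {phi[50]:.3e} V")
```

Lowering the dielectric inside the slab raises the potential at the charge compared with a uniform ε = 80, which is the effect the two-dielectric model is meant to capture.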
ENERGIES
• The PBE defines an electrostatic energy that can be derived
from physical chemistry arguments.
• The free energy is a functional of the electrostatic potential as
well as atomic positions, charges and radii.
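In the simplest case, a linear model in a uniform dielectric with no mobile ions, the electrostatic energy reduces to G = ½ Σ q_i Φ(x_i), which equals the familiar sum of dielectric-screened Coulomb pair terms. The Python sketch below is a toy under those assumptions, not a PBE solver: it computes both forms and uses them in a binding difference G(complex) - G(receptor) - G(ligand). All charges and coordinates are hypothetical.

```python
import itertools
import math

COULOMB_K = 8.9875517923e9  # 1 / (4 * pi * eps0), N m^2 / C^2
QE = 1.602176634e-19         # elementary charge, C


def potentials_at_charges(charges, coords, eps_r):
    """phi(x_i) from all *other* charges in a uniform dielectric without ions."""
    phi = []
    for i, r_i in enumerate(coords):
        total = 0.0
        for j, (q_j, r_j) in enumerate(zip(charges, coords)):
            if j != i:
                total += COULOMB_K * q_j / (eps_r * math.dist(r_i, r_j))
        phi.append(total)
    return phi


def electrostatic_energy(charges, coords, eps_r):
    """G = 1/2 * sum_i q_i * phi(x_i) in J, self-energies excluded."""
    phi = potentials_at_charges(charges, coords, eps_r)
    return 0.5 * sum(q * p for q, p in zip(charges, phi))


def pairwise_coulomb_energy(charges, coords, eps_r):
    """Cross-check: the same energy written as a sum over unique pairs."""
    return sum(
        COULOMB_K * q_i * q_j / (eps_r * math.dist(r_i, r_j))
        for (q_i, r_i), (q_j, r_j) in itertools.combinations(zip(charges, coords), 2)
    )


# Hypothetical "receptor" (two charges) and "ligand" (one charge); coordinates in m.
rec_q, rec_x = [QE, -QE], [(0.0, 0.0, 0.0), (3e-10, 0.0, 0.0)]
lig_q, lig_x = [QE], [(0.0, 4e-10, 0.0)]
g_complex = electrostatic_energy(rec_q + lig_q, rec_x + lig_x, 80.0)
dg_bind = (g_complex
           - electrostatic_energy(rec_q, rec_x, 80.0)
           - electrostatic_energy(lig_q, lig_x, 80.0))
print(f"electrostatic binding energy: {dg_bind:.3e} J")
```

In this uniform-dielectric toy the binding difference collapses to the receptor-ligand pair terms; with a real PBE solution, desolvation contributions also enter each of the three energies.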
FORCES
• Poisson-Boltzmann methods have also found an increasingly
important role in force calculations for implicit solvent
dynamics simulations.
• In such simulations, the dynamical trajectory of a solute is
calculated without the numerous explicit solvent molecules
required by traditional molecular dynamics simulations; the
solvent is represented instead as a continuum.
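Forces in such simulations are the negative gradient of the electrostatic free energy with respect to atomic positions. The Python sketch below shows that relationship with central finite differences on a deliberately crude implicit-solvent model: point charges in a uniform dielectric continuum, standing in for a real PB energy. The step size and the ion-pair geometry are assumptions.

```python
import math

COULOMB_K = 8.9875517923e9  # 1 / (4 * pi * eps0), N m^2 / C^2
QE = 1.602176634e-19         # elementary charge, C


def energy(charges, coords, eps_r):
    """Coulomb energy (J) of point charges in a uniform dielectric continuum."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            total += COULOMB_K * charges[i] * charges[j] / (
                eps_r * math.dist(coords[i], coords[j]))
    return total


def numerical_forces(charges, coords, eps_r, step=1e-14):
    """F_i = -dG/dx_i by central finite differences (step in m)."""
    forces = []
    for i in range(len(coords)):
        f_i = []
        for axis in range(3):
            plus = [list(c) for c in coords]
            minus = [list(c) for c in coords]
            plus[i][axis] += step
            minus[i][axis] -= step
            d_g = energy(charges, plus, eps_r) - energy(charges, minus, eps_r)
            f_i.append(-d_g / (2.0 * step))
        forces.append(tuple(f_i))
    return forces


# An ion pair 5 Angstrom apart, in vacuum-like (eps = 1) vs water-like (eps = 80) media.
q, x = [QE, -QE], [(0.0, 0.0, 0.0), (5e-10, 0.0, 0.0)]
for eps_r in (1.0, 80.0):
    fx = numerical_forces(q, x, eps_r)[0][0]
    print(f"eps = {eps_r:>4}: force on first ion along x = {fx:.3e} N")
```

Finite differences like these cost two extra energy evaluations per atom and coordinate, which is one reason efficient force methods matter for dynamics.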
APPLICATIONS
• Thermodynamics- The application of electrostatic methods to
thermodynamic processes in biophysics necessarily involves the
calculation of free energies; such applications are concerned
with the changes in free energy associated with a particular
biomolecular transformation, such as ligand binding, folding or
other conformational changes.
• Kinetics- A wide variety of kinetic processes involving
proteins are governed in large part by electrostatic interactions.
• These include the electrostatically steered binding of substrates
or macromolecules to proteins and the association of proteins
with membranes. Faster and more accurate methods for
calculating electrostatic forces are therefore critical to
computational analysis of such processes.
• Electrostatic comparisons- Efforts have also been made to
pursue more informatics-based approaches to the interpretation
of electrostatic properties.
• Much of this work involves identifying functionally relevant
residues in biomolecules from their electrostatic
destabilization.
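To compare proteins by their electrostatic properties, the potential of each can be sampled at equivalent points (for example on a common grid after superposition) and reduced to one similarity score. One widely used score, not named in the text above, is the Hodgkin similarity index; the Python sketch below implements it, and the sampled potentials are hypothetical values.

```python
def hodgkin_index(phi_a, phi_b):
    """Hodgkin similarity index of two potentials sampled at the same points.

    SI = 2 * sum(phi_a * phi_b) / (sum(phi_a^2) + sum(phi_b^2)), in [-1, 1]:
    1 for identical potentials, -1 for equal-sized potentials of opposite sign.
    """
    if len(phi_a) != len(phi_b):
        raise ValueError("potentials must be sampled at the same grid points")
    num = 2.0 * sum(a * b for a, b in zip(phi_a, phi_b))
    den = sum(a * a for a in phi_a) + sum(b * b for b in phi_b)
    if den == 0.0:
        raise ValueError("both potentials are zero everywhere")
    return num / den


# Hypothetical potentials (kT/e) sampled at the same five points around two proteins.
protein_a = [1.2, 0.8, -0.5, -1.0, 0.3]
protein_b = [1.0, 0.9, -0.4, -1.1, 0.2]
print(f"SI(a, b) = {hodgkin_index(protein_a, protein_b):.3f}")
```

The index is sensitive to magnitude as well as sign pattern: a potential compared with twice itself scores 0.8, not 1.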
SOFTWARE AVAILABLE
• DelPhi- solves the PBE using highly optimized finite
difference methods.
• APBS- solves the PBE using parallel multigrid and parallel
adaptive finite element methods.
• MEAD- solves the PBE using finite difference methods and
determines pKa values while incorporating the conformational
flexibility of macromolecules.
• UHBD- solves the PBE using finite difference methods,
calculates binding and solvation energies, determines pKas, and
performs Brownian dynamics simulations.
• AMBER- In addition to providing explicit solvent simulation
tools, this package implements PBE-based implicit solvent
methods in both dynamics and free energy evaluation simulations.
• CHARMM- In addition to providing explicit solvent
simulation tools, this package implements generalized Born
and Poisson-Boltzmann implicit solvent methods in both
dynamics and free energy evaluation simulations.
• Qnifft- a Poisson-Boltzmann finite difference solver developed
by the Sharp lab.
2. PREDICTION OF PROTEIN-NUCLEIC ACID INTERACTIONS
INTRODUCTION
• The specific recognition of nucleic acid sequences by nucleic
acid binding proteins is of critical importance to the biological
function of every species.
• Given that a sequence-based recognition code appears
increasingly unlikely, the key to understanding protein-DNA and
protein-RNA recognition lies in the structural, physical and
chemical details of the molecular interactions themselves; these
structural mechanisms have received renewed attention with the
increased availability of high-resolution structures of
protein/nucleic acid complexes.
MOTIVATION
• It is self-evident that the ability to accurately and reliably
predict the sequence specificity of nucleic acid binding
proteins from structure would represent a great intellectual
achievement; its practical applications would also be substantial.
POTENTIAL FUNCTIONS FOR PROTEIN-NUCLEIC ACID INTERACTIONS
• The Molecular Dynamics simulation of nucleic acid molecules
and protein-nucleic acid complexes was routinely performed
long before the first efforts to computationally predict the
DNA/RNA recognition sequences of proteins from structure.
• MD force fields were among the first potential functions to be
applied to the problem of predicting the energetic properties of
sequence-specific protein/nucleic acid complexes.
• The AMBER package was used to calculate free energy, enthalpy
and entropy maps of base-amino acid interactions, a study
followed by similar efforts from other groups.
• CAP (Catabolite Activator Protein)- a hybrid approach was
demonstrated for CAP in which a hidden Markov model was
trained using both binding sequence data and nucleotide roll
data obtained from MD simulations of DNA molecules; the use
of structural information was found to improve the quality of
binding site predictions.
• The CHARMM package has also been used successfully,
particularly for MD simulations of the sequence-dependent
flexibility of protein-bound DNA.
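The force-field potentials mentioned above score an interaction as a sum of nonbonded pair terms. The Python sketch below uses a 12-6 Lennard-Jones plus Coulomb form similar in spirit to AMBER- or CHARMM-style nonbonded energies, with Lorentz-Berthelot combining rules and a distance-independent dielectric. Every charge and Lennard-Jones parameter is hypothetical, and real force fields add bonded terms, cutoffs and their own parameter sets.

```python
import math

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2), approximate conversion factor


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy; minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy (kcal/mol) for charges in e at distance r in Angstrom."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)


def interaction_energy(atoms_a, atoms_b, dielectric=1.0):
    """Nonbonded energy between two atom sets (e.g. a base and a side chain).

    Each atom is (x, y, z, charge, epsilon, sigma); pair parameters use
    Lorentz-Berthelot combining rules (geometric epsilon, arithmetic sigma).
    """
    total = 0.0
    for xa, ya, za, qa, ea, sa in atoms_a:
        for xb, yb, zb, qb, eb, sb in atoms_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            total += lennard_jones(r, math.sqrt(ea * eb), 0.5 * (sa + sb))
            total += coulomb(r, qa, qb, dielectric)
    return total


# Hypothetical two-atom "base" and one-atom "side chain"; parameters are made up.
base = [(0.0, 0.0, 0.0, -0.5, 0.17, 3.0), (1.3, 0.0, 0.0, 0.3, 0.09, 3.2)]
side_chain = [(0.0, 3.0, 0.0, 0.8, 0.17, 3.25)]
for d in (2.8, 3.0, 3.5, 4.0):
    shifted = [(x, y + d - 3.0, z, q, e, s) for x, y, z, q, e, s in side_chain]
    print(f"y = {d:.1f} A: {interaction_energy(base, shifted, dielectric=4.0):.2f} kcal/mol")
```

Scanning one partner over a set of positions, as in the loop, is the basic operation behind an interaction energy map.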
CLASSIFICATION OF METHODS FOR MODELLING PROTEIN-NUCLEIC ACID INTERACTIONS
• Physical- Potential functions derived from physical models of
atomic interaction.
• Statistical- Potential functions derived from statistical
formalisms and parameterized using data from known
protein-nucleic acid structures.
• Hybrid- Methods that incorporate both physical and statistical
components.
• Orientation-Dependent Statistical Potential Functions- The
earliest example of the application of a knowledge-based
potential to protein/nucleic acid interactions can perhaps be
found in the work of Kono and Sarai, who used the geometric
regularity of the DNA helix to create local ‘reference frames’
for the nucleotides, which are then used to count the number of
amino acid alpha carbons in 3D spatial bins surrounding the
nucleotide bases.
• Distance-Dependent Statistical Potential Functions- More
recently, however, several groups have demonstrated that even
a one-dimensional (distance-only) representation of the
structural data is sufficient to discriminate native-like
protein-DNA and protein-RNA complexes from non-native structures.
• Other Statistical Models- For example, Ge et al. developed a
knowledge-based method for predicting the sequence-specific
binding energies of polyamide molecules to DNA double
helices, based on the observed positions of water molecules
and amino acid atoms in known protein-DNA structures.
• This method clustered the atoms observed to hydrogen bond to
DNA bases in their structure training set and used these
structural clusters to determine 3D ellipsoid regions of likely
drug-DNA hydrogen bonding interactions.
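The distance-dependent potentials described above are usually obtained by inverting a Boltzmann relation, E(r) = -kT ln[f_obs(r) / f_ref(r)]. The Python sketch below builds such a potential from a toy set of residue-base contacts and uses it to score candidate complexes. The pooled reference state, 1 Å bins, pseudocount and training data are all assumptions; an orientation-dependent potential in the style of Kono and Sarai would instead count atoms in 3D bins of each nucleotide's local reference frame.

```python
import math
from collections import Counter

KT = 0.593  # kcal/mol at about 298 K


def distance_potential(contacts, bin_width=1.0, max_dist=10.0, pseudocount=1.0):
    """Knowledge-based pair potential from observed contacts.

    contacts: (residue_type, base_type, distance_in_Angstrom) from known complexes.
    E(a, b, bin) = -kT * ln(f_obs(bin | a, b) / f_ref(bin)), where the reference
    f_ref is the distance distribution pooled over all type pairs (an assumption).
    """
    n_bins = int(max_dist / bin_width)
    obs, ref, per_pair = Counter(), Counter(), Counter()
    for res, base, dist in contacts:
        if dist >= max_dist:
            continue
        k = int(dist / bin_width)
        obs[(res, base, k)] += 1
        ref[k] += 1
        per_pair[(res, base)] += 1
    n_ref = sum(ref.values())
    energies = {}
    for (res, base), n_pair in per_pair.items():
        for k in range(n_bins):
            f_obs = (obs[(res, base, k)] + pseudocount) / (n_pair + pseudocount * n_bins)
            f_ref = (ref[k] + pseudocount) / (n_ref + pseudocount * n_bins)
            energies[(res, base, k)] = -KT * math.log(f_obs / f_ref)
    return energies


def score(contacts, energies, bin_width=1.0):
    """Sum of pair energies; lower scores should indicate more native-like complexes."""
    return sum(energies.get((r, b, int(d / bin_width)), 0.0) for r, b, d in contacts)


# Toy training set (hypothetical): ARG-G pairs mostly short, ALA-G pairs mostly long.
training = ([("ARG", "G", 3.5)] * 10 + [("ARG", "G", 8.5)] * 2
            + [("ALA", "G", 8.5)] * 10 + [("ALA", "G", 3.5)] * 2)
pot = distance_potential(training)
native_like = [("ARG", "G", 3.2), ("ALA", "G", 8.9)]
decoy = [("ARG", "G", 8.1), ("ALA", "G", 3.7)]
print(score(native_like, pot), score(decoy, pot))
```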
APPLICATIONS
• Molecular Docking- An application of structural analysis that
has been successfully demonstrated is the prediction of
protein-DNA and protein-RNA interactions through
computational docking of protein and nucleic acid structures.
• Computational docking is a well-established technique in the
study of protein/protein and protein/small molecule
interactions, and because it is relatively easy to conduct
rigid-body binding simulations of molecular structures, it was
one of the earliest tests applied to protein-nucleic acid systems.
• Analysis of Direct and Indirect Recognition- The
computational analysis of protein-DNA and protein-RNA
structures can be used to develop and explore hypotheses about
the mechanisms of protein-nucleic acid recognition that cannot
easily be addressed via experiment.
• Structure Refinement- Another interesting yet largely
unexplored application, particularly for statistical potentials,
lies in the refinement of protein-nucleic acid structures.
• In one such study, a statistical potential was used to refine
several NMR structures of DNA and RNA molecules and was shown
to improve the quality of the refined structures.
• Structure-based Genome Annotation- One of the most
compelling applications of structure-based methods for the
prediction of protein-nucleic acid interactions is also one of the
most obvious: if it were possible to accurately predict the
sequence recognition of a protein from its structure, it should
be possible to use that structure to predict the protein's
binding sites within a genome.
• Rational Protein Design- There has been an enormous
amount of research dedicated to the structure-based design of
certain classes of nucleic acid binding proteins, particularly
zinc finger proteins.
• Although there are still no general-purpose methods that can
successfully design protein sequences that fold into
conformations with a desired binding specificity, such designed
proteins have potential as therapeutic compounds for human
diseases, so there are clearly many applications for any method
that can solve this more general problem.
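For the genome-annotation application above, once a protein's specificity has been predicted from structure it can be expressed as a position weight matrix and scanned along a genome. The Python sketch below assumes the prediction is already available as per-position base counts; the counts, pseudocount, uniform background and score threshold are hypothetical, and the sketch reports windows on either strand that reach the threshold.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Per-position log2-odds scores from per-position base counts."""
    matrix = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(genome, matrix, threshold):
    """Return (start, strand, score) for every window scoring >= threshold on either strand."""
    width = len(matrix)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            if any(b not in BASES for b in site):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(matrix[k][b] for k, b in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Hypothetical specificity for a 4-bp site "TGAC" (counts per position).
pwm = log_odds_matrix([{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}])
print(scan("AAATGACAAAGTCAAA", pwm, threshold=6.0))
```

Hits on the "-" strand are windows whose reverse complement matches the site, which is why "GTCA" is reported.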
FUTURE WORK AND CRITICAL CHALLENGES
• At the moment, it appears that even the simplest statistical
potentials perform as well as considerably more complex
molecular dynamics force fields in many situations: will this
continue to be true as simulations increase in complexity and
detail, and as the sampling problem of large-scale simulations
becomes less acute?
• As noted above, even the current best methods for simulating
protein-nucleic acid interactions can capture only small
amounts of molecular flexibility, such as protein side chain
rearrangements or small-scale molecular motions.
• New methods need to be developed to accurately sample or
simulate the motions of protein and nucleic acid molecules as
they interact.
• Moreover, these new methods must be computationally
tractable for large-scale simulations if they are to be usefully
applied to problems such as structure-based genome
annotation or protein/nucleic acid interface redesign.
• In addition to limitations in the ability to simulate molecular
motions, it is clear that even the current best models miss
important details of protein/nucleic acid interfaces.
3. PREDICTION OF PROTEIN-PROTEIN INTERACTIONS FROM EVOLUTIONARY INFORMATION
INTRODUCTION
• The study of protein interactions can be divided into two
complementary aspects: determining the residues or regions
implicated in the interactions, and identifying the interaction
partners (which proteins interact with which).
• A number of bioinformatics techniques have been developed
based on the considerable amount of sequence and genomic
information that has accumulated in databases.
EVOLUTIONARY FEATURES RELATED TO STRUCTURE AND FUNCTION
• Conserved Positions: the information most widely extracted
from multiple sequence alignments (MSAs) concerns conserved
positions.
• These invariable positions are interpreted as residues
important for the structure or function of the protein, since
changes at them have apparently been disallowed during evolution.
• Conserved positions are usually located in structural cores and
active or binding sites.
• Family-Dependent Conserved Positions- positions conserved
within protein subfamilies but different between them. One of
the first approaches to detecting functional residues of this
kind, implemented in the SequenceSpace program, was developed
by Casari, Sander and Valencia.
• Coevolving Positions- Another sequence-based approach for
the prediction of protein structures and molecular complexes
is based on detecting correlated mutations in multiple
sequence alignments and using them as distance constraints
between residues belonging to the same or different proteins.
• Although ab initio simulations are of low accuracy, these
predicted contacts have been demonstrated to be useful in such
simulations.
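Both conservation and coevolution can be computed directly from alignment columns. The Python sketch below uses Shannon entropy for conservation (low entropy means conserved) and mutual information between two columns as a simple coevolution signal. These are common choices rather than the specific measures of the methods named above, and in practice raw mutual information is corrected for phylogenetic and entropy effects (for example with the average product correction). The alignment is hypothetical.

```python
import math
from collections import Counter


def column(msa, j):
    """Residues observed at alignment position j (one per sequence)."""
    return [seq[j] for seq in msa]


def shannon_entropy(col):
    """Column entropy in bits: 0 for a fully conserved position."""
    n = len(col)
    return sum(-p * math.log2(p) for p in (c / n for c in Counter(col).values()))


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), c in Counter(zip(col_i, col_j)).items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Toy alignment (hypothetical): column 0 is invariant, columns 2 and 3 covary
# (D pairs with K, E pairs with R), column 1 varies independently of column 2.
msa = ["ACDK", "ACER", "AVDK", "AVER"]
cols = [column(msa, j) for j in range(len(msa[0]))]
print([round(shannon_entropy(c), 2) for c in cols])
print(round(mutual_information(cols[2], cols[3]), 2))
```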
PREDICTION OF INTERACTING REGIONS
• Structure-based methods- Protein docking has been reviewed
recently by Schneidman-Duhovny, Nussinov and Wolfson, and by
Gray. An interesting effort to compare various protein docking
approaches in a blind test (CAPRI) is organized by Janin et al.
• The results of this experiment, if enough structures of protein complexes
become available for the comparison, will be important for updating our
view of the capability of current docking methods.
• Large-scale protein interaction data combined with structural
information can be mined for short motifs or patterns
frequently involved in interactions. These motifs can be used
for predicting new interacting regions.
• Sequence-based methods- ab initio methods
• In this context, we use the term ab initio for methods that use
only the sequence of a protein to predict the residues involved
in the binding interface for intermolecular recognition.
• This category of methods includes the detection of intrinsically
unstructured segments in the protein sequences, also termed ‘Disordered
regions’.
• Another method that uses a single sequence as primary input to identify
interacting regions is the neural network developed by Ofran and Rost.
• This neural network is trained with sequence segments of
interaction sites and correctly predicts at least one interaction
site for 20% of the complexes in its test set.
• Methods based on Multiple Sequence Alignments- ConSeq
uses a sophisticated calculation of conservation together with
predicted solvent accessibility for predicting functional
residues and binding sites.
• The relation between tree-determinant residues and interacting
surfaces has been analyzed in detail in some well-
characterized systems.
• Coevolving positions have also been used for detecting interacting regions.
• Hybrid methods- methods that incorporate both structural and
sequence information to predict protein interaction surfaces
have shown significant improvement over equivalent methods
that use only sequence information.
• The simplest way of combining sequence and structural information
for predicting binding regions is by simply mapping the sequence-
based predicted sites into the 3D structure of the protein to assess
whether the predicted positions have structural characteristics
expected for a functional or binding site.
• The ConSurf program looks for conserved residues that map to
the surface of the protein, taking into account the distribution
of sequences in the MSA when calculating conservation.
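The simplest hybrid strategy described above, mapping sequence-based predictions onto the structure, can be sketched as follows: keep residues that are both conserved and solvent exposed, then report the largest spatially connected group as a candidate interaction patch. This Python sketch is a simplification in the spirit of ConSurf-style mapping, not ConSurf's algorithm; the entropy and accessibility thresholds, the 8 Å C-alpha cutoff and the residue data are assumptions.

```python
import math


def candidate_patch(residues, max_entropy=0.5, min_rsa=0.25, cutoff=8.0):
    """Largest spatial cluster of conserved, solvent-exposed residues.

    residues: dicts with 'id', 'entropy' (column entropy from the MSA),
    'rsa' (relative solvent accessibility, 0-1) and 'ca' (C-alpha x, y, z in Angstrom).
    Thresholds are illustrative, not values from ConSurf or any other tool.
    """
    selected = [r for r in residues
                if r["entropy"] <= max_entropy and r["rsa"] >= min_rsa]
    seen, best = set(), []
    for start in range(len(selected)):
        if start in seen:
            continue
        # Depth-first search over residues whose C-alpha atoms lie within the cutoff.
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(selected[i]["id"])
            for j in range(len(selected)):
                if j not in seen and math.dist(selected[i]["ca"], selected[j]["ca"]) <= cutoff:
                    seen.add(j)
                    stack.append(j)
        if len(component) > len(best):
            best = component
    return sorted(best)


# Hypothetical residues: 1-3 form a conserved exposed patch, 4 is isolated,
# 5 is conserved but buried, 6 is exposed but variable.
residues = [
    {"id": 1, "entropy": 0.1, "rsa": 0.60, "ca": (0.0, 0.0, 0.0)},
    {"id": 2, "entropy": 0.3, "rsa": 0.40, "ca": (5.0, 0.0, 0.0)},
    {"id": 3, "entropy": 0.2, "rsa": 0.35, "ca": (10.0, 0.0, 0.0)},
    {"id": 4, "entropy": 0.1, "rsa": 0.50, "ca": (50.0, 0.0, 0.0)},
    {"id": 5, "entropy": 0.0, "rsa": 0.05, "ca": (2.0, 0.0, 0.0)},
    {"id": 6, "entropy": 2.0, "rsa": 0.70, "ca": (15.0, 0.0, 0.0)},
]
print(candidate_patch(residues))
```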
PREDICTION OF INTERACTION PARTNERS
• Experimental Techniques and High-Throughput Applications
• Computational Methods Based On Genomic Information- In
parallel to these experimental approaches for the detection of
interacting pairs of proteins, a number of bioinformatics
techniques have been developed.
• The results of many of these methods can be accessed online at
repositories such as STRING. These data can be used for
inferring the role of a protein by identifying its potential
interactors.
• Phylogenetic Profiling- this method is based on the detection
of pairs of genes that have a similar species distribution.
• In the first versions, phylogenetic profiles were coded as
binary vectors, with 1 coding for the presence of a given gene
and 0 coding for its absence.
• Another improvement came from the incorporation of
information on the phylogeny of the species involved, together
with an evolutionary model of gene gain and loss.
• The approach has two limitations.
• First, it cannot be applied to many essential proteins, since
they are present in all organisms and hence give non-informative
profiles.
• Second, this methodology requires fully sequenced genomes in
order to accurately determine whether the given gene is
present in an organism or not.
• Conservation of Gene Neighborhood- several methods use the
conservation of gene proximity along the genome between
distantly related species to predict interactions.
• The limitation of this technique is that it is largely
restricted to bacterial genomes, where there is a clear tendency
for functionally related genes to cluster in operons, allowing
cotranscription.
• Inferences can be made for eukaryotic proteins only if
homologues are clearly established in a prokaryotic organism.
• Gene Fusion- the method is based on the presence of fused
genes in various genomes.
• Marcotte et al. proposed an evolutionary hypothesis to explain
such fusion events: if two proteins have to interact in order to
perform a given function, the concentration of the active complex
would be much higher if the two proteins were fused together than
if they were separate and hence had to rely on random diffusion
to find each other and form the active complex.
• The disadvantage, however, is the limited range of applicability,
because these fusion events, in spite of being very informative,
are not very frequent.
• Similarity of Phylogenetic Trees- the observation that interacting
proteins tend to have topologically similar phylogenetic trees has
been used by some methods to predict interaction partners.
• The hypothesis behind the approach is that interacting proteins
would be subject to a process of coevolution that would translate
into a stronger than expected similarity between their
phylogenetic trees.
• This concept of tree similarity was used to look for the correct
mapping between two families of interacting proteins.
• The idea is that the correct mapping will be the one maximizing
the similarity between both trees.
• One important limitation of this method is that it can only be applied
to pairs of proteins where the orthologs are found in many common
species.
• Other Sequence-Based Methods- Coevolving positions have
been used for predicting interacting pairs of proteins,
extending their previous use for predicting contacts between
residues.
• The methods described so far do not involve training; that is,
they do not ‘learn’ from examples of known interactions and
non-interactions. In contrast, other methods, such as that of
Sprinzak and Margalit, are trained with examples.
• Computational Methods Based On Structural Information-
these methods are intended to predict whether the homologues
of two proteins known to interact will interact or not.
• The energetic feasibility of different complexes between
members of the Ras family and different families of Ras
effectors was evaluated using the FOLD-X program.
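Two of the genomic-context ideas above lend themselves to short sketches: comparing binary phylogenetic profiles, and measuring the similarity of phylogenetic trees as the correlation between inter-species distance matrices (the idea behind the approach often called "mirrortree"). In this Python sketch, Hamming distance and Pearson correlation are illustrative choices, and the profiles and distance matrices are hypothetical; real methods derive the distances from alignments of orthologs, and some later variants correct for the underlying species tree.

```python
import math


def profile_distance(p, q):
    """Hamming distance between two binary presence/absence profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def tree_similarity(dist_a, dist_b):
    """Correlation of the upper triangles of two inter-species distance matrices.

    Row/column k of both matrices must refer to the orthologs from the same species k.
    """
    n = len(dist_a)
    upper_a = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    upper_b = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(upper_a, upper_b)


# Hypothetical profiles over six genomes (1 = gene present, 0 = absent).
gene_x = [1, 1, 0, 1, 0, 1]
gene_y = [1, 1, 0, 1, 0, 1]
gene_z = [0, 1, 1, 0, 1, 0]
print(profile_distance(gene_x, gene_y), profile_distance(gene_x, gene_z))

# Hypothetical distance matrices for two protein families in the same three species.
tree_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
tree_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(tree_similarity(tree_a, tree_b))
```

Identical profiles (distance 0) or strongly correlated distance matrices are taken as evidence of a functional link, not as proof of a physical interaction.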
FUTURE TRENDS
• Currently the experimental and computational techniques for
massive characterization of these networks still have several
important drawbacks.
• A common limitation of computational methods for detecting
interaction partners is dependency on the quantity and quality
of genomic data.
• The use of structural information can help improve the
performance of current methodologies.
THANK YOU

Macromolecular interaction

  • 1.
  • 2.
  • 3.
    INTRODUCTION • An understandingof electrostatic interactions is essential for the full development of structural bioinformatics. • The structures of proteins and other biopolymers are being determined at an interesting rate through structural genomics and other efforts.
  • 4.
    OVERVIEW OF FUNCTIONAL ROLESOF ELECTROSTATICS • Electrostatic interactions help to determine the structure and flexibility of biopolymers, and the strength and kinetics of their associations with small molecules and other biopolymers.
  • 5.
    BRIEF HISTORY • Theimportance of electrostatic interactions in protein behavior was recognized early in the 20th century by Linderstrom- Lang who introduced a simple spherical model for protein titrations in 1924. • A new era of electrostatic models was ushered in by a 1982 study by Warwicker and Watson.
  • 6.
    THE NEED FORMORE EFFICIENT AND SCALABLE ELECTROSTATIC METHODS • The era of structural bioinformatics has created an urgent need for faster methods to solve problems in biomolecular electrostatics. • As the structures of more proteins and other biopolymers become available through structural genomics and other initiatives, there will be a corresponding need to calculate the physical properties of these molecules to help assign them to families and functions.
  • 7.
    POISSON-BOLTZMAN THEORY • Themethod is relatively rigorous framework for inclusion of biomolecular topology and ionic strength effects.
  • 8.
    INTRODUCTION TO EQUATION •˅ Ɛ(x)˅ Φ(x)=1/Ɛ0 ρ(x) Where Ɛ(x) is a spatially varying dielectric coefficient, Φ(x) is the dielectric potential Ɛ0=8.854*10 12 F/m is the permittivity of a vacuum Ρ(x) is the charge of density. • The dielectric coefficient Ɛ(x) typically assumes different values inside the solute and in the bulk solvent to reflect the relative polarizabilities of the two media. For biomolecules in an aqueous env. Ɛ is generally given a value of 2-20 inside the solute and a value of 80 in the solvent.
  • 9.
    ENERGIES • The PBEdefines an electrostatic energy that can be derived from physical chemistry arguments. • The free energy is a functional of the electrostatic potential as well as atomic positions, charges and radii.
  • 10.
    FORCES • Poisson-Boltzmann havealso found an increasingly important role in force calculation for implicit solvent dynamics simulations. • In such simulations, the dynamical trajectory of a solute is calculated without inclusion of numerous explicit solvent molecules required for traditional molecular dynamics simulations.
  • 11.
    APPLICATIONS • Thermodynamics- Theapplication of electrostatic methods to thermodynamic processes in biophysics necessarily involves calculation of free energies, such applications are interested in changes in energy associated with a particular biomolecular transformation such as ligand binding, folding or other conformational changes.
  • 12.
    • Kinetics- Awide variety of kinetic processes involving proteins are governed in large part by electrostatic interactions. • These include the electrostatically steered binding of substrates or macromolecules to proteins and the association of proteins with membranes. Faster and more accurate methods for calculating electrostatic forces are therefore critical to computational analysis of such processes.
  • 13.
    • Electrostatic comparisons-Effort has also been made to pursue more informatics based approaches to the interpretation of electrostatic properties. • Much of this work includes identification of functionally relevant residues in biomolecules by looking at electrostatic destabilization.
  • 14.
    SOFTWARE AVAILABLE • DelPhi-solves the PBE using highly optimized finite difference methods. • APBS- solves the PBE using parallel multigrid and parallel adaptive finite element methods. • MEAD- solves the PBE using finite difference methods and determines pKa values while incorporating conformational flexibility of macromolecules
  • 15.
    • UHBD- Solvesthe PBE using finite difference methods, calculate binding and solvation energies, determines pKas, and perform Brownian dynamics simulations. • AMBER- In addition to providing explicit solvent simulation tools, this package PBE- based implicit solvent methods in both dynamics and free energy evaluation simulations. • CHARMM- In addition to providing explicit solvent simulation tools, this package implements generalized Born and in both dynamics and free energy evaluation simulation. • Qnifft- Poisson-Boltzmann finite difference solver developed by Sharp lab.
  • 16.
    2. PREDICTION OFPROTEIN- NUCLEIC ACID INTERACTION
  • 17.
    INTRODUCTION • The specificrecognition of nucleic acid sequences by nucleic acid binding proteins is of critical importance to biological function of every species. • Given that sequence based recognition code appears increasingly unlikely, the key to understanding the process of protein DNA and protein- RNA recognition lies in structural, physical and chemical details of the molecular interaction themselves; these structural mechanisms have received renewed attention with increased availability of high- resolution structures of protein/nucleic acid complexes.
  • 18.
    MOTIVATION • It isself-evident that the ability to accurately and reliably predict the sequence specificity of nucleic acid binding proteins from structure would represent a great intellectual achievement, but practical application would be substantial.
  • 19.
    POTENTIAL FUNCTIONS FORPROTEIN- NUCLEIC INTERACTION • The Molecular Dynamics simulation of nucleic acid molecules and protein-nucleic acid complexes was routinely performed long before the first efforts to computationally predict the DNA/RNA recognition sequences of proteins from structure. • MD force fields were among the first potential functions to be applied to the problem of predicting the energetic properties of sequence-specific protein/nucleic acid complexes.
  • 20.
    • AMBER packagewas used to calculate free energy, enthalpy and entropy maps of base-amino acids interactions, a study followed by similar efforts from other groups. • CAP- Catabolite Activator Protein. Demonstrates a hybrid approach, wherein Hidden Markov Models was trained using both binding sequence data and nucleotide roll data obtained from MD simulation of DNA molecules and found that the use of structural information improved the quality of binding site predictions. • The CHARMM package has also been successfully used, particularly for MD simulations of the sequence dependent flexibility of protein bound DNA.
  • 21.
    CLASSIFICATION OF METHODSFOR MODELLING PROTEIN-NUCLEIC ACID INTERACTIONS • Physical- Potential functions derived from physical models of atomic interaction. • Statistical-Potential functions derived from statistical formalisms and parametized using data from known protein- nucleic acid structures. • Hybrid-Methods that incorporate both physical and statistical components.
  • 22.
    • Orientation-Dependant StatisticalPotential Functions- The earliest example of the application of a knowledge-based potential to protein/nucleic acid interactions can perhaps be found in the work of Kono and Sarai who used the geometric regularity of the DNA helix to create local ‘reference frames’ for the nucleotides, which are then used to count the number of amino acid alpha carbons in 3D spatial bins surrounding the nucleotide bases.
  • 23.
    • Distance-Dependant StatisticalPotential Function- More recently, however, several groups have demonstrated that even a one-dimensional representation of the structural data is sufficient to discriminate native-like protein-DNA and protein- RNA complexes from non-native structures.
  • 24.
    • Other StatisticalModels- In particular, Ge et al. developed a knowledge-based method fro predicting the sequence-specific binding energies of polyamide molecules to DNA double helices, based on the observed positions of water molecules and amino acid atoms in known protein-DNA structures. • This method clustered the atoms observed to hydrogen bond to DNA bases in their structure training set and used these structural clusters to determine 3D ellipsoid regions of likely drug-DNA hydrogen bonding interactions.
  • 25.
    APPLICATIONS • Molecular Docking-An application of structural analysis that has been successfully demonstrated is the prediction of protein-DNA and protein-RNA interactions through computational docking of protein and nucleic acid structures. • Computational docking is a well-established technique in the study of protein/protein and protein/small molecule interactions and because it is relatively easy to conduct rigid- body binding simulations of molecular structures, this was one of the earliest tests applied to protein-nucleic acid systems.
  • 26.
    • Analysis ofDirect and Indirect Recognition- The computational analysis of protein-DNA and protein-RNA structures can be used to develop and explore hypothesis about the mechanisms of protein-nucleic acid recognition that cannot be easily addressed via experiment.
  • 27.
    • Structure Refinement-Another interesting yet largely unexplored application-particularly for statistical potentials- lies in the refinement of protein-nucleic acid structures. • They used this potential to refine several NMR structures of DNA and RNA molecules and demonstrated that the potential was to improve the quality of refined structures.
  • 28.
    • Structure-based GenomeAnnotation- One of the most compelling applications of structure-based methods for the prediction of protein-nucleic acid interactions is also one of the most obvious: if it were possible to accurately predict the sequence recognition of a protein from its structure, it should be possible to use that the structure to predict the binding sites of protein within genome.
  • 29.
    • Rational ProteinDesign- There has been an enormous amount of research dedicated to structure-based design of certain classes of nucleic acid binding proteins particularly for zinc finger proteins. • There are still no general-purpose methods that can successfully design protein sequences that will fold into conformations with a desired potential as therapeutic compounds for human diseases, there are clearly many potential applications for any method that can solve this more general problem.
  • 30.
    FUTURE WORK ANDCRITICAL CHALLENGES • At the moment, it appears that even the simplest statistical potentials perform as well as considerably more complex molecular dynamics force fields in many situations: Will this continue to be true as simulations increase in complexity and detail, and the sampling problem of large-scale simulations become less acute? • As noted above, even the current best methods for simulating protein-nucleic acid interactions can capture only small amounts of molecular flexibility, such as protein side chain rearrangements or small-scale molecular motions.
  • 31.
    • New methodsneed to be developed to accurately sample or simulate the motions of protein and nucleic acid molecules as they interact. • Moreover, these new methods must be computationally tractable for large-scale simulations if they are to be usefully applied to problems such as structure-based genome annotation or protein/nucleic acid interface redesign. • In addition to limitations in the ability to simulate molecular motions, it is clear that even the current best models miss important details of protein/nucleic acid interface.
  • 32.
    3. PREDICTION OFPROTEIN- PROTEIN INTERACTIONS FROM EVOLUTIONARY INFORMATION
  • 33.
    INTRODUCTION • The studyof protein interactions can be divided into two complementary aspects, the determination of the residues or regions implicated in the interactions and the deciphering of the identity of the interaction partners (which proteins interact with which ones). • A number of bioinformatics techniques have been developed based on the considerable amount of sequence and genomic information that has been accumulated in the database.
  • 34.
    EVOLUTIONARY FEATURES RELATEDWITH STRUCTURE AND FUNCTION • Conserved Positions: The information most widely extracted from multiple sequence alignments is related with conserved positions. • These invariable positions are interpreted as important residues for the structure or function of protein since apparently changes are disallowed during evolution. • Conserved positions usually are located in structural cores and active/binding sites.
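• Family-Dependent Conserved Positions: These are positions that are conserved within each subfamily of a protein family but differ between subfamilies. One of the first approaches to detecting this kind of functional residue, implemented in the SequenceSpace program, was developed by Casari, Sander and Valencia.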
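The idea of a family-dependent conserved position can be illustrated with a deliberately simplified rule: keep a column if it is invariant inside every subfamily but not identical across all of them. This is a minimal sketch, assuming the subfamily assignments are already known; it is not the vector-space procedure used by SequenceSpace, and the function name and data are hypothetical.

```python
def family_dependent_columns(subfamilies):
    """Return alignment columns that are invariant inside every subfamily
    but are occupied by different residues in at least two subfamilies.

    subfamilies: dict mapping a subfamily label to its aligned sequences
    (all sequences in all subfamilies must have the same length).
    """
    sequences = [s for seqs in subfamilies.values() for s in seqs]
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        per_family = []
        for seqs in subfamilies.values():
            residues = {s[i] for s in seqs if s[i] != "-"}
            if len(residues) != 1:
                break  # column varies (or is all gaps) inside this subfamily
            per_family.append(residues.pop())
        else:
            # Conserved within each subfamily; keep it only if subfamilies disagree,
            # which excludes globally conserved columns.
            if len(set(per_family)) > 1:
                hits.append(i)
    return hits


# Hypothetical two-subfamily alignment
subfamilies = {
    "subfamily_1": ["MKDLE", "MKDIE"],
    "subfamily_2": ["MRELE", "MRELQ"],
}
print(family_dependent_columns(subfamilies))
```

Column 0 is conserved everywhere and is therefore not reported; columns 1 and 2 are reported because each subfamily keeps its own residue there.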
• Coevolving Positions: Another sequence-based approach to predicting protein structure and molecular complexes is based on detecting correlated mutations in multiple sequence alignments and using them as distance constraints between residues belonging to the same or different proteins.
• Although the accuracy of these contact predictions is low, the predicted contacts have been demonstrated to be useful, for example as constraints in ab initio simulations.
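Correlated mutations can be scored in several ways. The sketch below uses mutual information (MI) between pairs of alignment columns as a stand-in score; this is an assumption for illustration and not necessarily the statistic used by the methods mentioned in the text. Function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns, skipping rows with a gap in either."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((marg_i[a] / n) * (marg_j[b] / n)))
        for (a, b), c in joint.items()
    )


def ranked_column_pairs(alignment):
    """All column pairs (i, j) sorted by decreasing mutual information."""
    columns = list(zip(*alignment))  # transpose: one tuple of residues per column
    scored = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical MSA: columns 0 and 1 change together (A<->K, G<->E)
msa = ["AKLE", "AKIE", "GELE", "GEIE"]
best = ranked_column_pairs(msa)[0]
print(best)
```

The top-ranked pairs would be the candidate contacts. Raw MI is also inflated by shared ancestry among the sequences, so practical methods apply corrections that this sketch omits.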
PREDICTION OF INTERACTING REGIONS
• Structure-based methods: These are covered in recent reviews by Schneidman-Duhovny, Nussinov and Wolfson, and by Gray. An interesting effort to compare various protein docking approaches in a blind test has been organized by Janin et al.
• Provided that enough structures of protein complexes become available for the comparison, the results of this experiment will be important for updating our view of the capabilities of current docking methods.
• Combined data from large-scale protein interaction studies and structural information can be mined for short motifs or patterns frequently involved in interactions. These motifs can then be used to predict new interacting regions.
• Sequence-based methods: ab initio methods
• We use the term ab initio here for methods that use only the sequence of a protein to predict the residues at the binding interface involved in intermolecular recognition.
• This category includes the detection of intrinsically unstructured segments in protein sequences, also termed ‘disordered regions’.
• Another method that uses a single sequence as its primary input to identify interacting regions is the neural network (NN) developed by Ofran and Rost.
• This NN is trained with sequence segments from interaction sites, and it correctly predicts at least one interaction site for 20% of the complexes in its test set.
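A single-sequence predictor of this kind needs a way to turn "sequence segments" into numeric input. The sketch below shows one generic option, a one-hot encoding of a window centred on each residue; the window size of 9 and the function name are assumptions, the classifier itself is not shown, and this is not claimed to be the encoding used by Ofran and Rost.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, window=9):
    """One-hot encode a window centred on every residue of `sequence`.

    Each residue yields window * 20 binary features. Window positions that
    fall beyond the chain termini (or hold a non-standard residue) stay all-zero.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    features = []
    for centre in range(len(sequence)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in AA_INDEX:
                one_hot[AA_INDEX[sequence[pos]]] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


X = encode_windows("MKVLAGHE", window=5)
print(len(X), len(X[0]))  # one 100-feature vector per residue
```

Each row of `X` would be paired with a label (interface residue or not) taken from known complexes when training a network.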
• Methods based on Multiple Sequence Alignments: ConSeq uses a sophisticated calculation of conservation together with predicted solvent accessibility to predict functional residues and binding sites.
• The relation between tree-determinant residues and interacting surfaces has been analyzed in detail in some well-characterized systems.
• Coevolving positions have also been used for detecting interacting regions.
• Hybrid methods: Methods that incorporate both structural and sequence information to predict protein interaction surfaces have shown significant improvement over equivalent methods that use only sequence information.
• The simplest way of combining sequence and structural information to predict binding regions is to map the sequence-based predicted sites onto the 3D structure of the protein and assess whether the predicted positions have the structural characteristics expected of a functional or binding site.
• The ConSurf program looks for conserved residues that map to the surface of the protein, taking into account the distribution of sequences in the MSA when calculating conservation.
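The mapping step described above can be sketched as a simple filter: keep residues that are both conserved and solvent-exposed, since conserved buried residues are more likely to reflect structural constraints. This is a minimal sketch, assuming per-residue entropy scores (lower means more conserved) and relative solvent accessibility values from the 3D structure are already available; the thresholds and function name are hypothetical, and ConSurf computes conservation in its own way.

```python
def conserved_surface_residues(entropy, relative_accessibility,
                               max_entropy=0.5, min_accessibility=0.25):
    """Indices of residues that are both conserved (low entropy) and solvent-exposed.

    entropy: per-residue conservation scores, lower = more conserved.
    relative_accessibility: per-residue relative solvent accessibility in [0, 1],
    computed from the 3D structure.
    The two thresholds are illustrative defaults, not values from any published method.
    """
    if len(entropy) != len(relative_accessibility):
        raise ValueError("need one conservation score and one accessibility value per residue")
    return [
        i
        for i, (h, rsa) in enumerate(zip(entropy, relative_accessibility))
        if h <= max_entropy and rsa >= min_accessibility
    ]


# Hypothetical scores for a five-residue protein
entropy = [0.0, 1.8, 0.2, 0.0, 2.1]
rsa = [0.05, 0.60, 0.40, 0.70, 0.80]
print(conserved_surface_residues(entropy, rsa))
```

Residue 0 is fully conserved but buried, so it is treated as a likely structural-core residue rather than a candidate binding-site residue.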
PREDICTION OF INTERACTION PARTNERS
• Experimental techniques and high-throughput applications
• Computational Methods Based On Genomic Information: In parallel with these experimental approaches for detecting interacting pairs of proteins, a number of bioinformatics techniques have been developed.
• The results of many of these methods can be accessed online in repositories such as STRING. These data can be used to infer the role of a protein by identifying its potential interactors.
• Phylogenetic Profiling: This method is based on detecting pairs of genes that have a similar distribution across species.
• In the first versions, phylogenetic profiles were coded as binary vectors, with 1 coding for the presence of a given gene in a genome and 0 for its absence.
• Another improvement came from incorporating information on the phylogeny of the species involved, together with an evolutionary model of gene gain and loss.
• The approach has two limitations.
• First, it cannot be applied to many essential proteins, since they are present in all organisms and hence produce non-informative profiles.
• Second, it requires fully sequenced genomes in order to determine reliably whether a given gene is present in an organism.
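The binary-profile version of the method, and its first limitation, can be sketched in a few lines: build a presence/absence vector per gene, discard profiles that are all 1s (or all 0s), and pair genes whose profiles match. This is a minimal sketch; genomes are assumed to be given as sets of gene-family labels, and all names and data are hypothetical.

```python
from itertools import combinations


def phylogenetic_profile(gene, genomes):
    """Binary presence (1) / absence (0) vector of `gene` over an ordered list of genomes."""
    return [1 if gene in genome else 0 for genome in genomes]


def is_informative(profile):
    """A gene present (or absent) everywhere gives no co-occurrence signal."""
    return 0 < sum(profile) < len(profile)


def similar_profile_pairs(genes, genomes, max_mismatches=0):
    """Pairs of genes whose informative profiles differ in at most `max_mismatches` genomes."""
    profiles = {g: phylogenetic_profile(g, genomes) for g in genes}
    usable = [g for g in genes if is_informative(profiles[g])]
    return [
        (a, b)
        for a, b in combinations(usable, 2)
        if sum(x != y for x, y in zip(profiles[a], profiles[b])) <= max_mismatches
    ]


# Hypothetical genomes, each represented as the set of gene families it contains
genomes = [
    {"geneA", "geneB", "ess"},
    {"geneA", "geneB", "ess"},
    {"geneC", "ess"},
    {"geneC", "ess"},
    {"geneA", "geneB", "ess"},
]
genes = ["geneA", "geneB", "geneC", "ess"]
print(similar_profile_pairs(genes, genomes))
```

The gene `ess`, present in every genome, is filtered out as uninformative. The second limitation shows up in the input: if a genome were incompletely sequenced, a 0 in its column could mean "not yet found" rather than "absent".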
• Conservation of Gene Neighborhood: Several methods use the conservation of gene proximity along the genome in distantly related species to predict interactions.
• The limitation of this technique is that it is essentially restricted to bacterial genomes, where there is a clear tendency for functionally related genes to cluster in operons, allowing cotranscription.
• Inferences can be made for eukaryotic proteins only if their homologues are clearly established in a prokaryotic organism.
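The core test behind gene-neighborhood methods is whether two orthologous genes sit close together in several genomes. The sketch below counts genomes in which two genes are at most `max_gap` positions apart. It assumes linear gene orders with one ortholog label per gene and ignores strand, circular chromosomes, multiple gene copies and how distantly related the genomes are; names and data are hypothetical.

```python
def adjacent_in(gene_order, gene_a, gene_b, max_gap=1):
    """True if both genes occur in `gene_order` at most `max_gap` positions apart."""
    if gene_a not in gene_order or gene_b not in gene_order:
        return False
    return abs(gene_order.index(gene_a) - gene_order.index(gene_b)) <= max_gap


def conserved_neighborhood(genomes, gene_a, gene_b, max_gap=1):
    """Genomes in which the two (orthologous) genes are found next to each other."""
    return [name for name, order in genomes.items()
            if adjacent_in(order, gene_a, gene_b, max_gap)]


# Hypothetical linear gene orders, one ortholog label per gene
genomes = {
    "bacterium_1": ["x", "a", "b", "y"],
    "bacterium_2": ["b", "a", "z"],
    "bacterium_3": ["a", "q", "r", "b"],
}
print(conserved_neighborhood(genomes, "a", "b"))
```

Adjacency preserved across distantly related genomes is the informative signal, because closely related genomes may share gene order simply through common descent.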
• Gene Fusion: This method is based on the presence of fused genes in some genomes.
• Marcotte et al. proposed an evolutionary hypothesis to explain such fusion events: if two proteins have to interact in order to perform a given function, the concentration of the active complex would be much higher if the two proteins were fused together than if they were separate and hence had to rely on random diffusion to find each other and form the active complex.
• The disadvantage, however, is its limited range of applicability: these fusion events, in spite of being very informative, are not very frequent.
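The fusion signal can be expressed as a search: if some genome contains a single protein carrying both family A and family B, then genomes where A and B are encoded on separate proteins become candidates for an A–B interaction. This is a minimal sketch, assuming every protein has already been annotated with the families it contains (for example, by homology search); all names and data are hypothetical.

```python
def fused_proteins(proteomes, family_a, family_b):
    """(genome, protein) pairs where a single protein contains both families."""
    return [
        (genome, protein)
        for genome, proteins in proteomes.items()
        for protein, families in proteins.items()
        if family_a in families and family_b in families
    ]


def found_separately(proteins, family_a, family_b):
    """True if a genome encodes the two families on different proteins."""
    has_a = any(family_a in f and family_b not in f for f in proteins.values())
    has_b = any(family_b in f and family_a not in f for f in proteins.values())
    return has_a and has_b


def fusion_links(proteomes, family_a, family_b):
    """Genomes where families A and B are separate, if a fusion of A+B exists elsewhere."""
    if not fused_proteins(proteomes, family_a, family_b):
        return []
    return [g for g, proteins in proteomes.items()
            if found_separately(proteins, family_a, family_b)]


# Hypothetical proteomes: protein name -> set of families it contains
proteomes = {
    "organism_1": {"p1": {"famA"}, "p2": {"famB"}, "p3": {"famC"}},
    "organism_2": {"q7": {"famA", "famB"}, "q8": {"famC"}},
}
print(fused_proteins(proteomes, "famA", "famB"))
print(fusion_links(proteomes, "famA", "famB"))
```

Here the fused protein `q7` in organism_2 suggests that `p1` and `p2` in organism_1 interact. No prediction is made for famA and famC, which is the limited-coverage problem mentioned above.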
• Similarity of Phylogenetic Trees: Some methods use the observation that interacting proteins tend to have topologically similar phylogenetic trees to predict interaction partners.
• The hypothesis behind the approach is that interacting proteins are subject to a process of coevolution, which would translate into a stronger-than-expected similarity between their phylogenetic trees.
• This concept of tree similarity has also been used to look for the correct mapping between the members of two families of interacting proteins.
• The idea is that the correct mapping will be the one that maximizes the similarity between both trees.
• One important limitation of this method is that it can only be applied to pairs of proteins whose orthologs are found in many common species.
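One way to quantify tree similarity is to compare the distance matrices from which the trees are built: for each pair of species, take the distance between the two orthologs of family A and the distance between the two orthologs of family B, then correlate the two lists. The sketch below does this with a Pearson correlation. It assumes the distances are already computed and restricted to species where both families have orthologs, which is exactly where the limitation above applies; names and values are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(dist, species):
    """Flatten a symmetric distance table over `species` into one value per species pair."""
    return [dist[(a, b)] if (a, b) in dist else dist[(b, a)]
            for a, b in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx * sy == 0:
        raise ValueError("distances are constant; correlation is undefined")
    return cov / (sx * sy)


def tree_similarity(dist_a, dist_b, species):
    """Correlation between the inter-ortholog distances of two protein families,
    restricted to the species in which both families have orthologs."""
    if len(species) < 3:
        raise ValueError("need at least three common species for a meaningful correlation")
    return pearson(pairwise_distances(dist_a, species), pairwise_distances(dist_b, species))


species = ["sp1", "sp2", "sp3", "sp4"]
dist_a = {("sp1", "sp2"): 0.10, ("sp1", "sp3"): 0.40, ("sp1", "sp4"): 0.50,
          ("sp2", "sp3"): 0.35, ("sp2", "sp4"): 0.45, ("sp3", "sp4"): 0.20}
dist_b = {pair: 2 * d for pair, d in dist_a.items()}  # hypothetical partner evolving in step
dist_c = {("sp1", "sp2"): 0.50, ("sp1", "sp3"): 0.10, ("sp1", "sp4"): 0.20,
          ("sp2", "sp3"): 0.45, ("sp2", "sp4"): 0.40, ("sp3", "sp4"): 0.35}
print(round(tree_similarity(dist_a, dist_b, species), 3))
print(tree_similarity(dist_a, dist_c, species) < 0.5)
```

To search for the correct mapping between members of two families, the same score could be recomputed for each candidate pairing of sequences and the pairing with the highest correlation kept.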
• Other Sequence-Based Methods: Coevolving positions have been used for predicting interacting pairs of proteins, extending their previous use for detecting interacting regions.
• The methods described so far do not involve training; that is, they do not ‘learn’ from examples of known interactions and non-interactions. In contrast, other methods, such as that of Sprinzak and Margalit, are trained with examples.
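A trained method of this kind can be sketched as follows: from a training set of known interacting protein pairs, estimate how much more often each pair of sequence signatures (for example, domains) co-occurs than expected by chance, and then score new protein pairs by their best signature pair. This is a minimal sketch in the spirit of such association methods, assuming signatures are already assigned to each protein; it is not necessarily the exact statistic of Sprinzak and Margalit, it learns only from positive examples, and all names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def signature_pair_log_odds(interacting_pairs, signatures):
    """Log2 odds that each unordered signature pair co-occurs across known interacting
    protein pairs more often than expected from the signatures' overall frequencies."""
    pair_counts = Counter()
    signature_counts = Counter()
    for p, q in interacting_pairs:
        for s, t in product(signatures[p], signatures[q]):
            pair_counts[tuple(sorted((s, t)))] += 1
        for protein in (p, q):
            signature_counts.update(signatures[protein])
    total_pairs = sum(pair_counts.values())
    total_signatures = sum(signature_counts.values())
    scores = {}
    for (s, t), count in pair_counts.items():
        p_s = signature_counts[s] / total_signatures
        p_t = signature_counts[t] / total_signatures
        # Expected frequency of an unordered pair under independent draws
        expected = p_s * p_t if s == t else 2 * p_s * p_t
        scores[(s, t)] = math.log2((count / total_pairs) / expected)
    return scores


def score_candidate(p, q, signatures, scores):
    """Best log-odds over the signature pairs of a candidate protein pair (None if unseen)."""
    values = [scores[tuple(sorted((s, t)))]
              for s, t in product(signatures[p], signatures[q])
              if tuple(sorted((s, t))) in scores]
    return max(values) if values else None


signatures = {"p1": {"sigA"}, "p2": {"sigB"}, "p3": {"sigA"}, "p4": {"sigB"},
              "p5": {"sigC"}, "p6": {"sigC"}, "new1": {"sigA"}, "new2": {"sigB"}}
training = [("p1", "p2"), ("p3", "p4"), ("p5", "p6"), ("p1", "p5")]
scores = signature_pair_log_odds(training, signatures)
print(score_candidate("new1", "new2", signatures, scores))
```

A positive score means the signature pair is over-represented among the training interactions. Incorporating known non-interacting pairs, as the text describes for trained methods, would require a second set of counts that this sketch leaves out.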
• Computational Methods Based On Structural Information: These methods are intended to predict whether the homologues of two proteins known to interact will also interact.
• For example, the energetic feasibility of different complexes between members of the Ras family and different families of Ras effectors was evaluated using the FOLD-X program.
FUTURE TRENDS
• Currently, the experimental and computational techniques for the large-scale characterization of protein interaction networks still have several important drawbacks.
• A common limitation of computational methods for detecting interaction partners is their dependence on the quantity and quality of genomic data.
• The use of structural information can help improve the performance of current methodologies.