A Presentation by
Samvartika Majumdar
Int. MSc Biotechnology (9th semester)
Regd. no. 1232112107
STRUCTURE ALIGNMENT METHODS AND RAMACHANDRAN PLOT
INTRODUCTION :-
• Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation.
• In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no prior knowledge of equivalent residue positions.
• Structural alignment is a valuable tool for comparing proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques.
• Because these alignments rely on information about the three-dimensional conformations of all the query sequences, the method can only be used on sequences whose structures are known.
• Structural alignments are especially useful in analyzing data from structural genomics and proteomics efforts, and they can be used as comparison points to evaluate alignments produced by purely sequence-based bioinformatics methods.
• The outputs of a structural alignment are a superposition of the atomic coordinate sets.
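Once two coordinate sets have been superposed, their agreement is commonly summarized by the root mean square deviation (RMSD) over paired atoms. The following is a minimal sketch, assuming the atoms are already paired one-to-one and already superposed; the function name `rmsd` is illustrative and not taken from any particular package.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes atom i of coords_a is equivalent to atom i of coords_b
    and that the two sets have already been superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Finding the rotation and translation that minimize this value is a separate step and is not shown here.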
COMBINATORIAL EXTENSION :-
• The combinatorial extension (CE) method breaks each structure in the query set into a series of fragments that it then attempts to reassemble into a complete alignment.
• A series of pairwise combinations of fragments, called aligned fragment pairs (AFPs), is used to define a similarity matrix through which an optimal path is generated to identify the final alignment. A number of similarity metrics are possible.
• An alignment path is calculated as the optimal path through the similarity matrix by linearly progressing through the sequences and extending the alignment with the next suitable AFP.
• Extensions then proceed with the next AFP that meets given distance criteria, which restricts the alignment to low gap sizes.
• The size of each AFP and the maximum gap size are required input parameters, but they are usually set to empirically determined values of 8 and 30, respectively.
• The RCSB PDB has released an updated version of CE and FATCAT as part of the RCSB PDB Protein Comparison Tool.
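The idea of scoring fragment pairs can be illustrated with one possible similarity metric: the mean absolute difference between the internal distances of a fragment from each structure. This is a simplified sketch under stated assumptions, not the CE implementation. The names `afp_distance` and `similarity_matrix` are hypothetical, the inputs are assumed to be lists of alpha-carbon coordinates, and only the fragment size of 8 comes from the text above.

```python
import math

def afp_distance(ca_a, ca_b, i, j, m=8):
    """Mean absolute difference between intra-fragment distances.

    Compares the fragment of length m starting at residue i of structure A
    with the fragment starting at residue j of structure B.
    Lower values mean the two fragments have more similar shapes.
    """
    total = 0.0
    for k in range(m):
        for n in range(m):
            da = math.dist(ca_a[i + k], ca_a[i + n])
            db = math.dist(ca_b[j + k], ca_b[j + n])
            total += abs(da - db)
    return total / (m * m)

def similarity_matrix(ca_a, ca_b, m=8):
    """Score every possible fragment pair (one row per start position in A)."""
    rows = len(ca_a) - m + 1
    cols = len(ca_b) - m + 1
    return [[afp_distance(ca_a, ca_b, i, j, m) for j in range(cols)]
            for i in range(rows)]
```

Because the score uses only internal distances, it does not change when either structure is rotated or translated. An alignment path would then be traced through the low-scoring cells of this matrix.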
VAST :-
• VAST, short for Vector Alignment Search Tool, is a computer algorithm developed at NCBI. It is used to identify similar protein three-dimensional structures ("similar structures") by purely geometric criteria, and to identify distant homologs that cannot be recognized by sequence comparison.
• The VAST algorithm uses vectors to represent secondary structure elements (SSEs) in protein structures, and the structural comparison step relies on the relative orientation, length and alignment of those vectors.
• Comparisons that align only one or two vectors between two proteins are never considered, so many small proteins with fewer than three secondary structure elements are not comparable by VAST.
• Furthermore, slight variations in a pair of structures can occasionally result in large differences in their vector representations.
• The original VAST finds structures that are similar in 3D to individual protein molecules or individual 3D domains. The original-style VAST display can be viewed by clicking the "Original VAST" button on any VAST+ search results page. The original VAST help document provides illustrated examples of original-style VAST search results and alignment footprints.
• The original-style VAST display provides the following information:
- a list of the protein molecules ("chains") in the query structure, and the 3D domains that the VAST algorithm identified in each protein.
- a list of the structures that are similar in shape to any individual protein molecule or 3D domain of the query structure, with links to views of their sequence alignments and 3D superpositions to the query structure.
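The vector representation of SSEs described for VAST can be sketched as follows. This is an illustration only. It assumes each SSE is reduced to the vector from its first to its last alpha carbon, which may differ from how VAST actually fits its vectors, and the helper names are hypothetical.

```python
import math

def sse_vector(start, end):
    """Vector from the first to the last alpha carbon of an SSE (assumed model)."""
    return tuple(e - s for s, e in zip(start, end))

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

def angle_deg(u, v):
    """Angle between two SSE vectors in degrees (their relative orientation)."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_angle = dot / (length(u) * length(v))
    # Clamp to [-1, 1] to guard against floating-point rounding
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

For example, two helices whose vectors are `(0, 0, 15)` and `(0, 15, 0)` have equal length and are perpendicular to each other; a comparison would look for a similar arrangement of vectors in the second protein.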
DALI :-
• A common and popular structural alignment method is DALI, the distance alignment matrix method.
• It breaks the input structures into hexapeptide fragments and calculates a distance matrix by evaluating the contact patterns between successive fragments.
• Secondary structure features that involve residues contiguous in sequence appear on the matrix's main diagonal; other diagonals in the matrix reflect spatial contacts between residues that are not near each other in the sequence.
• When these diagonals are parallel to the main diagonal, the features they represent are parallel; when they are perpendicular, the features are antiparallel.
• When two proteins' distance matrices share the same or similar features in approximately the same positions, the proteins can be said to have similar folds, with similar-length loops connecting their secondary structure elements.
• DALI's actual alignment process requires a similarity search after the two proteins' distance matrices are built; this is normally conducted via a series of overlapping submatrices of size 6x6.
• Submatrix matches are then reassembled into a final alignment via a standard score-maximization algorithm.
• The DALI method has also been used to construct a database known as FSSP (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins), in which all known protein structures are aligned with each other to determine their structural neighbors and fold classification.
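The distance matrix and the 6x6 submatrices used by DALI can be illustrated with a short sketch. It assumes the input is a list of alpha-carbon coordinates for one protein; the function names are hypothetical, and DALI's own scoring of submatrix pairs is not reproduced.

```python
import math

def distance_matrix(ca_coords):
    """Symmetric matrix of pairwise distances between residues."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def submatrix(dm, row, col, size=6):
    """Extract the size x size block whose top-left corner is (row, col).

    Blocks on the main diagonal describe a single hexapeptide; off-diagonal
    blocks describe the contacts between two different hexapeptides.
    """
    return [r[col:col + size] for r in dm[row:row + size]]
```

Comparing such blocks between two proteins, block by block, is what allows similar contact patterns to be found without first superposing the structures.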
SSAP :-
• The SSAP (Sequential Structure Alignment Program) method uses double dynamic programming to produce a structural alignment based on atom-to-atom vectors in structure space.
• Instead of the alpha carbons typically used in structural alignment, SSAP constructs its vectors from the beta carbons for all residues except glycine, and so takes into account the rotameric state of each residue as well as its location along the backbone.
• SSAP works by first constructing a series of inter-residue distance vectors between each residue and its nearest non-contiguous neighbors on each protein.
• A series of matrices is then constructed containing the vector differences between neighbors for each pair of residues for which vectors were constructed.
• Dynamic programming applied to each resulting matrix determines a series of optimal local alignments, which are then summed into a "summary" matrix to which dynamic programming is applied again to determine the overall structural alignment.
• SSAP has been applied in an all-to-all fashion to produce a hierarchical fold classification scheme known as CATH (Class, Architecture, Topology, Homology), which has been used to construct the CATH Protein Structure Classification database.
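The second, summary-level dynamic programming pass described for SSAP can be sketched as a standard global alignment over a matrix of residue-pair scores. This is a generic illustration and not SSAP's code. It assumes the summary matrix has already been filled in, the linear gap penalty is an assumption, and the lower-level pass over vector differences is not shown.

```python
def best_path_score(scores, gap=1.0):
    """Best global alignment score through a residue-pair score matrix.

    scores[i][j] is the (already summed) score for pairing residue i of
    protein A with residue j of protein B; gap is a linear gap penalty.
    """
    n, m = len(scores), len(scores[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + scores[i - 1][j - 1],  # pair residue i with j
                dp[i - 1][j] - gap,                       # residue i left unpaired
                dp[i][j - 1] - gap,                       # residue j left unpaired
            )
    return dp[n][m]
```

Keeping back-pointers alongside `dp` would recover the aligned residue pairs rather than only the score.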
TM-align :-
• TM-align is an algorithm for sequence-order independent protein structure comparisons. For two protein structures of unknown equivalence, TM-align first generates an optimized residue-to-residue alignment based on structural similarity using dynamic programming iterations.
• It returns an optimal superposition of the two structures, as well as the TM-score value, which scales the structural similarity.
• The template modeling score, or TM-score, is a measure of similarity between two protein structures with different tertiary structures.
• The TM-score takes values in (0,1], where 1 indicates a perfect match between two structures. Based on statistics of structures in the PDB, scores below 0.2 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 generally share the same fold in SCOP/CATH.
• The TM-score is defined as
$$\text{TM-score} = \max\left[\frac{1}{L_{\text{target}}}\sum_{i=1}^{L_{\text{aligned}}}\frac{1}{1+\left(\dfrac{d_i}{d_0(L_{\text{target}})}\right)^{2}}\right]$$
where $L_{\text{target}}$ and $L_{\text{aligned}}$ are the lengths of the target protein and the aligned region respectively, $d_i$ is the distance between the i-th pair of residues, and $d_0$ is a distance scale that normalizes distances.
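The TM-score for one given superposition is the sum of 1 / (1 + (d_i / d_0)^2) over the aligned residue pairs, divided by the target length. The sketch below evaluates that sum. It assumes the commonly published distance scale d_0 = 1.24 (L_target - 15)^(1/3) - 1.8, and it adds a lower bound of 0.5 so that short chains stay well defined, which is an assumption of this sketch. The maximization over superpositions that TM-align performs is not shown.

```python
def d0(l_target):
    """Length-dependent distance scale (lower bound of 0.5 is an assumption)."""
    if l_target <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, l_target):
    """TM-score of one superposition.

    distances: d_i for each aligned residue pair, in angstroms.
    l_target:  number of residues in the target protein.
    """
    scale = d0(l_target)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_target
```

A pair at distance zero contributes 1 and a pair at distance d_0 contributes 0.5, so residues that are far apart after superposition add almost nothing to the score.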
RAMACHANDRAN PLOT :-
• The Ramachandran plot is a plot of the torsional angles phi (φ) and psi (ψ) of the residues (amino acids) contained in a peptide.
• The plot was developed in 1963 by G. N. Ramachandran et al. by plotting the φ values on the x-axis and the ψ values on the y-axis.
• Plotting the torsional angles in this way shows graphically which combinations of angles are possible.
• The torsional angles of each residue in a peptide define the geometry of its attachment to its two adjacent residues by positioning its planar peptide bond relative to the two adjacent planar peptide bonds. The torsional angles thereby determine the conformation of the residues and of the peptide.
• Many of the angle combinations, and therefore the conformations of residues, are not possible because of steric hindrance.
• By making a Ramachandran plot, protein structural scientists can determine which torsional angles are permitted.
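Each point on a Ramachandran plot comes from two dihedral angles computed from backbone atom positions: φ from the atoms C(i-1), N(i), CA(i), C(i), and ψ from N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation for four points given as (x, y, z) tuples; the helper names are illustrative.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Remove the components along the central bond, leaving the parts
    # of the outer bonds that are perpendicular to it
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Calling `dihedral` with the four atoms listed above for φ and then for ψ gives the (φ, ψ) coordinates of one residue on the plot.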
The areas shaded dark blue represent conformations that involve no steric overlap and thus are fully allowed; medium blue indicates conformations allowed at the extreme limits for unfavorable atomic contacts; the lightest blue indicates conformations that are permissible if a little flexibility is allowed in the dihedral angles. The yellow regions are conformations that are not allowed.
Uses :
A Ramachandran plot can be used in two somewhat different ways.
• One is to show in theory which values, or conformations, of the ψ and φ angles are possible for an amino-acid residue in a protein.
• A second is to show the empirical distribution of data points observed in a single structure, as used for structure validation, or else in a database of many structures.
