Structure alignment methods

STRUCTURE ALIGNMENT METHODS
AND
RAMACHANDRAN PLOT

A presentation by Samvartika Majumdar
Int. MSc Biotechnology (9th semester)
Regd. no. 1232112107

INTRODUCTION :-
 Structural alignment attempts to establish
homology between two or more polymer
structures based on their shape and three-
dimensional conformation.
 In contrast to simple structural superposition,
where at least some equivalent residues of the
two structures are known, structural alignment
requires no prior knowledge of equivalent
positions of the residues.
 Structural alignment is a valuable tool for the
comparison of proteins with low sequence
similarity, where evolutionary relationships
between proteins cannot be easily detected by
standard sequence alignment techniques.

 Because these alignments rely on
information about all the query sequences'
three-dimensional conformations, the
method can only be used on sequences
where these structures are known.
 Structural alignments are especially useful in
analyzing data from structural genomics
and proteomics efforts, and they can be
used as comparison points to evaluate
alignments produced by purely sequence-
based bioinformatics methods.
 The outputs of a structural alignment are a
superposition of the atomic coordinate sets

COMBINATORIAL EXTENSION :-
 It breaks each structure in the query set into a
series of fragments that it then attempts to
reassemble into a complete alignment.
 A series of pairwise combinations of
fragments called aligned fragment pairs, or
AFPs, are used to define a similarity matrix
through which an optimal path is generated to
identify the final alignment. A number of
similarity metrics are possible.
 An alignment path is calculated as the optimal
path through the similarity matrix by linearly
progressing through the sequences and

 Extensions then proceed with the next
AFP that meets given distance criteria
restricting the alignment to low gap
sizes.
 The size of each AFP and the maximum
gap size are required input parameters
but are usually set to empirically
determined values of 8 and 30
respectively.
 The RCSB PDB has recently released
an updated version of CE and FATCAT
as part of the RCSB PDB Protein

VAST :-
 VAST, short for Vector Alignment Search Tool, is a
computer algorithm developed at NCBI and used
to identify similar protein 3-dimensional structures
("similar structures") by purely geometric
criteria, and to identify distant homologs that
cannot be recognized by sequence comparison.
 The VAST algorithm uses vectors to represent
secondary structure elements (SSEs) in protein
structures and the structural comparison step
relies on the relative orientation, length and
alignment of those vectors.
 Comparisons that align only one or two vectors
between two proteins are never considered

 Many small proteins which have fewer than three
secondary structure elements are therefore not
comparable by VAST.
 Furthermore, slight variations in a pair of
structures can occasionally result in large
differences in vector representations.
 The original VAST finds structures that are 3D
similar to individual protein molecules, or
individual 3D domains. The original-style VAST
display can be viewed by clicking on
the "Original VAST" button on any VAST+
search results page. The original VAST help
document provides illustrated examples of
original-style VAST search results and alignment
footprints.
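
The vector representation of secondary structure elements mentioned earlier in this section can be sketched as follows. Each SSE is reduced to a vector from its start point to its end point, and two SSEs are compared by their lengths and by the angle between them. The function names, the endpoint-based vector and the toy coordinates are assumptions for illustration only; VAST's own fitting and scoring are not reproduced here.

```python
import math

MIN_SSES = 3  # the slides state that fewer than three SSEs cannot be compared


def sse_vector(start, end):
    """Vector from the start point to the end point of one SSE."""
    return tuple(e - s for s, e in zip(start, end))


def length(v):
    return math.sqrt(sum(c * c for c in v))


def angle_deg(u, v):
    """Angle between two vectors in degrees, in the range 0..180."""
    cos_t = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))


def describe_pair(sse1, sse2):
    """Lengths of two SSE vectors and the angle between them."""
    u = sse_vector(*sse1)
    v = sse_vector(*sse2)
    return {"len1": length(u), "len2": length(v), "angle": angle_deg(u, v)}


def comparable(sses):
    """True if a structure has enough SSEs to be compared at all."""
    return len(sses) >= MIN_SSES


# Hypothetical antiparallel pair: two 15 Angstrom elements, 5 Angstrom apart.
helix_a = ((0.0, 0.0, 0.0), (0.0, 0.0, 15.0))
helix_b = ((5.0, 0.0, 15.0), (5.0, 0.0, 0.0))
print(describe_pair(helix_a, helix_b))
```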

 The original style VAST display will
provide the following information:
- list of the protein
molecules ("chains") in the query
structure, and the 3D domains that were
identified by the VAST algorithm in each
protein.
- list of the structures that are similar in
shape to any individual protein molecule
or 3D domain of your query structure,
with links to views of their sequence
alignments and 3D superpositions to the

DALI :-
 A common and popular structural alignment
method is DALI, the distance-matrix alignment
method.
 It breaks the input structures into hexapeptide
fragments and calculates a distance matrix by
evaluating the contact patterns between
successive fragments.
 Secondary structure features that involve
residues that are contiguous in sequence appear
on the matrix's main diagonal; other diagonals in
the matrix reflect spatial contacts between
residues that are not near each other in the
sequence.
 When these diagonals are parallel to the main

 When two proteins' distance matrices share the same
or similar features in approximately the same
positions, they can be said to have similar folds with
similar-length loops connecting their secondary
structure elements.
 DALI's actual alignment process requires a similarity
search after the two proteins' distance matrices are
built; this is normally conducted via a series of
overlapping submatrices of size 6x6.
 Submatrix matches are then reassembled into a final
alignment via a standard score-maximization
algorithm.
 The DALI method has also been used to construct a
database known as FSSP (Fold classification based
on Structure-Structure alignment of Proteins, or
Families of Structurally Similar Proteins) in which all

SSAP :-
 The SSAP (Sequential Structure Alignment
Program) method uses double dynamic
programming to produce a structural alignment
based on atom-to-atom vectors in structure
space.
 Instead of the alpha carbons typically used in
structural alignment, SSAP constructs its vectors
from the beta carbons for all residues except
glycine, a method which thus takes into account
the rotameric state of each residue as well as its
location along the backbone.
 SSAP works by first constructing a series of inter-
residue distance vectors between each residue

 A series of matrices are then constructed
containing the vector differences between
neighbors for each pair of residues for which
vectors were constructed.
 Dynamic programming applied to each resulting
matrix determines a series of optimal local
alignments which are then summed into a
"summary" matrix to which dynamic programming
is applied again to determine the overall structural
alignment.
 It has been applied in an all-to-all fashion to
produce a hierarchical fold classification scheme
known as CATH (Class, Architecture, Topology,
Homology), which has been used to construct
the CATH Protein Structure Classification
database.
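
The two levels of dynamic programming described for SSAP can be sketched in a few lines. For every residue pair (i, j), a lower-level matrix compares the inter-residue vectors seen from i with those seen from j; the best path through that matrix is added to a summary matrix, and a second dynamic programming pass on the summary matrix gives the alignment. This is a simplified illustration under stated assumptions: vectors are compared in the input coordinate frame, the score function and the zero gap penalty are arbitrary choices, and all names are hypothetical. It is not the SSAP program.

```python
def sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def vector_score(u, v):
    """1.0 for identical vectors, approaching 0 as they differ (assumed form)."""
    diff = sub(u, v)
    return 1.0 / (1.0 + sum(c * c for c in diff))


def dp_path(score):
    """Best monotone path through a score matrix, as (row, col) pairs.

    Plain dynamic programming with no gap penalty, followed by a traceback.
    """
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],
                             best[i - 1][j], best[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def double_dp(coords_a, coords_b):
    """Two-level dynamic programming over inter-residue vectors."""
    n, m = len(coords_a), len(coords_b)
    summary = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Lower level: compare the view from residue i with the view from j.
            lower = [[vector_score(sub(coords_a[k], coords_a[i]),
                                   sub(coords_b[l], coords_b[j]))
                      for l in range(m)] for k in range(n)]
            for k, l in dp_path(lower):
                summary[k][l] += lower[k][l]
    # Upper level: align on the accumulated summary matrix.
    return dp_path(summary)


# Hypothetical toy data: the second structure is a shifted copy of the first.
a = [(0.0, 0.0, 0.0), (4.0, 1.0, 0.0), (5.0, 6.0, 2.0), (1.0, 9.0, 7.0)]
b = [(x + 10.0, y - 3.0, z + 2.0) for x, y, z in a]
print(double_dp(a, b))
```

The cost grows with the fourth power of the chain length, so the sketch is only practical for very small inputs.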

TM-align :-
 TM-align is an algorithm for sequence-independent
protein structure comparisons. For two protein
structures of unknown equivalence, TM-align first
generates an optimized residue-to-residue
alignment based on structural similarity
using dynamic programming iterations.
 An optimal superposition of the two structures, as
well as the TM-score value which scales the
structural similarity, will be returned.
 The template modeling score or TM-score is a
measure of similarity between two protein
structures with different tertiary structures.

 TM-score has the value in (0,1], where 1
indicates a perfect match between two structures.
Following strict statistics of structures in the PDB,
scores below 0.2 correspond to randomly
chosen unrelated proteins, whereas structures
with a score higher than 0.5 generally have the
same fold in SCOP/CATH.
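 The TM-score is defined as
\[ \text{TM-score} = \max\left[ \frac{1}{L_{\text{target}}} \sum_{i=1}^{L_{\text{aligned}}} \frac{1}{1 + \left( d_i / d_0(L_{\text{target}}) \right)^2} \right] \]
where L_target and L_aligned are the lengths of
the target protein and the aligned region
respectively, d_i is the distance between the i-th
pair of residues, and
\[ d_0(L_{\text{target}}) = 1.24 \sqrt[3]{L_{\text{target}} - 15} - 1.8 \]
is a distance scale that normalizes the distances.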
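
The TM-score calculation can be sketched numerically. The code assumes the standard published definition: each aligned residue pair contributes 1 / (1 + (d_i / d_0)^2), the sum is divided by L_target, and d_0 = 1.24 (L_target - 15)^(1/3) - 1.8. It scores one fixed superposition only; the search for the superposition that maximizes the score is not shown. The function names and the 0.5 floor on d_0 for very short chains are assumptions of this sketch.

```python
def d0(l_target):
    """Distance scale d_0 for a target of l_target residues.

    Assumption of this sketch: a floor of 0.5 is applied, because the
    formula gives very small or negative values for short chains.
    """
    if l_target <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, l_target):
    """TM-score of one superposition.

    distances: d_i for each aligned residue pair (length L_aligned).
    l_target:  number of residues in the target protein.
    """
    scale = d0(l_target)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / l_target


# Hypothetical example: 100-residue target, 80 aligned pairs at 2.0 Angstrom.
print(round(tm_score([2.0] * 80, 100), 3))
```

Residues that are not aligned contribute nothing to the sum, so the score drops when only part of the target is covered.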

RAMACHANDRAN PLOT :-
 The Ramachandran plot is a plot of the torsional
angles phi (φ) and psi (ψ) of the residues (amino
acids) contained in a peptide.
 The plot was developed in 1963 by
G. N. Ramachandran et al. by plotting the φ values
on the x-axis and the ψ values on the y-axis.

 Plotting the torsional angles in this way
graphically shows which combinations of angles
are possible.
 The torsional angles of each residue in a peptide
define the geometry of its attachment to its two
adjacent residues by positioning its planar
peptide bond relative to the two adjacent planar
peptide bonds; the torsional angles thereby
determine the conformation of the residues and
the peptide.
 Many of the angle combinations, and therefore
the conformations of residues, are not possible
because of steric hindrance.
 By making a Ramachandran plot, protein
structural scientists can determine which torsional

The areas shaded dark blue represent conformations that
involve no steric overlap and thus are fully allowed; medium
blue indicates conformations allowed at the extreme limits for
unfavorable atomic contacts; the lightest blue indicates
conformations that are permissible if a little flexibility is allowed
in the dihedral angle. The yellow regions are conformations
that are not allowed.

Uses :
A Ramachandran plot can be used in two
somewhat different ways.
 One is to show in theory which values,
or conformations, of the ψ and φ angles are
possible for an amino-acid residue in a protein.
 A second is to show the empirical distribution of
data points observed in a single structure, as used
for structure validation, or in a database of
many structures.
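
To obtain the data points for an empirical plot, φ and ψ have to be computed from backbone coordinates. The sketch below computes a torsion angle from four atom positions and applies it to the usual atom sets: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The function names and the input layout (one dictionary per residue with keys "N", "CA" and "C") are assumptions for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points.

    0 is the cis arrangement and 180 (or -180) the trans arrangement.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """(phi, psi) per residue; None where a neighbouring residue is missing."""
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles


# Simple geometric check: the fourth point is rotated a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Each (φ, ψ) pair returned by `phi_psi` is one point on the plot, with φ on the x-axis and ψ on the y-axis.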
