2. Why sequence alignment
• Lots of sequences with unknown structure and function vs. a small (but
growing) number of sequences with known structure and function
• If they align, they are “similar”
• If they are similar, then they might have similar structure and/or
function. Identify conserved patterns (motifs)
• If one of them has known structure/function, then aligning the others
to it might yield insight into how their structure/function works. Similar
motif content might hint at similar function
3. Principles of Sequence Alignment
• Alignment can reveal homology between sequences
• Similarity is a descriptive term for the degree of match
between two sequences
• Sequence similarity does not always imply a common function
• Conserved function does not always imply similarity at the sequence
level
• Convergent evolution: sequences can be highly similar without being
homologous
4. Pairwise alignment
• GLOBAL ALIGNMENT: the alignment is stretched over the entire sequence length to
include as many matching amino acids as possible up to and including the
sequence ends. Vertical bars between the sequences indicate the presence of
identical amino acids.
e.g. the Needleman-Wunsch algorithm
• LOCAL ALIGNMENT: the alignment tends to stop at the ends of the regions of identity
or strong similarity. A much higher priority is given to finding these local regions than
to extending the alignment to include more neighboring amino acid pairs.
e.g. the Smith-Waterman algorithm
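The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example, and only the optimal score is returned, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning all of a with all of b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets the alignment restart instead of going negative.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The only structural differences are the initial row and column (gap penalties vs. zeros) and the extra `0` in the local recurrence, which is what allows a local alignment to ignore poorly matching flanks.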
5. Pairwise Sequence Alignment vs Multiple Sequence Alignment
• Pairwise Sequence Alignment is used to identify regions of similarity that may
indicate functional, structural and/or evolutionary relationships between two
biological sequences (protein or nucleic acid).
• Multiple Sequence Alignment (MSA) is the alignment of three or more biological sequences,
typically of similar length. From the output of MSA applications, homology can be inferred and the
evolutionary relationships between the sequences can be studied.
6. Pairwise alignment in Multiple Alignment Method
• The most practical and widely used method in multiple
sequence alignment is the hierarchical extension of
pairwise alignment methods.
• The principle is that a multiple alignment is achieved by
successive application of pairwise methods.
7. Why do we do multiple alignments?
• To characterize protein families by identifying shared regions
of homology in a multiple sequence alignment
• To determine the consensus sequence of several aligned
sequences
• To help predict the secondary and tertiary structures of new
sequences
• As a preliminary step in molecular evolution analysis, where
phylogenetic methods are used to construct phylogenetic trees
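As an illustration of the consensus-sequence use case, the sketch below takes an existing alignment (equal-length strings with `-` for gaps) and reports the most common residue in each column. The function name and the simple majority rule are assumptions made for this example; real tools may apply thresholds or ambiguity codes instead.

```python
from collections import Counter


def consensus(alignment, gap="-"):
    """Most common non-gap residue in each column of an MSA."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    result = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != gap)
        # A column made only of gaps contributes a gap to the consensus.
        # Ties are broken by whichever residue appears first in the column.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)
```

For example, `consensus(["ACG-T", "ACGAT", "TCG-T"])` considers each of the five columns in turn and ignores the gaps in the fourth.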
8. Different computational approaches to perform a multiple alignment
1. The Dynamic programming approach
2. Progressive alignment method
3. Iterative refinement method
9. 1. The Dynamic programming approach
Dynamic programming algorithms are guaranteed to find the optimal
alignment between two sequences under a given scoring scheme. DNA and
RNA alignments may use a scoring matrix, but in practice often simply
assign a positive match score, a negative mismatch score, and a negative
gap penalty. For more than a few sequences, exact algorithms become
computationally impractical, because the runtime increases exponentially
with the number of sequences you want to align. This is why the method
is not widely used for multiple alignment.
For example, aligning 4 sequences of a hundred amino acids each can take about 3 days
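The exponential growth can be made concrete by counting the work an exact method has to do. Extending the pairwise table to N sequences of length L gives a lattice of (L + 1)^N cells, and each cell is computed from up to 2^N - 1 neighbouring cells. The helper names below are hypothetical and the sketch only counts cells; it does not perform an alignment.

```python
def dp_cells(n_sequences, length):
    """Number of cells in the full dynamic-programming lattice."""
    return (length + 1) ** n_sequences


def predecessors_per_cell(n_sequences):
    """Each cell is reached from up to 2^n - 1 neighbouring cells."""
    return 2 ** n_sequences - 1


# Compare the work for 2, 3 and 4 sequences of 100 residues each.
for n in (2, 3, 4):
    print(n, dp_cells(n, 100), predecessors_per_cell(n))
```

Each additional sequence of 100 residues multiplies the number of cells by 101 and roughly doubles the work per cell.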
10. 2. Progressive alignment method
• The most widely used approach to multiple sequence alignment.
• Heuristic algorithms for multiple alignment are generally used, as they
are fast.
• Progressive alignment builds up a final MSA by combining pairwise
alignments, beginning with the most similar pair and progressing to the
most distantly related
• Common tools, e.g. CLUSTAL, T-COFFEE, PILEUP
11. Progressive alignment method cont…
All progressive alignment methods require two stages:
• A first stage in which the relationships between the sequences are
represented as a tree, called the guide tree
• A second stage in which the MSA is built by adding the sequences
sequentially to the growing MSA according to the guide tree
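The first stage can be sketched as a clustering of pairwise distances. The example below uses a simple average-linkage scheme, which is an assumption made for brevity; actual tools may build their guide trees with other methods. The function name, the nested-tuple tree format, and the distance dictionary keyed by `frozenset` are all hypothetical choices for this illustration.

```python
def guide_tree(names, dist):
    """Build a guide tree as nested tuples by average-linkage clustering.

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b
    """
    # Each cluster is (tree, members); start with one cluster per sequence.
    clusters = [(name, [name]) for name in names]

    def cluster_distance(x, y):
        # Average distance over all cross-cluster pairs of sequences.
        pairs = [(a, b) for a in x[1] for b in y[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

The nested tuples record the order of merging: the innermost pairs are the most similar sequences and would be aligned first in the second stage.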
13. • A popular heuristic algorithm is
CLUSTAL, introduced by Des Higgins and Paul Sharp
in 1988.
• CLUSTAL makes a global multiple
alignment using a progressive
alignment approach.
14. The CLUSTAL W procedure:
1. Compute all pairwise alignments and calculate the sequence similarity between each pair
2. Align the most similar pair of sequences (this gives an alignment of 2
sequences, called a ‘profile’)
3. Align the next closest pair of sequences (or pair of profiles, or a sequence and a profile)
4. Repeat with the next closest pair of sequences/profiles until all sequences are aligned
A property of this method is that gap creation is irreversible:
‘once a gap, always a gap’
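The "once a gap, always a gap" property follows from how a sequence is added to a profile: existing columns are never edited, and a new gap is inserted into every row of the profile at once. The sketch below illustrates this for the sequence-to-profile case. It is not the CLUSTAL W implementation; the function name, the scoring values, and the rule that a column is scored by averaging match/mismatch over its rows are assumptions made for this example.

```python
def align_to_profile(profile, seq, match=1, mismatch=-1, gap=-2):
    """Align seq to an existing profile (list of equal-length gapped rows)."""
    n, m = len(profile[0]), len(seq)

    def col_score(col, residue):
        # Average score of the residue against each row in this column;
        # a gap character in the profile simply counts as a mismatch.
        column = [row[col] for row in profile]
        return sum(match if c == residue else mismatch for c in column) / len(column)

    # Fill the dynamic-programming table (rows: profile columns, cols: seq).
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + col_score(i - 1, seq[j - 1]),
                score[i - 1][j] + gap,   # gap in the new sequence
                score[i][j - 1] + gap,   # gap column across the whole profile
            )

    # Trace back, building all rows from right to left.
    rows, new = [[] for _ in profile], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + col_score(i - 1, seq[j - 1]):
            for r, row in zip(rows, profile):
                r.append(row[i - 1])     # copy the existing column unchanged
            new.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            for r, row in zip(rows, profile):
                r.append(row[i - 1])
            new.append("-")
            i -= 1
        else:
            for r in rows:
                r.append("-")            # new gap goes into every profile row
            new.append(seq[j - 1])
            j -= 1
    return ["".join(reversed(r)) for r in rows] + ["".join(reversed(new))]
```

Whatever the new sequence looks like, the original profile columns come back unchanged and in the same order, so an early misplaced gap cannot be corrected by later steps.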
15. Iterative refinement method
• Methods that produce MSAs while reducing the
errors inherent in progressive methods are classified as
iterative
• They work similarly to progressive methods, but
repeatedly realign the initial sequences as well as
adding new sequences to the growing MSA
• Barton and Sternberg formulated this method for MSA
• Common iterative tools:
DIALIGN, MUSCLE, ProbCons
16. Iterative refinement cont…
• It works similarly to the progressive
alignment method, but in this case,
each time a new sequence is added,
the initially aligned
sequences are repeatedly
realigned in order to obtain the
best alignment
Refining step
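A refining step can be sketched as a leave-one-out loop: remove one sequence, realign it against the rest, and keep the result only if the overall alignment score improves. The sum-of-pairs score used here is one common objective and is an assumption for this example, as are the function names and scoring values. The `realign` argument is a hypothetical caller-supplied function that aligns an ungapped sequence against the remaining rows.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an MSA given as equal-length gapped strings."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue                 # two facing gaps score nothing
            elif x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


def refine(alignment, realign, rounds=3):
    """Leave-one-out refinement: accept a realignment only if it scores higher.

    realign(rest, seq) is a hypothetical callback that returns a new
    alignment (list of rows) of the ungapped seq against the rows in rest.
    """
    best = list(alignment)
    for _ in range(rounds):
        for k in range(len(best)):
            rest = best[:k] + best[k + 1:]
            candidate = realign(rest, best[k].replace("-", ""))
            if sum_of_pairs(candidate) > sum_of_pairs(best):
                best = candidate
    return best
```

Because a candidate is accepted only when the score increases, the loop can never make the alignment worse under the chosen objective, although it may still stop at a local optimum.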