This document discusses sequence alignment and the differences between global and local alignment. It defines sequence alignment as comparing two or more sequences to find identical or similar characters that occur in the same order. Global alignment attempts to align the sequences over their entire length, while local alignment finds the regions of highest similarity, which may cover only part of each sequence. Dynamic programming computes optimal alignments in three steps: initializing a scoring matrix, filling it, and tracing back through the best-scoring cells. The Needleman-Wunsch algorithm performs global alignment, while the Smith-Waterman algorithm performs local alignment by resetting negative scores to zero, which terminates any alignment up to that point.
Sequence
• A sequence in biology is the one-dimensional ordering of monomers,
covalently linked within a biopolymer.
• May also be referred to as the primary structure of a biological
macromolecule.
• In bioinformatics, the term refers to a DNA, RNA or protein sequence.
Sequence alignment
• Procedure of comparing two or more sequences by searching for a
series of individual characters or character patterns that are in the
same order in the sequences.
• Two sequences are aligned by writing them across a page in two rows.
• Identical or similar characters are placed in the same column, and
non-identical characters can be placed either in the same column as a
mismatch or opposite a gap in the other sequence.
• In an optimal alignment, non-identical characters and gaps are placed
to bring as many identical or similar characters as possible into
vertical register.
• Sequences that can be readily aligned in this manner are said to be
similar.
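The two-row layout described above can be illustrated with a short sketch. The function name `summarize_alignment`, the `-` gap symbol, and the example sequences are assumptions chosen for illustration; they do not come from the slides.

```python
def summarize_alignment(row1, row2, gap="-"):
    """Count matches, mismatches and gaps, column by column,
    in two sequences that have already been aligned."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            counts["gap"] += 1        # character opposite a gap
        elif a == b:
            counts["match"] += 1      # identical characters in one column
        else:
            counts["mismatch"] += 1   # non-identical characters in one column
    return counts


# Two rows written "across a page"; the gap keeps TGCA in vertical register.
print(summarize_alignment("GATTACA", "GA-TGCA"))
# {'match': 5, 'mismatch': 1, 'gap': 1}
```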
Two types of sequence alignment:
– Global alignment
– Local alignment
Fig.: Distinction between Global and Local alignment of two sequences
• Global alignment
– Attempts to align the entire sequence using as many characters as possible,
up to both ends of each sequence.
– Sequences that are quite similar and approximately the same length are
suitable candidates for global alignment.
– The Needleman-Wunsch algorithm is used to produce global alignments between
pairs of DNA or protein sequences.
• Local alignment
– Stretches of sequence with the highest density of matches are aligned.
– Generates one or more islands of matches or subalignments in the aligned
sequences.
– Suitable for aligning sequences that are similar along some of their lengths
but dissimilar in others, sequences that differ in length, or sequences that
share a conserved region or domain.
– The Smith-Waterman algorithm is used to produce local alignments between pairs
of DNA or protein sequences.
Dynamic Programming
• Method for solving a complex problem by breaking it down into a
collection of simpler sub-problems, solving each of these sub-problems
just once and storing their solutions, ideally in a memory-based
data structure.
• The next time the same sub-problem occurs, instead of recomputing
its solution, one simply looks up the previously computed solution,
thereby saving computation time at the expense of a modest
expenditure in storage space.
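The idea of solving each sub-problem once and looking it up afterwards can be sketched with a memoized recursion. This is a minimal illustration, not the matrix method used later in the slides: the function name `best_score` and the score values (match +1, mismatch -1, gap penalty 2) are assumptions, and the recursion is only practical for short sequences.

```python
from functools import lru_cache

# Illustrative scores (assumed, not taken from the slides).
MATCH, MISMATCH, GAP_PENALTY = 1, -1, 2


def best_score(x, y):
    """Best global alignment score of x and y, computed top-down."""

    @lru_cache(maxsize=None)          # stores each sub-problem's solution
    def solve(i, j):
        # Sub-problem: best score for aligning x[:i] with y[:j].
        if i == 0:
            return -j * GAP_PENALTY   # only gaps remain
        if j == 0:
            return -i * GAP_PENALTY
        s = MATCH if x[i - 1] == y[j - 1] else MISMATCH
        return max(
            solve(i - 1, j - 1) + s,          # align x[i-1] with y[j-1]
            solve(i - 1, j) - GAP_PENALTY,    # x[i-1] opposite a gap
            solve(i, j - 1) - GAP_PENALTY,    # y[j-1] opposite a gap
        )

    return solve(len(x), len(y))


print(best_score("GATTACA", "GATTACA"))  # 7
print(best_score("AC", "AGC"))           # 0
```

Without the cache the same (i, j) pairs would be recomputed many times; with it, each of the (len(x)+1) × (len(y)+1) sub-problems is solved once, at the cost of storing one number per pair.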
Three steps in dynamic programming:
• Initialization
• Matrix fill (scoring)
• Traceback (alignment)
• Initialization:
– Involves creating a matrix with M+1 columns and N+1 rows, where
M and N are the lengths of the two sequences to be aligned.
– The first row and the first column are initialized with scores
corresponding to gap penalties.
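The initialization step can be sketched as follows, assuming a linear gap penalty, so that the k-th cell of the first row or column holds -k times the penalty. The function name `init_matrix` and the penalty value 2 are hypothetical choices for illustration.

```python
def init_matrix(seq_a, seq_b, gap_penalty=2):
    """Build the (N+1) x (M+1) scoring matrix for seq_a (length M,
    along the columns) and seq_b (length N, along the rows)."""
    m, n = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        score[0][j] = -j * gap_penalty   # first row: j leading gaps
    for i in range(1, n + 1):
        score[i][0] = -i * gap_penalty   # first column: i leading gaps
    return score


for row in init_matrix("ACG", "AT"):
    print(row)
# [0, -2, -4, -6]
# [-2, 0, 0, 0]
# [-4, 0, 0, 0]
```

The interior cells are left at zero here as placeholders; they are overwritten during the matrix fill step.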
• Matrix fill (scoring)
– The score at each position is given as:
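A common form of this recurrence for global alignment is sketched below. The symbols are assumptions introduced here: F(i,j) is the score in row i and column j, x and y are the sequences along the columns and rows, s(x_j, y_i) is the match or mismatch score, and d ≥ 0 is a linear gap penalty.

```latex
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_j, y_i) & \text{(match or mismatch)} \\
F(i-1,\,j) - d             & \text{(gap in } x\text{)} \\
F(i,\,j-1) - d             & \text{(gap in } y\text{)}
\end{cases}
\qquad
F(0,0) = 0,\quad F(i,0) = -i\,d,\quad F(0,j) = -j\,d
```

Each cell therefore depends only on its diagonal, upper, and left neighbours, which is why the matrix can be filled row by row once the first row and column are initialized.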
• Traceback (alignment)
– Traceback starts from the last block and continues until the first
block in the matrix is reached.
Needleman-Wunsch algorithm
• Based on dynamic programming.
• The optimal score at each position is calculated by adding the current
match score to previously scored positions and subtracting gap
penalties (if applicable).
• Each matrix position may have a positive or negative score or zero.
• The Needleman-Wunsch algorithm maximizes the number of
matches between the sequences along the entire length of the
sequences.
• Traceback starts at the last block and ends at the first block.
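The three steps (initialization, matrix fill, traceback) can be put together in a minimal sketch of global alignment. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty 2) and breaks ties in favour of the diagonal move; the function name and default values are illustrative, and real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap_penalty=2):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    m, n = len(x), len(y)

    # Initialization: (n+1) rows for y, (m+1) columns for x.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = -j * gap_penalty
    for i in range(1, n + 1):
        F[i][0] = -i * gap_penalty

    # Matrix fill: best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[j - 1] == y[i - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] - gap_penalty,
                          F[i][j - 1] - gap_penalty)

    # Traceback: from the last block (n, m) back to the first block (0, 0).
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[j - 1] == y[i - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[j - 1]); ay.append(y[i - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] - gap_penalty:
            ax.append("-"); ay.append(y[i - 1])   # gap in x
            i -= 1
        else:
            ax.append(x[j - 1]); ay.append("-")   # gap in y
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AC", "AGC"))  # (0, 'A-C', 'AGC')
```

Note that intermediate cells may be negative, zero, or positive, and that the alignment always spans both sequences from end to end.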
Smith-Waterman algorithm
• Based on DP but modified to give high-scoring local matches.
• Slightly different from the Needleman-Wunsch algorithm.
• The main differences are:
– The scoring system must include negative scores for mismatches, and
– When a DP scoring matrix value becomes negative it is set to zero, which has
the effect of terminating any alignment up to that point.
• Traceback starts at the highest score and ends on reaching a block
containing zero.
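The differences listed above can be sketched in a minimal local alignment routine. The scoring values (match +2, mismatch -1, linear gap penalty 2), the function name, and the tie-breaking rule (first highest cell found, diagonal move preferred) are assumptions made for illustration only.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap_penalty=2):
    """Return (score, aligned_x, aligned_y) for the best local alignment."""
    m, n = len(x), len(y)

    # First row and column stay at zero: an alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[j - 1] == y[i - 1] else mismatch
            # A value that would be negative is set to zero instead.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] - gap_penalty,
                          H[i][j - 1] - gap_penalty)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the highest score, stop at a zero.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if x[j - 1] == y[i - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ax.append(x[j - 1]); ay.append(y[i - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            ax.append("-"); ay.append(y[i - 1])   # gap in x
            i -= 1
        else:
            ax.append(x[j - 1]); ay.append("-")   # gap in y
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

Only the shared ACG island is reported; the dissimilar flanking regions are left out because their scores were reset to zero.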