Sequence alignment

Global vs Local

Sequence
• A sequence in biology is the one-dimensional ordering of monomers,
covalently linked within a biopolymer.
• May also be referred to as the primary structure of a biological
macromolecule.
• In bioinformatics, refers to DNA, RNA or protein sequence.

Sequence alignment
• Procedure of comparing two or more sequences by searching for a
series of individual characters or character patterns that are in the
same order in the sequences.
• Two sequences are aligned by writing them across a page in two rows.
• Identical or similar characters are placed in the same column, and
non-identical characters can either be placed in the same column as a
mismatch or opposite a gap in the other sequence.
• In an optimal alignment, non-identical characters and gaps are placed
to bring as many identical or similar characters as possible into
vertical register.
• Sequences that can be readily aligned in this manner are said to be
similar.
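
The column-by-column view above can be sketched in Python. This is only an illustration: the function name `describe_alignment`, the use of "-" as the gap character, and the example rows are assumptions, not part of the slides. It takes two rows that are already aligned and counts matches, mismatches and gap columns.

```python
def describe_alignment(row_a, row_b, gap="-"):
    """Classify every column of two pre-aligned rows (hypothetical helper)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            counts["gap"] += 1        # a character sits opposite a gap
        elif a == b:
            counts["match"] += 1      # identical characters in one column
        else:
            counts["mismatch"] += 1   # non-identical characters in one column
    return counts


# Two rows written one above the other, as on the slide.
print("GATTACA")
print("GA-TGCA")
print(describe_alignment("GATTACA", "GA-TGCA"))
```

For these two rows the counts are 5 matches, 1 mismatch and 1 gap column.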

Two types of sequence alignment:
–Global alignment
–Local alignment
Fig.: Distinction between Global and Local alignment of two sequences

• Global alignment
– Attempts to align the entire sequence using as many characters as possible,
up to both ends of each sequence.
– Sequences that are quite similar and approximately the same length are
suitable candidates for global alignment.
– The Needleman-Wunsch algorithm is used to produce global alignments between
pairs of DNA or protein sequences.

• Local alignment
– Stretches of sequence with the highest density of matches are aligned.
– Generates one or more islands of matches, or subalignments, in the aligned
sequences.
– Suitable for aligning sequences that are similar along some of their lengths
but dissimilar in others, sequences that differ in length, or sequences that
share a conserved region or domain.
– Smith-Waterman algorithm is used to produce local alignments between pairs
of DNA or protein sequences.

Dynamic Programming
• Method for solving a complex problem by breaking it down into a
collection of simpler sub-problems, solving each of these sub-problems
just once and storing their solutions, ideally in a memory-based
data structure.
• The next time the same sub-problem occurs, instead of recomputing
its solution, one simply looks up the previously computed solution,
thereby saving computation time at the expense of a modest
expenditure in storage space.
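
A minimal sketch of this store-and-look-up idea, using the length of the longest common subsequence of two strings as a stand-in problem (chosen here for illustration, it is not on the slides). The function name `lcs_length` is hypothetical; `functools.lru_cache` plays the role of the memory-based data structure.

```python
from functools import lru_cache


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)          # stores each sub-problem's answer once
    def solve(i, j):
        # Sub-problem: best result for the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # The same (i, j) pairs are reached along many paths; the cache
        # turns every repeat visit into a look-up.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


print(lcs_length("AGGTAB", "GXTXAYB"))
```

Without the cache the recursion revisits the same suffix pairs many times over; with it, each pair is solved once. The recursive form is only practical for short strings, which is one reason alignment algorithms fill a matrix iteratively instead.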

Three steps in dynamic programming:
• Initialization
• Matrix fill (scoring)
• Traceback (alignment)

• Initialization:
– Involves creating a matrix with M+1 columns and N+1 rows, where
M and N are the lengths of the two sequences to be aligned.
– The first row and the first column are initialized with scores
corresponding to gap penalties.
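
The initialization step might look like this in Python. The sketch assumes a global alignment with a linear gap penalty (each gap position costs the same amount); the function name `init_matrix` and the default penalty of 2 are illustrative choices, not values from the slides.

```python
def init_matrix(seq_a, seq_b, gap_penalty=2):
    """Build the (N+1) x (M+1) score matrix with its first row and column set.

    seq_a (length M) runs along the columns, seq_b (length N) along the rows.
    """
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        matrix[0][j] = -j * gap_penalty   # j leading gaps in seq_b
    for i in range(rows):
        matrix[i][0] = -i * gap_penalty   # i leading gaps in seq_a
    return matrix


for row in init_matrix("ACG", "AT"):
    print(row)
```

The remaining cells are left at zero here only as placeholders; they are overwritten during the matrix fill.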

• Matrix fill (scoring)
– The score at each position is given as:
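
A commonly used form of this score for global alignment with a linear gap penalty is shown below. The symbols are assumptions introduced for illustration: F(i,j) is the matrix entry in row i and column j, s(a_j, b_i) is the match or mismatch score for the two characters at that position, and d > 0 is the gap penalty.

```latex
\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(a_j, b_i) & \text{(match or mismatch)} \\
F(i-1,\,j) - d             & \text{(gap in the sequence along the columns)} \\
F(i,\,j-1) - d             & \text{(gap in the sequence along the rows)}
\end{cases}
\]
```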

• Traceback (alignment)
– In the global case, traceback starts from the last cell of the matrix
(bottom right) and continues until the first cell (top left) is reached.

Needleman-Wunsch algorithm
• Based on dynamic programming.
• The optimal score at each position is calculated by adding the current
match score to previously scored positions and subtracting gap
penalties (if applicable).
• Each matrix position may have a positive or negative score or zero.
• The Needleman-Wunsch algorithm finds the highest-scoring alignment
(in effect, the most matches for the fewest mismatches and gaps) along
the entire length of the sequences.
• Traceback starts at the last cell and ends at the first cell.
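
The three steps can be put together in a short Python sketch of global alignment. It is an illustration under stated assumptions, not a reference implementation: the scoring values (match +1, mismatch -1, linear gap penalty 2) and the function name `needleman_wunsch` are chosen here, and when several alignments tie for the best score only one is returned.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap_penalty=2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1

    # Initialization: first row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = -j * gap_penalty
    for i in range(1, rows):
        score[i][0] = -i * gap_penalty

    # Matrix fill: best of diagonal (match/mismatch), up and left (gaps).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] - gap_penalty,
                              score[i][j - 1] - gap_penalty)

    # Traceback: from the last cell back to the first cell.
    out_a, out_b = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(seq_a[j - 1])
                out_b.append(seq_b[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap_penalty:
            out_a.append("-")             # gap in seq_a
            out_b.append(seq_b[i - 1])
            i -= 1
        else:
            out_a.append(seq_a[j - 1])
            out_b.append("-")             # gap in seq_b
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

With these scores, aligning ACGT with AGT gives a score of 1 (three matches minus one gap), with the rows ACGT and A-GT.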

Smith-Waterman algorithm
• Based on dynamic programming, but modified to give high-scoring local matches.
• Slightly different from the Needleman-Wunsch algorithm.
• The main differences are:
– The scoring system must include negative scores for mismatches, and
– When a DP scoring matrix value becomes negative it is set to zero, which has
the effect of terminating any alignment up to that point.
• Traceback starts at the cell with the highest score and ends at the
first cell containing zero.
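
A local-alignment sketch in Python that follows the differences listed above: negative values are reset to zero during the fill, and traceback runs from the highest-scoring cell until a zero is reached. The scoring values (match +2, mismatch -1, linear gap penalty 2) and the function name `smith_waterman` are assumptions made for illustration, and only one best-scoring local alignment is returned.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap_penalty=2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    # First row and column stay at zero for local alignment.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Matrix fill: as in the global case, but never below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] - gap_penalty,
                              score[i][j - 1] - gap_penalty)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback: start at the highest score, stop at the first zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if seq_a[j - 1] == seq_b[i - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(seq_a[j - 1])
            out_b.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            out_a.append("-")             # gap in seq_a
            out_b.append(seq_b[i - 1])
            i -= 1
        else:
            out_a.append(seq_a[j - 1])
            out_b.append("-")             # gap in seq_b
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

The two example sequences share only the stretch ACG, so the local alignment is ACG against ACG with a score of 6, and the dissimilar flanking regions are left out.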
