Sequence Alignment

BIRLA INSTITUTE OF TECHNOLOGY MESRA,
JAIPUR CAMPUS
NAME :- NIKHIL AGRAWAL
ROLL NO :- MCA/25004/18
TOPIC:- Sequence Alignment

Sequence Alignment
 Sequence alignment is a way of arranging sequences of
DNA, RNA, or protein to identify regions of similarity. The
similarity may indicate functional, structural, or
evolutionary relationships between the sequences.
 The known sequence is called the reference sequence. The
unknown sequence is called the query sequence.

Interpretation of sequence
alignment
 Sequence alignment is useful for discovering structural, functional and
evolutionary information.
 Sequences that are very much alike may have similar secondary and 3D
structure, similar function, and likely a common ancestral sequence. It is
extremely unlikely that such sequences became similar by chance. For
DNA molecules with n nucleotides, the probability of a chance match is very
low: P = 4^{-n}. For proteins the probability is even lower: P = 20^{-n},
where n is the number of amino acid residues.
 Large-scale genome studies have revealed the existence of horizontal transfer of
genes and other sequences between species, which may cause similarity
between some sequences in very distant species.
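
As a rough illustration of the chance-match probabilities above, the following Python sketch evaluates them for a given sequence length. It assumes every residue is equally likely and independent of its neighbors, which real sequences only approximate; the function name is made up for this example.

```python
def chance_match_probability(alphabet_size: int, n: int) -> float:
    """Probability that a random sequence of length n equals a given one.

    Assumption (not from the slides): residues are uniform and independent.
    """
    return float(alphabet_size) ** -n


# DNA uses a 4-letter alphabet, proteins a 20-letter alphabet.
for length in (5, 10, 20):
    p_dna = chance_match_probability(4, length)
    p_protein = chance_match_probability(20, length)
    print(f"n={length}: DNA {p_dna:.3e}, protein {p_protein:.3e}")
```

Even for short sequences the protein probability falls far below the DNA one, because each position has 20 possible residues instead of 4.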

Alignment
 Alignment is the task of locating “equivalent” regions of two or more
sequences to maximize their similarity.
 NIKESH NARAYANAN
 NIGESH NARAYAN-- (mismatch: K/G in the third column; gaps: the two trailing “-”)
 Alignment can reveal homology between sequences.
 Similarity is a descriptive term for the degree of match between
two sequences.
 Sequence similarity does not always imply a common function.
 Conserved function does not always imply similarity at the sequence level.
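
The name example above can be checked column by column. The sketch below is only an illustration: the function name is hypothetical, the space between the two words is dropped, and "-" is assumed to be the gap symbol.

```python
def classify_columns(row_a: str, row_b: str, gap: str = "-") -> dict:
    """Count matching, mismatching, and gapped columns of an aligned pair."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            counts["gap"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


print(classify_columns("NIKESHNARAYANAN", "NIGESHNARAYAN--"))
```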

Scoring Alignments: The Main
Principles
 Alignments of related sequences are expected to give good
scores compared with alignments of randomly chosen
sequences.
 The correct alignment of two related sequences should ideally
be the one that gives the best score.
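
One simple way to turn these principles into numbers is to score each column and add the results. The sketch below assumes a toy scheme (+1 for a match, -1 for a mismatch, -2 for a gap); these values and the function name are illustrative choices, not taken from the slides.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of two already-aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# The same two names, aligned well and aligned badly (gaps at the wrong end).
good = score_alignment("NIKESHNARAYANAN", "NIGESHNARAYAN--")
bad = score_alignment("NIKESHNARAYANAN", "--NIGESHNARAYAN")
print(good, bad)
```

Under this scheme the sensible alignment scores higher than the shifted one, which is the behavior a useful scoring scheme should have.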

Classifications of sequence
alignments
 Based on Completeness
 Global
 Local
 Based on number of sequences
 Pairwise alignment
 Multiple sequence alignment

Global/local sequence alignment
1. Global alignment
 Input: treat the two sequences as potentially equivalent
 Goal: identify conserved regions and differences
 Algorithm: Needleman-Wunsch dynamic programming
 Applications:
 Comparing two genes with the same function (e.g., in human vs. mouse).
 Comparing two proteins with similar function.
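
A minimal sketch of Needleman-Wunsch global alignment follows. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1) rather than a substitution matrix or affine gaps, and it returns only one of the possibly several optimal alignments.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] against nothing: all gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GCATGCU")
print(best)
print(row_a)
print(row_b)
```

Because every character of both sequences must appear in the result, the end gaps are penalized like any other gap, which is what makes the alignment global.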

Global/local sequence alignment
2. Local alignment
 Input: The two sequences may or may not be related
 Goal: see whether a substring in one sequence aligns well with a substring
in the other
 Algorithm: Smith-Waterman dynamic programming
 Note: for local matching, overhangs at the ends are not treated as gaps
 Applications:
 Searching for local similarities in large sequences (e.g., newly sequenced
genomes).
 Looking for conserved domains or motifs in two proteins.
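
A minimal sketch of Smith-Waterman local alignment follows. The scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration. The two differences from the global case are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero, so unaligned overhangs cost nothing.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                       # start a fresh local alignment
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Only the shared core should be reported; the flanking letters are ignored.
print(smith_waterman("XXXACGTACGTYYY", "ZZACGTACGTWW"))
```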

Pairwise/multiple sequence
alignment
 Pairwise sequence alignment
 The process of lining up two sequences to achieve maximal levels of
identity (and conservation, for amino acid sequences) for the purpose
of assessing the degree of similarity and the possibility of homology.
 A pairwise sequence alignment is an alignment of 2 sequences
obtained by inserting gaps (“-”) such that the resulting sequences
have the same length and each pair of aligned residues represents a
homologous position.

Pairwise/multiple sequence
alignment
 Multiple sequence alignment (MSA)
 Multiple sequence alignment (MSA) can be seen as a generalization of
pairwise sequence alignment: instead of aligning two sequences, n sequences are
aligned simultaneously, where n > 2.
 Definition: A multiple sequence alignment is an alignment of n > 2 sequences
obtained by inserting gaps (“-”) into the sequences such that the resulting
sequences all have length L and can be arranged in a matrix of n rows and L
columns where each column represents a homologous position.
 To construct a multiple alignment, one may have to introduce gaps in
sequences at positions where there were no gaps in the corresponding
pairwise alignment. Multiple alignments typically contain more gaps than any given
pair of aligned sequences.
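
The matrix view in the definition can be made concrete with a few lines of Python. The three aligned rows below are made-up toy data, and the helper names are hypothetical; the sketch only checks the shape and reads the columns, it does not compute an alignment.

```python
def msa_shape(rows):
    """Return (number of rows, alignment length L), checking that all rows share one length."""
    lengths = {len(r) for r in rows}
    if len(lengths) != 1:
        raise ValueError("all aligned rows must have the same length L")
    return len(rows), lengths.pop()


def msa_columns(rows):
    """Read the alignment column by column; each column is one homologous position."""
    return ["".join(col) for col in zip(*rows)]


toy_msa = ["ACG-T",
           "A-GCT",
           "ACGCT"]
print(msa_shape(toy_msa))
print(msa_columns(toy_msa))
```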

Which algorithm to use for
database similarity search?
 Speed: BLAST > FASTA > Smith-Waterman (which is very slow
and uses a lot of computing power).
 FASTA is more sensitive than BLAST and misses fewer homologues.
 Smith-Waterman is even more sensitive.
 BLAST (Basic Local Alignment Search Tool) calculates
the statistical significance (probability) of each match.
 FASTA is more accurate than BLAST for DNA-DNA searches.

Methods of sequence
alignment
 Dot matrix method
 The dynamic programming (DP) algorithm
 Word or k-tuple methods

Dot matrix analysis
 A dot matrix is a grid in which the matching nucleotides of two DNA sequences
are represented as dots.
 It is also called a dot plot.
 It is a pairwise sequence comparison made on the computer.
 Cells where the two residues differ are left blank on the computer screen.
 In a dot matrix, nucleotides of one sequence are written from left to right along the
top row and those of the other sequence are written from top to bottom down the
left side (column) of the matrix. At every point where the two nucleotides are the
same, a dark dot is placed at the intersection of the row and column. Together, these
dots form a graph called a dot plot, which is a kind of recurrence plot. Runs of dots
along a diagonal indicate regions of similarity. Each dot in the plot represents a matching
nucleotide or amino acid.
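
The construction described above can be sketched in a few lines of Python. The function names are made up for this example, and the plot is rendered as text, with "*" for a dot and "." for an empty cell.

```python
def dot_matrix(seq_top: str, seq_left: str):
    """Boolean grid: cell [row][col] is True where seq_left[row] == seq_top[col]."""
    return [[top == left for top in seq_top] for left in seq_left]


def render_dot_plot(seq_top: str, seq_left: str) -> str:
    """Text rendering with seq_top across the top row and seq_left down the left side."""
    lines = ["  " + " ".join(seq_top)]
    for left, row in zip(seq_left, dot_matrix(seq_top, seq_left)):
        lines.append(left + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render_dot_plot("GATTACA", "GATCACA"))
```

Similar regions show up as diagonal runs of "*"; a break or shift in the diagonal hints at a substitution, insertion, or deletion.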

Dot matrix analysis
 The dot matrix method is a qualitative and simple way to analyze
sequences; however, it takes much time to analyze large
sequences.
 The dot matrix method is useful for the following studies:
 Sequence similarity between two nucleotide sequences or two
amino acid sequences.
 Insertion of short stretches in a DNA or amino acid sequence.
 Deletion of short stretches from a DNA or amino acid sequence.
 Repeats or inverted repeats in a DNA or amino acid sequence.
Dot matrix analysis: two similar
sequences
 [Figure: nucleic acid dot plots of genes]

Dynamic Programming Method
 Dynamic programming is a method for solving problems in which one
needs to make a sequence of best decisions, one after another.
 It was introduced by Richard Bellman in the 1950s.
 The word “programming” here denotes finding an acceptable plan of action, not
computer programming.
 It is useful in aligning the nucleotide sequences of DNA and the amino acid sequences
of the proteins coded by that DNA.
 Dynamic programming is a three-step process that involves:
1. Breaking the problem into small subproblems.
2. Solving the subproblems using recursive methods.
3. Constructing an optimal solution to the original problem from the optimal solutions
of the subproblems.
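
The three steps can be seen in a small, self-contained example. The sketch below computes the length of the longest common subsequence of two strings, a simpler relative of sequence alignment chosen here only for illustration; it is not a method named in the slides.

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)          # remember each subproblem's answer
    def solve(i: int, j: int) -> int:
        # Step 1: the subproblem is "LCS of a[i:] and b[j:]".
        if i == len(a) or j == len(b):
            return 0
        # Step 2: solve it recursively from smaller subproblems.
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # Step 3: combine the optimal sub-answers into the optimal answer.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


print(lcs_length("GATTACA", "GCATGCU"))
```

Caching the subproblem results is what keeps the work proportional to the product of the two lengths instead of growing exponentially.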

Dynamic programming algorithm
for sequence alignment
 The method compares every pair of characters in the two sequences and
generates an alignment that is the best, or optimal, under the chosen scoring scheme.
 This is a highly computationally demanding method. However, the latest
algorithmic improvements and ever-increasing computer capacity make it
possible to align a query sequence against a large database in a few minutes.
 Each alignment has its own score, and it is essential to recognize that several
different alignments may have identical or nearly identical scores, so
the dynamic programming method may produce more than one optimal
alignment. Careful adjustment of the scoring parameters (such as gap penalties) is
important and may discriminate between alignments with similar scores.
 The global alignment program is based on the Needleman-Wunsch algorithm and
local alignment on the Smith-Waterman algorithm. Both algorithms are derivatives of the
basic dynamic programming algorithm.
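
To illustrate that more than one alignment can share the optimal score, the sketch below fills the usual global-alignment table and, alongside each score, counts how many optimal paths lead to that cell. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for this example.

```python
def count_optimal_alignments(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (optimal global score, number of distinct alignments achieving it)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(1, n + 1):
        score[i][0], ways[i][0] = i * gap, 1   # only one way: all gaps
    for j in range(1, m + 1):
        score[0][j], ways[0][j] = j * gap, 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, ways[i - 1][j - 1]),  # diagonal
                (score[i - 1][j] + gap, ways[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, ways[i][j - 1]),          # gap in a
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Every predecessor that reaches the best score contributes its paths.
            ways[i][j] = sum(w for s, w in candidates if s == best)
    return score[n][m], ways[n][m]


# "A" against "AA": the single A can pair with either A of the longer sequence.
print(count_optimal_alignments("A", "AA"))
```

Changing the gap or mismatch values and rerunning shows how the parameters can reduce or increase the number of tied alignments.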

Word Method or K-tuple Method
 It is a heuristic method that is not guaranteed to find an optimal alignment
solution, but it is much more efficient than dynamic programming.
 This method is useful in large-scale database searches to find
whether there is a significant match with the query
sequence.
 The word method is used in the database search tools FASTA and the
BLAST family.
 These tools identify a series of short, non-overlapping subsequences
(words) in the query sequence.
 The words are then matched against candidate database sequences to find
likely hits.
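
The word-matching idea can be sketched as follows. This is a simplified illustration only, not the actual FASTA or BLAST implementation: it indexes every word of length k in one target sequence, cuts the query into non-overlapping words, and reports where each query word occurs. The function names and the example sequences are made up.

```python
from collections import defaultdict


def index_words(seq: str, k: int) -> dict:
    """Map every length-k word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def find_word_hits(query: str, target: str, k: int) -> list:
    """Return (query_offset, target_offset) pairs for each non-overlapping query word."""
    index = index_words(target, k)
    hits = []
    for q_pos in range(0, len(query) - k + 1, k):   # step k: words do not overlap
        word = query[q_pos:q_pos + k]
        for t_pos in index.get(word, []):
            hits.append((q_pos, t_pos))
    return hits


print(find_word_hits("ACGTTG", "TTACGTTGAC", k=3))
```

Hits whose target offset minus query offset is the same lie on one diagonal of the dot matrix, and such clusters are the candidate regions a search tool would then examine more closely.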

Word Method or K-tuple
Method
 In the FASTA method, the user defines a value k to use as the word length
with which to search the database. The method is slower but more sensitive at
lower values of k, which are also preferred for searches involving a very short
query sequence.
 BLAST provides a number of algorithms optimized for particular types
of queries, such as searching for distantly related sequence matches.
 It is a good, faster alternative to FASTA. However, as a heuristic method it
trades some accuracy for speed.
