1. BIRLA INSTITUTE OF TECHNOLOGY MESRA,
JAIPUR CAMPUS
NAME :- NIKHIL AGRAWAL
ROLL NO :- MCA/25004/18
TOPIC:- Sequence Alignment
2. Sequence Alignment
Sequence alignment is a way of arranging sequences of
DNA, RNA or protein to identify regions of similarity. The
similarity may indicate the functional, structural or
evolutionary significance of the sequence.
The known sequence is called the reference sequence. The
unknown sequence is called the query sequence.
3. Interpretation of sequence
alignment
Sequence alignment is useful for discovering structural, functional and
evolutionary information.
Sequences that are very much alike may have similar secondary and 3D
structure, similar function and likely a common ancestral sequence. It is
extremely unlikely that such sequences became similar by chance. For
DNA molecules with n nucleotides this probability is very low: $P = 4^{-n}$. For
proteins the probability is even lower: $P = 20^{-n}$, where n is the number of
amino acid residues.
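To get a feel for how quickly these probabilities shrink, the following minimal sketch evaluates them for a few lengths. It assumes every residue is equally likely and positions are independent, which is the simplification behind the formulas above. The function name `chance_match_probability` is illustrative.

```python
def chance_match_probability(n: int, alphabet_size: int) -> float:
    """Probability that two random sequences of length n are identical,
    assuming equally likely, independent residues (a simplification)."""
    return float(alphabet_size) ** -n


# Alphabet size 4 for DNA (A, C, G, T) and 20 for proteins (amino acids).
for n in (5, 10, 20):
    dna = chance_match_probability(n, 4)
    protein = chance_match_probability(n, 20)
    print(f"n={n:2d}  DNA: {dna:.3e}  protein: {protein:.3e}")
```

Even for short sequences the protein probability is many orders of magnitude below the DNA one, because the alphabet is five times larger.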
Large-scale genome studies have revealed horizontal transfer of
genes and other sequences between species, which may cause similarity
between some sequences in very distant species.
4. Alignment
Alignment is the task of locating “equivalent” regions of two or more
sequences to maximize their similarity.
NIKESH NARAYANAN   (mismatch: K/G in the third column)
NIGESH NARAYAN--   (gaps: “-”)
Alignment can reveal homology between sequences.
Similarity is a descriptive term for the degree of match between
two sequences.
Sequence similarity does not always imply a common function.
Conserved function does not always imply similarity at the sequence level.
5. Scoring Alignments: The Main
Principles
Alignments of related sequences are expected to give good
scores compared with alignments of randomly chosen
sequences.
The correct alignment of two related sequences should ideally
be the one that gives the best score.
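As an illustration of these principles, the sketch below scores an alignment that has already been written out, column by column. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not values given in these slides, and the "unrelated" sequence is made up.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # a gap in either row
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


# The example alignment from the "Alignment" slide (space removed).
related = score_alignment("NIKESHNARAYANAN", "NIGESHNARAYAN--")
# A made-up string with no matching column, standing in for an unrelated sequence.
unrelated = score_alignment("NIKESHNARAYANAN", "QWPZXCVBMLKJHGF")
print(related, unrelated)  # prints: 7 -15
```

The related pair has 12 matches, 1 mismatch and 2 gap columns, so it scores 12 - 1 - 4 = 7, well above the unrelated pair.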
7. Global/local sequence alignment
1. Global alignment
Input: treat the two sequences as potentially equivalent
Goal: identify conserved regions and differences
Algorithm: Needleman-Wunsch dynamic programming
Applications:
Comparing two genes with the same function (e.g., in human vs. mouse).
Comparing two proteins with similar function.
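A minimal sketch of Needleman-Wunsch global alignment follows. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2); these values and the function name are illustrative, and practical tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = needleman_wunsch("NIKESHNARAYANAN", "NIGESHNARAYAN")
print(best)
print(top)
print(bottom)
```

Several alignments can share the optimal score, so the rows printed here depend on the tie-breaking order in the traceback (diagonal first in this sketch).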
9. Global/local sequence alignment
2. Local alignment
Input: The two sequences may or may not be related
Goal: see whether a substring in one sequence aligns well with a substring
in the other
Algorithm: Smith-Waterman dynamic programming
Note: for local matching, overhangs at the ends are not treated as gaps
Applications:
Searching for local similarities in large sequences (e.g., newly sequenced
genomes).
Looking for conserved domains or motifs in two proteins.
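A minimal sketch of Smith-Waterman local alignment is given below. The scoring values (match +2, mismatch -1, gap -2) are assumptions for the example. The differences from global alignment are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero, so unaligned overhangs cost nothing.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]   # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a fresh local alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    row_a, row_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


# A shared substring embedded in otherwise unrelated flanks (made-up sequences).
print(smith_waterman("XXXXGATTACAYYYY", "ZZGATTACAWW"))
```

In this made-up example only the shared substring is reported, while the differing flanks are ignored.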
11. Pairwise/multiple sequence
alignment
Pairwise sequence alignment
The process of lining up two sequences to achieve maximal levels of
identity (and conservation, for amino acid sequences) for the purpose
of assessing the degree of similarity and the possibility of homology.
A pairwise sequence alignment is an alignment of 2 sequences
obtained by inserting gaps (“-”) such that the resulting sequences
have the same length and each pair of aligned residues represents a
homologous position.
12. Pairwise/multiple sequence
alignment
Multiple sequence alignment (MSA)
Multiple sequence alignment (MSA) can be seen as a generalization of
pairwise sequence alignment: instead of aligning two sequences, n sequences are
aligned simultaneously, where n > 2.
Definition: A multiple sequence alignment is an alignment of n > 2 sequences
obtained by inserting gaps (“-”) into the sequences such that the resulting
sequences all have length L and can be arranged in a matrix of n rows and L
columns where each column represents a homologous position.
To construct a multiple alignment, one may have to introduce gaps in
sequences at positions where there were no gaps in the corresponding
pairwise alignment. Multiple alignments therefore typically contain more gaps than any given
pair of aligned sequences.
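The matrix view in the definition can be sketched as follows: the rows are the gapped sequences, and reading the matrix column by column gives the homologous positions. The sequences and the function name `msa_columns` are made up for illustration.

```python
def msa_columns(rows: list[str]) -> list[str]:
    """Check that rows form an n x L matrix (n > 2) and return its L columns."""
    if len(rows) < 3:
        raise ValueError("a multiple alignment needs n > 2 sequences")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all gapped sequences must have the same length L")
    # zip(*rows) transposes the matrix: one tuple per column.
    return ["".join(column) for column in zip(*rows)]


alignment = ["AC-T",
             "A-GT",
             "ACGT"]
print(msa_columns(alignment))  # prints: ['AAA', 'C-C', '-GG', 'TTT']
```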
13. Which algorithm to use for
database similarity search?
Speed: BLAST > FASTA > Smith-Waterman (Smith-Waterman is very slow
and uses a lot of computing power).
FASTA is more sensitive than BLAST and misses fewer homologues.
Smith-Waterman is even more sensitive.
BLAST (Basic Local Alignment Search Tool) calculates
probabilities for its matches.
FASTA is more accurate than BLAST for DNA-DNA searches.
15. Dot matrix analysis
A dot matrix is a grid in which matching nucleotides of two DNA sequences
are represented as dots.
It is also called a dot plot.
It is a pairwise sequence alignment method carried out on the computer.
Initially the dots appear colorless on the computer screen.
In a dot matrix, the nucleotides of one sequence are written from left to right along the
top row and those of the other sequence from top to bottom along the
left side (column) of the matrix. At every point where the two nucleotides are the
same, the dot at the intersection of row and column becomes a dark dot. Together,
these dark dots form a graph called a dot plot. Runs of dots along a diagonal indicate
regions of similarity; a dot plot is a kind of recurrence plot. Each dot in the plot represents a matching
nucleotide or amino acid.
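The construction described above can be sketched in a few lines. The example sequences are made up, and the text rendering ("*" for a dark dot, "." for an empty cell) is just one possible way to display the grid.

```python
def dot_matrix(top: str, side: str) -> list[list[bool]]:
    """matrix[r][c] is True where side[r] and top[c] are the same residue."""
    return [[x == y for x in top] for y in side]


def render(top: str, side: str) -> str:
    """Draw the dot matrix as text, with `top` across and `side` down."""
    lines = ["  " + " ".join(top)]
    for residue, row in zip(side, dot_matrix(top, side)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GATCACA"))
```

In the printed grid, the similar stretches of the two sequences show up as runs of `*` along the main diagonal.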
16. Dot matrix analysis
The dot matrix method is a qualitative and simple way to analyze
sequences; however, it takes a long time for large
sequences.
The dot matrix method is useful for the following studies:
Sequence similarity between two nucleotide sequences or two
amino acid sequences.
Insertion of short stretches into a DNA or amino acid sequence.
Deletion of short stretches from a DNA or amino acid sequence.
Repeats or inverted repeats in a DNA or amino acid sequence.
18. Dynamic Programming Method
Dynamic programming is a method for solving problems in which the best
decisions must be found one after another.
It was introduced by Richard Bellman in the 1950s.
The word programming here denotes finding an acceptable plan of action, not
computer programming.
It is useful for aligning the nucleotide sequences of DNA and the amino acid sequences
of the proteins coded by that DNA.
Dynamic programming is a three-step process that involves:
1. Breaking the problem into small subproblems.
2. Solving the subproblems using recursive methods.
3. Constructing an optimal solution to the original problem from the optimal solutions
of the subproblems.
19. Dynamic programming algorithm
for sequence alignment
The method compares every pair of characters in the two sequences and
generates an alignment that is the best, or optimal, one.
This is a computationally demanding method. However, the latest
algorithmic improvements and ever-increasing computer capacity make it
possible to align a query sequence against a large database in a few minutes.
Each alignment has its own score, and it is essential to recognize that several
different alignments may have nearly identical scores, which indicates
that dynamic programming methods may produce more than one optimal
alignment. Careful choice of parameters (such as the scoring scheme and gap penalties)
is therefore important and may discriminate between alignments with similar scores.
The global alignment program is based on the Needleman-Wunsch algorithm and
local alignment on Smith-Waterman. Both algorithms are derived from the
basic dynamic programming algorithm.
20. Word Method or K-tuple Method
It is a heuristic method: it is not guaranteed to find an optimal alignment,
but it is much more efficient than dynamic programming.
This method is useful in large-scale database searches to find
whether there is a significant match with the query
sequence.
The word method is used in the database search tools FASTA and the
BLAST family.
These tools identify a series of short, non-overlapping subsequences
(words) in the query sequence.
The words are then matched against candidate database sequences.
21. Word Method or K-tuple
Method
In the FASTA method, the user defines a value k to use as the word length
with which to search the database. The method is slower but more sensitive at lower values of k,
which are also preferred for searches involving a very short query sequence.
The BLAST family provides a number of algorithms optimized for particular types
of queries, such as searching for distantly related sequence matches.
It is a good alternative to FASTA. However, being a heuristic, its results may be
less accurate.