Sequence Alignment Method
BY:-
Parwati Sihag
M.Sc. Biotechnology
SEQUENCE ALIGNMENT
•It is a way of arranging sequences of DNA, RNA, or
protein to identify regions of similarity that may
be a consequence of functional, structural, or
evolutionary relationships between the sequences.
Global Alignment
•In global alignment, two sequences to be
aligned are assumed to be generally
similar over their entire length.
•Alignment is carried out from beginning
to end of both sequences to find the best
possible alignment across the entire
length between the two sequences.
•This method is more applicable for
aligning two closely related sequences of
roughly the same length.
Local Alignment
•Local alignment, on the other hand, does not
assume that the two sequences in question
have similarity over the entire length.
•It only finds local regions with the highest level
of similarity between the two sequences and
aligns these regions without regard for the
alignment of the rest of the sequence regions.
•This approach can be used for aligning more
divergent sequences with the goal of searching
for conserved patterns in DNA or protein
sequences. The two sequences to be aligned
can be of different lengths.
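The idea of local alignment can be illustrated with a minimal Python sketch of the Smith-Waterman approach. The slides do not name an algorithm for local alignment, so the choice of Smith-Waterman, the function name `local_alignment`, and the scoring values (match 2, mismatch -1, gap -2) are assumptions made only for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, so an alignment
    # can start anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two sequences of different lengths that share only the region ACGTACG.
print(local_alignment("TTTTACGTACGTTTT", "GGGACGTACGGGG"))
```

Only the shared region is reported; the dissimilar flanking residues are left out of the alignment.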
Advantage of Pairwise
alignment
•It is the simplest method of alignment.
•In pairwise alignment, two sequences are
aligned with each other.
•It is used in structural, functional and
evolutionary analysis of sequences.
•Pairwise alignment gives highly accurate
results.
•It is also used to identify homologous
sequences.
Disadvantage of pairwise
alignment
•It is not useful when more than two
sequences have to be aligned.
•Pairwise alignment becomes difficult
when long sequences are aligned.
DOT MATRIX METHOD
•It is also known as the dot plot method.
•It is a graphical way of comparing two
sequences in a two-dimensional matrix.
•In a dot matrix, the two sequences to be
compared are written along the horizontal and
vertical axes of the matrix.
•The comparison is done by scanning each
residue of one sequence for similarity with
all residues in the other sequence.
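The construction of a dot matrix can be sketched in a few lines of Python. The function names `dot_matrix` and `render` and the two example sequences are hypothetical, and the sketch compares single residues for exact identity only.

```python
def dot_matrix(seq_x, seq_y):
    """Build the matrix: columns follow seq_x (horizontal axis),
    rows follow seq_y (vertical axis). True marks a residue match."""
    return [[rx == ry for rx in seq_x] for ry in seq_y]


def render(seq_x, seq_y):
    """Return a text picture of the dot matrix ('*' = match, '.' = no match)."""
    matrix = dot_matrix(seq_x, seq_y)
    lines = ["  " + " ".join(seq_x)]
    for ry, row in zip(seq_y, matrix):
        lines.append(ry + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GATCACA"))
```

Every residue of the vertical sequence is compared with every residue of the horizontal sequence, so the work grows with the product of the two sequence lengths.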
DYNAMIC PROGRAMMING
METHOD
•It is a method that determines the optimal
alignment by comparing all possible pairs of
characters between the two sequences.
•It is similar to the dot matrix method, but it
finds the alignment in a more quantitative way
by converting the dot matrix into a scoring
matrix.
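A minimal sketch of dynamic programming for a global alignment is given here in Python, following the Needleman-Wunsch scheme. The function name `needleman_wunsch` and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen for illustration; real programs usually use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned a, aligned b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell keeps the best of: match/mismatch, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

Where the dot matrix only records whether two residues match, each cell here stores the best cumulative score of any alignment ending at that position, which is what makes the result quantitative.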
MULTIPLE SEQUENCE ALIGNMENT
•It is a sequence alignment of three or more
biological sequences, generally protein, DNA, or
RNA.
•MSAs require more sophisticated methodologies
than pairwise alignment because they are more
computationally complex.
•Most multiple sequence alignment programs use
heuristic methods rather than global optimization,
because identifying the optimal alignment
between more than a few sequences of moderate
length is prohibitively computationally expensive.
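The cost can be made concrete with a little arithmetic. The sketch below assumes that exact dynamic programming is extended to one matrix dimension per sequence, so that aligning N sequences of length L needs (L + 1)^N cells; the function name `dp_cells` is hypothetical.

```python
def dp_cells(num_sequences, length):
    """Number of cells in a full dynamic programming lattice
    for num_sequences sequences that all have the given length."""
    return (length + 1) ** num_sequences


# The cell count for sequences of length 100 grows very quickly
# as more sequences are added.
for n in (2, 3, 5, 10):
    print(n, dp_cells(n, 100))
```

Two sequences of length 100 need about ten thousand cells, while three already need about a million, which is why heuristic methods are preferred for many sequences.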
Advantage of multiple sequence
alignment
•MSA is used for comparing more
than two sequences.
•It is used to identify homologous
residues within the sequences.
•It is used to find identical regions
shared by the sequences.
Disadvantage of multiple
sequence alignment
•It is a more complex method as
compared to pairwise alignment.
•It is more time consuming.
•Gaps introduced into the sequences
can lead to alignment errors.
•Accuracy is lower as compared to pairwise
sequence alignment, because heuristic
methods do not guarantee the optimal result.
Online tools for sequence
alignment
The following online tools are available for
sequence alignment.
•BLAST
•FASTA
•CLUSTAL OMEGA
BASIC STEPS PERFORMED IN BLAST
1. Open the NCBI site.
2. Under "All Databases", choose a database (here, Gene).
3. Enter the name of the gene (for example, thyroid peroxidase).
4. Click on Search.
5. Get the list of search results.
6. Get the gene ID and location.
7. Click on FASTA.
8. Obtain the sequence in FASTA format and the NCBI reference sequence.
9. Run BLAST.
BASIC STEPS INVOLVED IN FASTA
1. Open the NCBI site (http://www.ncbi.nlm.nih.gov/FASTA).
2. Under "All Databases", select Nucleotide or Protein.
3. Enter the name of the protein, gene or nucleotide sequence.
4. Open the record and select the file.
5. Click on FASTA to get the sequence in FASTA format.
6. Copy and paste the FASTA format sequence into the
BLAST query box.
7. Run BLAST.
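For readers who want to handle a FASTA format file programmatically before pasting it into BLAST, here is a minimal Python sketch of reading the format. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines); the function name `parse_fasta` and the example record are hypothetical and are not taken from NCBI.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # start a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, for illustration only.
example = """>example_record made-up sequence
ATGGCCCTGA
GTCCTAA
"""
print(parse_fasta(example))
```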
PROGRESSIVE ALIGNMENT
•It is the most commonly used approach
to multiple sequence alignment.
•It speeds up the alignment of multiple
sequences through a multistep process.
•It first conducts pairwise alignment for
each possible pair of sequences using
the Needleman-Wunsch algorithm and
records the similarity scores from the
pairwise comparisons.
•The scores are then converted into
evolutionary distances to generate a
distance matrix for all the sequences
involved.
•From the distance matrix, a guide tree is
generated using the neighbor-joining
method.
•The two closest sequences are aligned first.
In each following step, the next closest sequence
based on the guide tree is aligned with the
consensus sequence using dynamic
programming.
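The distance matrix step can be sketched in Python under simplifying assumptions. Instead of Needleman-Wunsch scores, this sketch uses the fraction of differing positions between sequences of equal length as a stand-in distance, and instead of building a full neighbor-joining tree it only picks the closest pair, which is the pair a progressive method would align first. All names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ (assumes equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("this simplified distance needs equal-length sequences")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(seqs):
    """Return a nested dict dist[name1][name2] for all sequence pairs."""
    names = list(seqs)
    dist = {name: {other: 0.0 for other in names} for name in names}
    for x, y in combinations(names, 2):
        dist[x][y] = dist[y][x] = p_distance(seqs[x], seqs[y])
    return dist


def closest_pair(dist):
    """Return the pair of names with the smallest distance."""
    return min(combinations(dist, 2), key=lambda pair: dist[pair[0]][pair[1]])


# Made-up sequences, for illustration only.
seqs = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "TCGAACTA"}
dist = distance_matrix(seqs)
print(closest_pair(dist))
```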
ITERATIVE ALIGNMENT
•It is based on the idea that an optimal
solution can be found by repeatedly
modifying existing suboptimal solutions.
•The procedure starts by producing a
low-quality alignment and gradually improves
it by iterative realignment through
well-defined procedures until no more
improvement in the alignment can be
achieved.
One such method performs multiple alignment
through two nested sets of iterations.
1.Outer iteration: an initial random
alignment is generated and used to
derive a UPGMA tree.
2.Inner iteration: the sequences are
randomly divided into two groups, which
are then aligned to each other.
The process is repeated over
many cycles until there is no further
improvement in the overall
alignment score.
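The stopping rule depends on an overall alignment score. The slides do not say which score is used, so the Python sketch below assumes a simple sum-of-pairs score (match 1, mismatch -1, gap -1) and shows how a realigned candidate would be accepted only if it scores higher. The function names and example alignments are hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score a multiple alignment (list of equal-length gapped strings)
    by summing the scores of all residue pairs in every column."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue              # two gaps facing each other score nothing
            elif x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


def accept_if_better(current, candidate):
    """Keep the candidate alignment only if its score is strictly higher."""
    if sum_of_pairs(candidate) > sum_of_pairs(current):
        return candidate
    return current


# Two alignments of the same pair of sequences; the second has no gaps.
print(accept_if_better(["AC-T", "A-CT"], ["ACT", "ACT"]))
```

An iterative method would call such an acceptance step after every realignment and stop once a full cycle brings no improvement.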
•In a dot matrix, if a residue match is found,
a dot is placed at that position in the graph.
•Otherwise, the matrix position is left
blank.
•When the two sequences have
substantial regions of similarity, many
dots line up to form contiguous diagonal
lines, which reveal the sequence
alignment.
•If there are interruptions in the middle of
a diagonal line, they indicate insertions or
deletions.
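Reading diagonals from a dot matrix can also be done in code. The following Python sketch finds the longest unbroken diagonal run of matches between two sequences; the function name `longest_diagonal` and the example sequences are hypothetical, and only exact residue identity is counted as a match.

```python
def longest_diagonal(seq_x, seq_y):
    """Return (length, start in seq_x, start in seq_y) of the longest
    run of consecutive matches along any diagonal of the dot matrix."""
    best = (0, 0, 0)
    # prev[j] = length of the diagonal run ending in the previous row at column j
    prev = [0] * (len(seq_x) + 1)
    for i, ry in enumerate(seq_y, start=1):
        curr = [0] * (len(seq_x) + 1)
        for j, rx in enumerate(seq_x, start=1):
            if rx == ry:
                curr[j] = prev[j - 1] + 1        # extend the diagonal by one dot
                if curr[j] > best[0]:
                    best = (curr[j], j - curr[j], i - curr[j])
        prev = curr
    return best


# Made-up sequences that share the region ACGT.
print(longest_diagonal("GGACGTTT", "CCACGTAA"))
```

A run that stops and resumes on a neighbouring diagonal corresponds to the interruption described above, that is, an insertion or deletion in one of the sequences.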