 SUBMITTED BY: PRADDUM KUMAR NAMDEV
Enroll. No. 17140007
BSc (Hons) ZOOLOGY 6th sem.
SEQUENCE ALIGNMENT
1
 Basic concepts of sequence alignment
 what is sequence alignment?
 scoring alignment: the main principle
 sequence alignment: tools
 Types of sequence alignment
1. Global alignment
2. Local alignment
 Methods of sequence alignment
1. Dot matrix method
2. The dynamic programming algorithms
3. Word or k-tuple method
2
 A way of arranging DNA, RNA or protein
sequences to identify regions of similarity.
 The similarity may indicate functional,
structural or evolutionary relationships
between the sequences.
 The known sequence is called the
reference sequence. The unknown
sequence is called the query sequence.
3
 Alignment is the task of locating
‘equivalent’ regions of two or more sequences to
maximise their similarity.
 NIKESH NARAYANAN (red- mismatches)
 NIGESH NARAYAN….(gaps)
 Alignment can reveal homology between
sequences.
 Similarity is a descriptive term for
the degree of match between two
sequences.
 Conserved function does not always imply
similarity at the sequence level.
4
 Alignments of related sequences are expected to give
better scores than alignments of randomly
chosen sequences.
 The correct alignment of two related sequences
should ideally be one that gives the best score.
 In practice, the correct alignment does not
necessarily have the best score, since no “perfect”
scoring scheme has been devised.
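
The scoring principle can be sketched in Python as follows. This is only an illustration: the values match = +1, mismatch = -1 and gap = -2 are assumptions, not values given in these slides, and `score_alignment` is a hypothetical helper name.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned strings of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap          # a gap in either sequence is penalised
        elif x == y:
            total += match        # identical letters are rewarded
        else:
            total += mismatch     # different letters are penalised
    return total


# A plausibly related pair versus an arbitrary unrelated pair.
related = score_alignment("ATTGA-G", "ATTGATG")
unrelated = score_alignment("ATTGACG", "CGGCTAA")
print(related, unrelated)
```

Under these assumed values the related pair scores higher than the unrelated pair, which is the behaviour a useful scoring scheme should show.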
5
 In genome science, BLAST, which
was developed in 1990, is the best
known sequence alignment tool; it was
selected as one of the milestones of
DNA technology by the editors of Nature
Genetics.
6
Based on completeness
 Global
 Local
Based on number of sequences
 Pairwise alignment
 Multiple sequence alignment
7
 Input: treat the two sequences as potentially
equivalent.
 Goal: identify conserved regions and
differences
 Algorithm: Needleman-Wunsch dynamic
programming.
 Applications:
 Comparing two genes with the same function (in human vs
mouse)
 Comparing two proteins with similar function
8
 Que: how similar are two sequences S1 and S2?
Input: two sequences S1 and S2 over the same alphabet
Output: two sequences S1’ and S2’ of equal length
(S1’, S2’ are S1, S2 with possibly additional gaps)
Example:
• S1= GCGCATGGATTGAGCGA
• S2= TGCGCCATTGATGACC
A possible alignment:
• S1’= -GCGC-ATGGATTGAGCGA
• S2’= TGCGCCATTGAT-GACC--
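
A minimal Python sketch of Needleman-Wunsch global alignment is given below. It assumes a simple linear scoring scheme (match = +1, mismatch = -1, gap = -1); these values and the function name `needleman_wunsch` are illustrative choices, not taken from the slides. When several alignments tie for the best score, the sketch returns only one of them, which need not be the alignment shown above.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s1, aligned_s2) for one optimal global alignment."""
    n, m = len(s1), len(s2)
    # F[i][j] = best score for aligning the prefixes s1[:i] and s2[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align two letters
                          F[i - 1][j] + gap,       # gap in s2
                          F[i][j - 1] + gap)       # gap in s1
    # Trace back from the bottom-right corner to recover one alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + sub:
                a1.append(s1[i - 1])
                a2.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, a1, a2 = needleman_wunsch("GCGCATGGATTGAGCGA", "TGCGCCATTGATGACC")
print(score)
print(a1)
print(a2)
```

The two returned strings always have equal length, and removing the gap characters gives back the original inputs, matching the input/output definition above.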
9
 Input: the two sequences may or may not be related
 Goal: see whether a substring in one sequence aligns
well with a substring in the other
 Algorithm: Smith-Waterman dynamic programming
 Note: For local matching, overhangs at the ends are
not treated as gaps
 Applications:
 Searching for local similarities in large sequences (e.g.
newly sequenced genome)
 Looking for conserved domains or motifs in two proteins
10
Que: find the pair of substrings in two input sequences
which have the highest similarity
Input: two sequences S1, S2 over the same alphabet
Output: two sequences S1’ and S2’ of equal length
(S1’, S2’ are substrings of S1, S2 with possibly
additional gaps)
Example:
 S1= GCGCATGGATTGAGCGA
 S2= TGCGCCATTGATGACC
 A possible alignment:
S1’= ATTGA-G
S2’= ATTGATG
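
A minimal Python sketch of Smith-Waterman local alignment follows. The scoring values (match = +2, mismatch = -1, gap = -2) and the function name `smith_waterman` are assumptions made for illustration. The key difference from global alignment is that cell scores are never allowed to drop below zero, so the alignment can start and end anywhere.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_sub1, aligned_sub2) for one best local alignment."""
    n, m = len(s1), len(s2)
    # H[i][j] = best score of a local alignment ending at s1[i-1], s2[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,                       # start a fresh alignment
                          H[i - 1][j - 1] + sub,   # align two letters
                          H[i - 1][j] + gap,       # gap in s2
                          H[i][j - 1] + gap)       # gap in s1
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


score, a1, a2 = smith_waterman("GCGCATGGATTGAGCGA", "TGCGCCATTGATGACC")
print(score)
print(a1)
print(a2)
```

Because unaligned ends cost nothing, the overhangs on either side of the reported substrings are simply left out rather than filled with gaps.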
11
12
 Dot matrix method
 The dynamic programming(DP) algorithm
 Word or k-tuple methods
13
 A dot matrix is a grid in which matching nucleotides of
two DNA sequences are represented as dots.
 It is also called a dot plot.
 It is a pairwise sequence alignment method carried out on the computer.
 The dots appear as colourless dots on the computer screen.
 In a dot matrix, the nucleotides of one sequence are written from
left to right along the top row, and those of the other sequence
are written from top to bottom down the left side (the first column
of the matrix). At every point where the two nucleotides are
the same, the dot at the intersection of row and column
becomes a dark dot. When all these darkened dots are
connected, the result is a graph called a dot plot.
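
The construction described above can be sketched in Python as a small text-based dot plot. The function names `dot_matrix` and `render`, and the use of `*` for a dark dot and `.` for a colourless one, are illustrative assumptions.

```python
def dot_matrix(top, side):
    """grid[r][c] is True where side[r] (row) equals top[c] (column)."""
    return [[s == t for t in top] for s in side]


def render(top, side):
    """Draw the matrix as text: '*' marks a match, '.' marks no match."""
    grid = dot_matrix(top, side)
    lines = ["  " + " ".join(top)]               # first sequence along the top
    for letter, row in zip(side, grid):          # second sequence down the side
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(letter + " " + cells)
    return "\n".join(lines)


print(render("ATTGATG", "ATTGAG"))
```

Runs of `*` along a diagonal correspond to stretches where the two sequences match; a shift of the diagonal suggests an insertion or deletion.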
14
Dot matrix method is a qualitative and simple way
to analyse sequences; however, it takes much
time to analyse large sequences.
Dot matrix method is useful for the following
studies
1) Sequence similarity between two nucleotide sequences
or two amino acid sequences.
2) Insertion of short stretches in a DNA or amino acid
sequence.
3) Deletion of short stretches from a DNA or amino acid
sequence.
4) Repeats or inverted repeats in a DNA or amino acid
sequence.
15
 Dynamic programming is a method for solving
problems in which a series of best decisions must be made one
after another.
 It was introduced by Richard Bellman in the 1950s.
 The word programming here denotes finding an
acceptable plan of action, not computer programming.
 It is useful in aligning nucleotide sequences of DNA and
amino acid sequences of the proteins coded by that DNA.
 Dynamic programming is a three-step process that
involves
Breaking the problem into small subproblems.
Solving the subproblems using recursive methods.
Constructing an optimal solution for the original problem from the optimal solutions of the subproblems.
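
The three steps above can be sketched in Python with a memoised recursion that computes the best global alignment score. The scoring values (match = +1, mismatch = -1, gap = -1) and the name `best_global_score` are assumptions for illustration; `functools.lru_cache` stores each subproblem's answer so it is solved only once.

```python
from functools import lru_cache


def best_global_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 under a simple linear scheme."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Step 1: the subproblem is "align the prefixes s1[:i] and s2[:j]".
        if i == 0:
            return j * gap            # only gaps can pair with s2[:j]
        if j == 0:
            return i * gap            # only gaps can pair with s1[:i]
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        # Step 2: solve the three smaller subproblems recursively.
        # Step 3: combine their optimal scores into the optimum for (i, j).
        return max(solve(i - 1, j - 1) + sub,
                   solve(i - 1, j) + gap,
                   solve(i, j - 1) + gap)

    return solve(len(s1), len(s2))


print(best_global_score("GATTACA", "GATTACA"))  # 7: seven matches, no gaps
```

Without the cache the same prefixes would be aligned again and again; storing the results is what makes the approach practical. For very long sequences an iterative table is usually preferred, since deep recursion can exceed Python's recursion limit.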
16
 The word (k-tuple) method is a heuristic: it is not guaranteed to
find an optimal alignment, but it is much faster than dynamic programming.
 This method is useful in large-scale database searches to
find whether there is a significant match available with the
query sequence.
 Word methods are used in the database search tools FASTA
and the BLAST family.
 They identify a series of short, non-overlapping
subsequences (words) of the query sequence.
 These words are then matched to candidate database sequences
to get the result.
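
The word-matching idea can be sketched in Python as below. This is a toy illustration only, not how FASTA or BLAST are actually implemented; the function names, the tiny database and the word length k = 4 are all assumptions.

```python
from collections import defaultdict


def words(seq, k, step):
    """List (position, word) pairs for the k-letter words of seq."""
    return [(i, seq[i:i + k]) for i in range(0, len(seq) - k + 1, step)]


def find_word_hits(query, database, k=4):
    """Map each database sequence name to its (query_pos, db_pos) word hits."""
    query_words = words(query, k, step=k)        # non-overlapping query words
    hits = {}
    for name, seq in database.items():
        index = defaultdict(list)                # word -> positions in seq
        for pos, word in words(seq, k, step=1):  # every position in the target
            index[word].append(pos)
        found = [(qpos, dpos)
                 for qpos, word in query_words
                 for dpos in index.get(word, [])]
        if found:
            hits[name] = found
    return hits


database = {"seqA": "GCGCATGGATTGAGCGA", "seqB": "CCCCCCCC"}
hits = find_word_hits("ATTGATGA", database, k=4)
print(hits)  # {'seqA': [(0, 8)]}
```

Only sequences that share at least one word with the query are reported, so the slower dynamic programming step can be limited to those candidates.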
17
 In the FASTA method, the user defines a value k
to use as the word length to search the database.
The method is slower but more sensitive at lower values of k,
which are also preferred for searches involving a
very short query sequence.
 BLAST provides a number of algorithms
optimized for particular types of queries, such as
searching for distantly related sequence matches.
 It was developed as a faster alternative to FASTA. However,
like FASTA it is a heuristic, so its results are not guaranteed to be optimal.
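
The effect of the word length k can be illustrated with a short Python sketch that counts the distinct words two sequences share at different values of k. The helper name `shared_words` is hypothetical, and the two sequences are the example sequences used earlier in these slides.

```python
def shared_words(a, b, k):
    """Return the set of k-letter words that occur in both sequences."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return words_a & words_b


s1 = "GCGCATGGATTGAGCGA"
s2 = "TGCGCCATTGATGACC"
for k in (2, 4, 5, 6):
    print(k, sorted(shared_words(s1, s2, k)))
```

For these two sequences no word of length 6 is shared, while shorter words still produce hits. This is why a small k is more sensitive, at the cost of more hits to examine.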
18
