PILLAI ASWATHY VISWANATH
BOTANY
• An alignment is an arrangement of two or more sequences (DNA, RNA or protein) that shows whether the aligned sequences are similar or different.
• It helps in inferring functional, structural or evolutionary relationships between the sequences.
• Sequence alignment methods are used to find the best-matching sequences.
• A sequence alignment is made between a known sequence and an unknown sequence, or between two unknown sequences.
• The known sequence is called the reference sequence; the unknown sequence is called the query sequence.
• Sequences that are very much alike may have similar secondary and 3D structure, similar function, and likely a common ancestral sequence.
• Such sequences are termed homologous: they share a common ancestor.
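• In sequence alignment, the sequences to be compared are written one above the other, and each column is scored.
• The scores below assume: match = +2, mismatch = -1, gap = -2.
A   T   C   G     (sequence 1)
--  T   C   A     (sequence 2)
-2  +2  +2  -1    = 1
• Each column holds either a match or a mismatch.
• To reduce mismatches, a "gap" (--) is added. Where the gap is placed changes the total score:
A   T   C   G     (sequence 1)
T   C   A   --    (sequence 2)
-1  -1  -1  -2    = -5
A   T   C   G     (sequence 1)
T   --  C   A     (sequence 2)
-1  -2  +2  -1    = -2
• Of these three arrangements, the first has the highest score (1) and is therefore the best alignment.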
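The column-by-column scoring used in the examples can be sketched in a few lines of Python. This is only an illustration: the function name `score_alignment` and the use of a single `-` character for a gap are assumptions made here, not part of the slides.

```python
# Scoring scheme from the slides: match = +2, mismatch = -1, gap = -2.
MATCH, MISMATCH, GAP = 2, -1, -2

def score_alignment(top, bottom, match=MATCH, mismatch=MISMATCH, gap=GAP):
    """Sum the column scores of two aligned strings of equal length.

    A '-' character in either string marks a gap (hypothetical convention).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap          # gap column
        elif a == b:
            total += match        # identical characters
        else:
            total += mismatch     # different characters
    return total

print(score_alignment("ATCG", "-TCA"))  # 1
print(score_alignment("ATCG", "TCA-"))  # -5
print(score_alignment("ATCG", "T-CA"))  # -2
```

The three calls reproduce the three totals shown in the slides.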
• Very short or very similar sequences can be aligned by hand.
• However, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort.
• Such problems call for computational approaches to sequence alignment.
• The main computational methods are dynamic programming algorithms.
• They are used to find the best (highest-scoring) alignment of the sequences.
There Are Mainly Two Types Of Sequence
Alignment
Global Alignment
Local Alignment
• A general global alignment technique is the Needleman–Wunsch algorithm, which is based on dynamic programming.
• The Smith–Waterman algorithm is a general local alignment method, also based on dynamic programming.
• In global alignment, an attempt is made to align the entire sequences (end-to-end alignment).
• If two sequences have approximately the same length and are quite similar, they are suitable for global alignment.
• It is suitable for aligning two closely related sequences.
• Global alignments are usually done to compare homologous genes,
• for example two genes with the same function, or two proteins with similar function.
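A minimal sketch of global alignment by dynamic programming, in the spirit of Needleman–Wunsch, may help here. It assumes the simple scheme from the earlier slides (match +2, mismatch -1, a flat gap cost of -2); the function name and the `-` gap character are illustrative choices, and real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an end-to-end alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap      # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap      # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("ATCG", "TCA"))  # (1, 'ATCG', '-TCA')
```

For the two sequences of the earlier example, the sketch recovers the arrangement with the leading gap and a score of 1.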
• Local alignment finds the local regions with the highest level of similarity between the two sequences.
• Any two sequences can be locally aligned, because local alignment finds stretches of sequence with a high level of matches without considering the alignment of the rest of the sequences.
• It is suitable for aligning more divergent or distantly related sequences.
• Sequences that are suspected to be similar, or even dissimilar sequences, can be compared with the local alignment method; it reports only the local regions with a high level of similarity.
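Local alignment can be sketched in the same dynamic-programming style, in the spirit of Smith–Waterman. The difference from the global case is that a cell score is never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values, the function name and the `-` gap character are assumptions made for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # start a new local region
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("ATCG", "TCA"))  # (4, 'TC', 'TC')
```

Only the shared stretch `TC` is reported; the flanking characters that do not match are left out of the alignment.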
• When used to search a database, the Needleman–Wunsch and Smith–Waterman algorithms compare the query pairwise against every database sequence and find the best alignment.
• But this process is often too slow for searching large databases; a search can sometimes take hours.
• So faster algorithms, such as BLAST and FASTA, have been developed.
• BLAST and FASTA are two programs used to compare biological sequences (DNA or other nucleotide sequences, and protein or amino-acid sequences) from different species and look for similarities.
• These algorithms were written with speed in mind: as sequence databases grew rapidly from the 1980s onward, there arose a need to compare sequences and find similar genes at high speed.
• BLAST is an acronym for Basic Local Alignment Search Tool; it uses a localized approach to compare two sequences.
• FASTA is pronounced "fast A" and stands for FAST-All, because it works with any alphabet; it extends the earlier FASTP (protein) and FASTN (nucleotide) programs.
• Both BLAST and FASTA search a sequence database very quickly, which saves both time and cost.
• BLAST, one of the most widely used bioinformatics programs, was developed in 1990 and has been available to everyone at the NCBI site since then.
• The software can be accessed by anyone and can be modified according to one's needs.
• BLAST takes the sequence to be compared in FASTA format, and its output can be obtained as plain text, HTML or XML.
• BLAST works by searching for localized similarities between the two sequences; after shortlisting the similar sequences, it searches for similarities in the neighbourhood of each hit.
• The software searches for a high number of similar local regions and reports a result once a threshold value is reached.
• This differs from earlier software, which searched and compared the entire sequences and therefore took a lot of time.
• BLAST is used for many purposes, such as DNA mapping, comparing similar genes in different species, and creating phylogenetic trees.
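The idea of finding short local similarities first and then looking at their neighbourhood can be illustrated with a heavily simplified seed-and-extend sketch. This is not the BLAST implementation: it uses exact word matches only, extends without gaps, and all names and parameter values (`word_size`, `drop_off`, `min_score`) are assumptions chosen for the example.

```python
from collections import defaultdict

def seed_and_extend(query, subject, word_size=3, match=2, mismatch=-1,
                    drop_off=3, min_score=8):
    """Return sorted (score, query_start, subject_start, length) tuples."""
    # Index every word of length word_size in the subject by start position.
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)

    hits = set()
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            score = best = word_size * match          # score of the seed itself
            # Extend to the right while the score stays near its best value.
            right, k = 0, 0
            while i + word_size + k < len(query) and j + word_size + k < len(subject):
                same = query[i + word_size + k] == subject[j + word_size + k]
                score += match if same else mismatch
                if score > best:
                    best, right = score, k + 1
                elif best - score > drop_off:
                    break
                k += 1
            # Extend to the left from the best right-hand end.
            score, left, k = best, 0, 1
            while i - k >= 0 and j - k >= 0:
                score += match if query[i - k] == subject[j - k] else mismatch
                if score > best:
                    best, left = score, k
                elif best - score > drop_off:
                    break
                k += 1
            if best >= min_score:                     # report only above threshold
                hits.add((best, i - left, j - left, word_size + left + right))
    return sorted(hits, reverse=True)

print(seed_and_extend("GGATCGTT", "CCATCGAA"))  # [(8, 2, 2, 4)]
```

In the example the two seeds `ATC` and `TCG` both extend to the same four-character region `ATCG`, which is reported once because its score reaches the threshold.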
• For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see whether humans carry a similar gene.
• BLAST will identify sequences in the human genome that resemble the mouse gene, based on sequence similarity.
• The BLAST algorithm and program were designed by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers and David J. Lipman at the National Institutes of Health, and were published in the Journal of Molecular Biology in 1990.
• The first program of the FASTA family, FASTP, was written in 1985 for comparing protein sequences only; it was later extended, as FASTA (1988), to search DNA as well.
• FASTA finds the similarity between two sequences and assesses it statistically.
• It matches one DNA or protein sequence against another by a local sequence alignment method.
• It searches for local regions of similarity rather than for the best overall match between the two sequences.
• Because it compares localized similarities, it can at times report a spurious match.
• FASTA breaks a sequence into short words called k-tuples, where k can be from 1 to 6, and matches them against the k-tuples of the other sequence; once a threshold number of matches is reached, it reports the result.
• Because it is very fast, it is used to shortlist candidate matching sequences from a large number for full comparison.
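The k-tuple idea can be sketched as follows. Shared k-tuples are counted per diagonal (the offset between their positions in the two sequences), and a sequence is shortlisted when one diagonal collects enough of them. This is a simplified illustration rather than the FASTA program itself; the function names and the `threshold` value are assumptions.

```python
from collections import Counter, defaultdict

def ktuple_diagonals(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject position minus query position)."""
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)      # where each k-tuple occurs in the query
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], []):
            diagonals[j - i] += 1                # same offset = same diagonal
    return diagonals

def is_candidate(query, subject, k=2, threshold=3):
    """Shortlist the subject if one diagonal holds at least `threshold` shared k-tuples."""
    diagonals = ktuple_diagonals(query, subject, k)
    return bool(diagonals) and max(diagonals.values()) >= threshold

print(ktuple_diagonals("ATCGAT", "GGATCGAT").most_common(1))  # [(2, 5)]
print(is_candidate("ATCGAT", "GGATCGAT"))                     # True
print(is_candidate("ATCGAT", "CCCCCC"))                       # False
```

Sequences that pass this quick filter would then be handed to a full alignment method for detailed comparison.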
• BLAST is generally faster than FASTA.
• Accuracy depends on the sequences: for closely matched sequences BLAST is very accurate, while for more dissimilar sequences FASTA can be the better choice.
• BLAST is readily adapted to the user's needs; FASTA is less often modified.
• BLAST takes its input in FASTA format.
• BLAST is more versatile and more widely used than FASTA.
• Global and local sequence alignments can each be of two types:
Pairwise alignment
Multiple sequence alignment
• Pairwise alignment is primarily a method for comparing two sequences to find the best match, in either local or global alignment.
• Its purpose is to find a related gene or gene product in a database of known sequences.
• It is used to identify sequences of unknown structure or function.
• Another important use is the study of molecular evolution.
• A multiple alignment is an alignment that compares more than two sequences.
• Here an unknown sequence is matched with several known sequences to reveal how the sequences are related, without first making separate pairwise alignments.
• A multiple alignment contains a mix of closely and distantly related sequences.
• It provides information about the most similar regions in the set.
• Thus it is more informative about evolutionary relationships.
• It is used to build phylogenetic trees.
• The alignment begins with the most closely related sequences and ends with the most distant.
• One of the most commonly used multiple alignment programs is CLUSTAL.
• Similar sequences are aligned in pairs first, and distantly related sequences are added later.
• The alignment scores thus obtained are used to cluster the sequences and generate the final multiple alignment.
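The order in which sequences enter a progressive multiple alignment can be sketched as below: score every pair, start from the most similar pair, then repeatedly add the remaining sequence that is on average closest to those already chosen. This is a simplified stand-in for the guide-tree step and is not how CLUSTAL itself is implemented; the function names, the scoring values and the example sequences are assumptions.

```python
from itertools import combinations

def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score only, keeping one row of the table at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def joining_order(seqs):
    """seqs: dict of name -> sequence (at least two). Returns names, closest first."""
    names = list(seqs)
    score = {}
    for x, y in combinations(names, 2):
        score[x, y] = score[y, x] = global_score(seqs[x], seqs[y])
    # Start with the most similar pair.
    order = list(max(combinations(names, 2), key=lambda pair: score[pair]))
    remaining = [name for name in names if name not in order]
    # Add the sequence with the highest average score to those already chosen.
    while remaining:
        nxt = max(remaining,
                  key=lambda name: sum(score[name, o] for o in order) / len(order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

example = {"s1": "ATCGATCG", "s2": "ATCGATCC", "s3": "TTTTAAAA"}
print(joining_order(example))  # ['s1', 's2', 's3']
```

The two near-identical sequences are paired first and the divergent one is added last, matching the order described in the slides.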
