SlideShare a Scribd company logo
1 of 18
Needleman–Wunsch algorithm
Course: B.Sc Biochemistry
Subject: Basic of Bioinformatics
Unit: III
• The Needleman–Wunsch algorithm is an algorithm used in
bioinformatics to align protein or nucleotide sequences.
• It was one of the first applications of dynamic programming to
compare biological sequences.
• The algorithm was developed by Saul B. Needleman and Christian
D. Wunsch and published in 1970.
• The algorithm essentially divides a large problem (e.g. the full
sequence) into a series of smaller problems and uses the solutions
to the smaller problems to reconstruct a solution to the larger
problem.
• It is also sometimes referred to as the optimal matching algorithm
and the global alignment technique.
• The Needleman–Wunsch algorithm is still widely used for optimal
global alignment, particularly when the quality of the global
alignment is of the upmost importance.
Choose Scoring System
• Next we need to decide how to score how each individual pair of
letters match up. Just by looking at our two strings you may be able
to see one possible best alignment:
GCATG-CU
G-ATTACA
 We can see that letters may match, mismatch, be deleted or
inserted (indel):
• Match: The two letters are the same
• Mismatch: The two letters are differential
• Indel (INsertion or DELetion) : One letter aligns to a gap in the other
string.
• There are various ways to score these three scenarios.
• These have been outlined in the Scoring Systems section
below.
• For now we will use the simple system used by Needleman
and Wunsch;
matches are given +1,
mismatches are given -1 and
Indels(Gap) are given -1.
• The original purpose of the algorithm described by
Needleman and Wunsch was to find similarities in the
amino acid sequences of two proteins.
Figure 1: Needleman-Wunsch pairwise sequence alignment
Sequences Best Alignments
--------- ----------------------
GATTACA G-ATTACA G-ATTACA G-ATTACA
GCATGCU GCATG-CU GCA-TGCU GCAT-GCU
1.
• Needleman and Wunsch describe their algorithm explicitly
for the case when the alignment is penalized solely by the
matches and mismatches, and gaps have penalty.
• The Needleman–Wunsch algorithm is still widely used for
optimal global alignment, particularly when the quality of
the global alignment is of the upmost importance.
However, the algorithm is expensive with respect to time
and space, proportional to the product of the length of two
sequences and hence is not suitable for long sequences.
• Recent development has focused on improving the time
and space cost of the algorithm while maintaining quality.
For example, in 2013, a Fast Optimal Global Sequence
Alignment Algorithm (FOGSAA).
Smith–Waterman algorithm
• The Smith–Waterman algorithm performs local sequence alignment; that is, for
determining similar regions between two strings or nucleotide or protein
sequences. Instead of looking at the total sequence.
• The Smith–Waterman algorithm compares segments of all possible lengths and
optimizes the similarity measure.
• The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in
1980.
• Like the Needleman–Wunsch algorithm, of which it is a variation, Smith–
Waterman is a dynamic programming algorithm.
• As such, it has the desirable property that it is guaranteed to find the optimal local
alignment with respect to the scoring system being used (which includes the
substitution matrix and the gap-scoring scheme).
• The main difference to the Needleman–Wunsch algorithm is that negative scoring
matrix cells are set to zero, which renders the (thus positively scoring) local
alignments visible.
• Backtracking starts at the highest scoring matrix cell and proceeds until a cell with
score zero is encountered, yielding the highest scoring local alignment.
• One does not actually implement the algorithm as described because improved
alternatives are now available that have better scaling and are more accurate.
2.
• To obtain the optimum local alignment, start with the highest value in the
matrix (i,j). Then, go backwards to one of positions (i − 1,j), (i, j − 1), and
(i − 1, j − 1) depending on the direction of movement used to construct
the matrix. This methodology is maintained until a matrix cell with zero
value is reached.
• In the example, the highest value corresponds to the cell in position (8,8).
The walk back corresponds to (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2),
(2,1), (1,1), and (0,0),
• Once finished, the alignment is reconstructed as follows: Starting with the
last value, reach (i,j) using the previously calculated path.
• A diagonal jump implies there is an alignment (either a match or a
mismatch).
• A top-down jump implies there is a deletion.
• A left-right jump implies there is an insertion.
• For the example, the results are:
Sequence 1 = A-CACACTA
Sequence 2 = AGCACAC-A
• One motivation for local alignment is the difficulty of obtaining correct
alignments in regions of low similarity between distantly related biological
sequences, because mutations have added too much 'noise' over
evolutionary time to allow for a meaningful comparison of those regions.
Local alignment avoids such regions altogether and focuses on those with
a positive score, i.e. those with an evolutionary conserved signal of
similarity.
• Another motivation for using local alignments is that there is a reliable
statistical model (developed by Karlin and Altschul) for optimal local
alignments. The alignment of unrelated sequences tends to produce
optimal local alignment scores which follow an extreme value distribution.
This property allows programs to produce an expectation value for the
optimal local alignment of two sequences, which is a measure of how
often two unrelated sequences would produce an optimal local alignment
whose score is greater than or equal to the observed score. Very low
expectation values indicate that the two sequences in question might be
homologous, meaning they might share a common ancestor.
• The Smith–Waterman algorithm is fairly demanding of time: To align two
sequences of lengths m and n, O(mn) time is required. Smith–Waterman
local similarity scores can be calculated in O(m) (linear) space if only the
optimal alignment needs to be found, but naive algorithms to produce the
alignment require O(mn) space. A linear space strategy to find the best
local alignment has been described. BLAST and FASTA reduce the amount
of time required by identifying conserved regions using rapid lookup
strategies, at the cost of exactness.
BLAST
• In bioinformatics, BLAST for Basic Local Alignment Search Tool is an
algorithm for comparing primary biological sequence information, such as
the amino-acid sequences of different proteins or the nucleotides of DNA
sequences. A BLAST search enables a researcher to compare a query
sequence with a library or database of sequences, and identify library
sequences that resemble the query sequence above a certain threshold.
• Different types of BLASTs are available according to the query sequences.
For example, following the discovery of a previously unknown gene in the
mouse, a scientist will typically perform a BLAST search of the human
genome to see if humans carry a similar gene; BLAST will identify
sequences in the human genome that resemble the mouse gene based on
similarity of sequence.
• BLAST is more time-efficient than FASTA by searching only for the more
significant patterns in the sequences, yet with comparative sensitivity.
This could be further realized by understanding the algorithm of BLAST
introduced below.
Input
• Input sequences are in FASTA or Genbank format and weight matrix.
Output
• BLAST output can be delivered in a variety of formats. These formats
include HTML, plain text, and XML formatting. For NCBI's web-page, the
default format for output is HTML. When performing a BLAST on NCBI, the
results are given in a graphical format showing the hits found, a table
showing sequence identifiers for the hits with scoring related data, as well
as alignments for the sequence of interest and the hits received with
corresponding BLAST scores for these. The easiest to read and most
informative of these is probably the table.
• There are now a handful of different BLAST programs available, which can
be used depending on what one is attempting to do and what they are
working with. These different programs vary in query sequence input, the
database being searched, and what is being compared. These programs
and their details are listed below:
• BLAST is actually a family of programs (all included in the blastall
executable). These include:
1. Nucleotide-nucleotide BLAST (blastn)
• This program, given a DNA query, returns the most similar DNA sequences
from the DNA database that the user specifies.
2. Protein-protein BLAST (blastp)
• This program, given a protein query, returns the most similar protein
sequences from the protein database that the user specifies.
3. Nucleotide 6-frame translation-protein (blastx)
• This program compares the six-frame conceptual translation products of a
nucleotide query sequence (both strands) against a protein sequence
database.
4. Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
• This program is the slowest of the BLAST family. It translates the query
nucleotide sequence in all six possible frames and compares it against the
six-frame translations of a nucleotide sequence database. The purpose of
tblastx is to find very distant relationships between nucleotide sequences.
5. Protein-nucleotide 6-frame translation (tblastn)
• This program compares a protein query against the all six reading frames
of a nucleotide sequence database.
 Of these programs, BLASTn and BLASTp are the most commonly used
because they use direct comparisons, and do not require translations.
However, since protein sequences are better conserved evolutionarily
than nucleotide sequences, tBLASTn, tBLASTx, and BLASTx, produce more
reliable and accurate results when dealing with coding DNA. They also
enable one to be able to directly see the function of the protein sequence,
since by translating the sequence of interest before searching often gives
you annotated protein hits.
Uses of BLAST
• BLAST can be used for several purposes. These include identifying species,
locating domains, establishing phylogeny, DNA mapping, and comparison.
1. Identifying species
• With the use of BLAST, you can possibly correctly identify a species or find
homologous species. This can be useful, for example, when you are
working with a DNA sequence from an unknown species.
2. Locating domains
• When working with a protein sequence you can input it into BLAST, to
locate known domains within the sequence of interest.
3. Establishing phylogeny
• Using the results received through BLAST you can create a phylogenetic
tree using the BLAST web-page. Phylogenies based on BLAST alone are
less reliable than other purpose-built computational phylogenetic
methods, so should only be relied upon for "first pass" phylogenetic
analyses.
4. DNA mapping
• When working with a known species, and looking to sequence a gene at
an unknown location, BLAST can compare the chromosomal position of
the sequence of interest, to relevant sequences in the database(s).
5. Comparison
• When working with genes, BLAST can locate common genes in two
related species, and can be used to map annotations from one organism
to another.
Books and Web References
• Books Name :
1. Introduction To Bioinformatics by T. K. Attwood
2. BioInformatics by Sangita
3. Basic Bioinformatics by S.Ignacimuthu, s.j.
http://en.wikipedia.org/wiki/BLAST
http://en.wikipedia.org/wiki/Algorithm
Image References
1.http://upload.wikimedia.org/wikipedia/commons/3/3f/Needleman-
Wunsch_pairwise_sequence_alignment.png
2.http://upload.wikimedia.org/math/b/9/9/b99e56fa40bb943637c6891a46b
cfd26.png

More Related Content

What's hot

Ortholog assignment
Ortholog assignmentOrtholog assignment
Ortholog assignment
Melvin Zhang
 
The Smith Waterman algorithm
The Smith Waterman algorithmThe Smith Waterman algorithm
The Smith Waterman algorithm
avrilcoghlan
 
Dotplots for Bioinformatics
Dotplots for BioinformaticsDotplots for Bioinformatics
Dotplots for Bioinformatics
avrilcoghlan
 

What's hot (20)

Sequence similarity tools.pptx
Sequence similarity tools.pptxSequence similarity tools.pptx
Sequence similarity tools.pptx
 
Dynamic programming
Dynamic programming Dynamic programming
Dynamic programming
 
Needleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmNeedleman-Wunsch Algorithm
Needleman-Wunsch Algorithm
 
Parwati sihag
Parwati sihagParwati sihag
Parwati sihag
 
Ortholog assignment
Ortholog assignmentOrtholog assignment
Ortholog assignment
 
The Smith Waterman algorithm
The Smith Waterman algorithmThe Smith Waterman algorithm
The Smith Waterman algorithm
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Global alignment
Global alignmentGlobal alignment
Global alignment
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Phylogenetic trees
Phylogenetic treesPhylogenetic trees
Phylogenetic trees
 
Needleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmNeedleman-Wunsch Algorithm
Needleman-Wunsch Algorithm
 
Dot matrix
Dot matrixDot matrix
Dot matrix
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 
Dotplots for Bioinformatics
Dotplots for BioinformaticsDotplots for Bioinformatics
Dotplots for Bioinformatics
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Genomic Data Analysis
Genomic Data AnalysisGenomic Data Analysis
Genomic Data Analysis
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Dynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignmentDynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignment
 

Similar to B.sc biochem i bobi u 3.2 algorithm + blast

lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
alizain9604
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
Abhishek Vatsa
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
Rai University
 
blast presentation beevragh muneer.pptx
blast presentation  beevragh muneer.pptxblast presentation  beevragh muneer.pptx
blast presentation beevragh muneer.pptx
home
 

Similar to B.sc biochem i bobi u 3.2 algorithm + blast (20)

sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. local
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 
Laboratory 1 sequence_alignments
Laboratory 1 sequence_alignmentsLaboratory 1 sequence_alignments
Laboratory 1 sequence_alignments
 
Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013
 
Phylogenetic analysis in nutshell
Phylogenetic analysis in nutshellPhylogenetic analysis in nutshell
Phylogenetic analysis in nutshell
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence Analysis
 
blast presentation beevragh muneer.pptx
blast presentation  beevragh muneer.pptxblast presentation  beevragh muneer.pptx
blast presentation beevragh muneer.pptx
 

More from Rai University

Bsc agri 2 pae u-4.4 publicrevenue-presentation-130208082149-phpapp02
Bsc agri  2 pae  u-4.4 publicrevenue-presentation-130208082149-phpapp02Bsc agri  2 pae  u-4.4 publicrevenue-presentation-130208082149-phpapp02
Bsc agri 2 pae u-4.4 publicrevenue-presentation-130208082149-phpapp02
Rai University
 

More from Rai University (20)

Brochure Rai University
Brochure Rai University Brochure Rai University
Brochure Rai University
 
Mm unit 4point2
Mm unit 4point2Mm unit 4point2
Mm unit 4point2
 
Mm unit 4point1
Mm unit 4point1Mm unit 4point1
Mm unit 4point1
 
Mm unit 4point3
Mm unit 4point3Mm unit 4point3
Mm unit 4point3
 
Mm unit 3point2
Mm unit 3point2Mm unit 3point2
Mm unit 3point2
 
Mm unit 3point1
Mm unit 3point1Mm unit 3point1
Mm unit 3point1
 
Mm unit 2point2
Mm unit 2point2Mm unit 2point2
Mm unit 2point2
 
Mm unit 2 point 1
Mm unit 2 point 1Mm unit 2 point 1
Mm unit 2 point 1
 
Mm unit 1point3
Mm unit 1point3Mm unit 1point3
Mm unit 1point3
 
Mm unit 1point2
Mm unit 1point2Mm unit 1point2
Mm unit 1point2
 
Mm unit 1point1
Mm unit 1point1Mm unit 1point1
Mm unit 1point1
 
Bdft ii, tmt, unit-iii, dyeing & types of dyeing,
Bdft ii, tmt, unit-iii,  dyeing & types of dyeing,Bdft ii, tmt, unit-iii,  dyeing & types of dyeing,
Bdft ii, tmt, unit-iii, dyeing & types of dyeing,
 
Bsc agri 2 pae u-4.4 publicrevenue-presentation-130208082149-phpapp02
Bsc agri  2 pae  u-4.4 publicrevenue-presentation-130208082149-phpapp02Bsc agri  2 pae  u-4.4 publicrevenue-presentation-130208082149-phpapp02
Bsc agri 2 pae u-4.4 publicrevenue-presentation-130208082149-phpapp02
 
Bsc agri 2 pae u-4.3 public expenditure
Bsc agri  2 pae  u-4.3 public expenditureBsc agri  2 pae  u-4.3 public expenditure
Bsc agri 2 pae u-4.3 public expenditure
 
Bsc agri 2 pae u-4.2 public finance
Bsc agri  2 pae  u-4.2 public financeBsc agri  2 pae  u-4.2 public finance
Bsc agri 2 pae u-4.2 public finance
 
Bsc agri 2 pae u-4.1 introduction
Bsc agri  2 pae  u-4.1 introductionBsc agri  2 pae  u-4.1 introduction
Bsc agri 2 pae u-4.1 introduction
 
Bsc agri 2 pae u-3.3 inflation
Bsc agri  2 pae  u-3.3  inflationBsc agri  2 pae  u-3.3  inflation
Bsc agri 2 pae u-3.3 inflation
 
Bsc agri 2 pae u-3.2 introduction to macro economics
Bsc agri  2 pae  u-3.2 introduction to macro economicsBsc agri  2 pae  u-3.2 introduction to macro economics
Bsc agri 2 pae u-3.2 introduction to macro economics
 
Bsc agri 2 pae u-3.1 marketstructure
Bsc agri  2 pae  u-3.1 marketstructureBsc agri  2 pae  u-3.1 marketstructure
Bsc agri 2 pae u-3.1 marketstructure
 
Bsc agri 2 pae u-3 perfect-competition
Bsc agri  2 pae  u-3 perfect-competitionBsc agri  2 pae  u-3 perfect-competition
Bsc agri 2 pae u-3 perfect-competition
 

B.sc biochem i bobi u 3.2 algorithm + blast

  • 1. Needleman–Wunsch algorithm Course: B.Sc Biochemistry Subject: Basic of Bioinformatics Unit: III
  • 2. • The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. • It was one of the first applications of dynamic programming to compare biological sequences. • The algorithm was developed by Saul B. Needleman and Christian D. Wunsch and published in 1970. • The algorithm essentially divides a large problem (e.g. the full sequence) into a series of smaller problems and uses the solutions to the smaller problems to reconstruct a solution to the larger problem. • It is also sometimes referred to as the optimal matching algorithm and the global alignment technique. • The Needleman–Wunsch algorithm is still widely used for optimal global alignment, particularly when the quality of the global alignment is of the upmost importance.
  • 3. Choose Scoring System • Next we need to decide how to score how each individual pair of letters match up. Just by looking at our two strings you may be able to see one possible best alignment: GCATG-CU G-ATTACA  We can see that letters may match, mismatch, be deleted or inserted (indel): • Match: The two letters are the same • Mismatch: The two letters are differential • Indel (INsertion or DELetion) : One letter aligns to a gap in the other string.
  • 4. • There are various ways to score these three scenarios. • These have been outlined in the Scoring Systems section below. • For now we will use the simple system used by Needleman and Wunsch; matches are given +1, mismatches are given -1 and Indels(Gap) are given -1. • The original purpose of the algorithm described by Needleman and Wunsch was to find similarities in the amino acid sequences of two proteins.
  • 5. Figure 1: Needleman-Wunsch pairwise sequence alignment Sequences Best Alignments --------- ---------------------- GATTACA G-ATTACA G-ATTACA G-ATTACA GCATGCU GCATG-CU GCA-TGCU GCAT-GCU 1.
  • 6. • Needleman and Wunsch describe their algorithm explicitly for the case when the alignment is penalized solely by the matches and mismatches, and gaps have penalty. • The Needleman–Wunsch algorithm is still widely used for optimal global alignment, particularly when the quality of the global alignment is of the upmost importance. However, the algorithm is expensive with respect to time and space, proportional to the product of the length of two sequences and hence is not suitable for long sequences. • Recent development has focused on improving the time and space cost of the algorithm while maintaining quality. For example, in 2013, a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA).
  • 7. Smith–Waterman algorithm • The Smith–Waterman algorithm performs local sequence alignment; that is, for determining similar regions between two strings or nucleotide or protein sequences. Instead of looking at the total sequence. • The Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. • The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1980. • Like the Needleman–Wunsch algorithm, of which it is a variation, Smith– Waterman is a dynamic programming algorithm. • As such, it has the desirable property that it is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme). • The main difference to the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible. • Backtracking starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment. • One does not actually implement the algorithm as described because improved alternatives are now available that have better scaling and are more accurate.
  • 8. 2.
  • 9. • To obtain the optimum local alignment, start with the highest value in the matrix (i,j). Then, go backwards to one of positions (i − 1,j), (i, j − 1), and (i − 1, j − 1) depending on the direction of movement used to construct the matrix. This methodology is maintained until a matrix cell with zero value is reached. • In the example, the highest value corresponds to the cell in position (8,8). The walk back corresponds to (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0), • Once finished, the alignment is reconstructed as follows: Starting with the last value, reach (i,j) using the previously calculated path. • A diagonal jump implies there is an alignment (either a match or a mismatch). • A top-down jump implies there is a deletion. • A left-right jump implies there is an insertion. • For the example, the results are: Sequence 1 = A-CACACTA Sequence 2 = AGCACAC-A
  • 10. • One motivation for local alignment is the difficulty of obtaining correct alignments in regions of low similarity between distantly related biological sequences, because mutations have added too much 'noise' over evolutionary time to allow for a meaningful comparison of those regions. Local alignment avoids such regions altogether and focuses on those with a positive score, i.e. those with an evolutionary conserved signal of similarity. • Another motivation for using local alignments is that there is a reliable statistical model (developed by Karlin and Altschul) for optimal local alignments. The alignment of unrelated sequences tends to produce optimal local alignment scores which follow an extreme value distribution. This property allows programs to produce an expectation value for the optimal local alignment of two sequences, which is a measure of how often two unrelated sequences would produce an optimal local alignment whose score is greater than or equal to the observed score. Very low expectation values indicate that the two sequences in question might be homologous, meaning they might share a common ancestor.
  • 11. • The Smith–Waterman algorithm is fairly demanding of time: To align two sequences of lengths m and n, O(mn) time is required. Smith–Waterman local similarity scores can be calculated in O(m) (linear) space if only the optimal alignment needs to be found, but naive algorithms to produce the alignment require O(mn) space. A linear space strategy to find the best local alignment has been described. BLAST and FASTA reduce the amount of time required by identifying conserved regions using rapid lookup strategies, at the cost of exactness.
  • 12. BLAST • In bioinformatics, BLAST for Basic Local Alignment Search Tool is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. • Different types of BLASTs are available according to the query sequences. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. • BLAST is more time-efficient than FASTA by searching only for the more significant patterns in the sequences, yet with comparative sensitivity. This could be further realized by understanding the algorithm of BLAST introduced below.
  • 13. Input • Input sequences are in FASTA or Genbank format and weight matrix. Output • BLAST output can be delivered in a variety of formats. These formats include HTML, plain text, and XML formatting. For NCBI's web-page, the default format for output is HTML. When performing a BLAST on NCBI, the results are given in a graphical format showing the hits found, a table showing sequence identifiers for the hits with scoring related data, as well as alignments for the sequence of interest and the hits received with corresponding BLAST scores for these. The easiest to read and most informative of these is probably the table.
  • 14. • There are now a handful of different BLAST programs available, which can be used depending on what one is attempting to do and what they are working with. These different programs vary in query sequence input, the database being searched, and what is being compared. These programs and their details are listed below: • BLAST is actually a family of programs (all included in the blastall executable). These include: 1. Nucleotide-nucleotide BLAST (blastn) • This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies. 2. Protein-protein BLAST (blastp) • This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies. 3. Nucleotide 6-frame translation-protein (blastx) • This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
  • 15. 4. Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) • This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences. 5. Protein-nucleotide 6-frame translation (tblastn) • This program compares a protein query against the all six reading frames of a nucleotide sequence database.  Of these programs, BLASTn and BLASTp are the most commonly used because they use direct comparisons, and do not require translations. However, since protein sequences are better conserved evolutionarily than nucleotide sequences, tBLASTn, tBLASTx, and BLASTx, produce more reliable and accurate results when dealing with coding DNA. They also enable one to be able to directly see the function of the protein sequence, since by translating the sequence of interest before searching often gives you annotated protein hits.
  • 16. Uses of BLAST • BLAST can be used for several purposes. These include identifying species, locating domains, establishing phylogeny, DNA mapping, and comparison. 1. Identifying species • With the use of BLAST, you can possibly correctly identify a species or find homologous species. This can be useful, for example, when you are working with a DNA sequence from an unknown species. 2. Locating domains • When working with a protein sequence you can input it into BLAST, to locate known domains within the sequence of interest. 3. Establishing phylogeny • Using the results received through BLAST you can create a phylogenetic tree using the BLAST web-page. Phylogenies based on BLAST alone are less reliable than other purpose-built computational phylogenetic methods, so should only be relied upon for "first pass" phylogenetic analyses.
  • 17. 4. DNA mapping • When working with a known species, and looking to sequence a gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest, to relevant sequences in the database(s). 5. Comparison • When working with genes, BLAST can locate common genes in two related species, and can be used to map annotations from one organism to another.
  • 18. Books and Web References • Books Name : 1. Introduction To Bioinformatics by T. K. Attwood 2. BioInformatics by Sangita 3. Basic Bioinformatics by S.Ignacimuthu, s.j. http://en.wikipedia.org/wiki/BLAST http://en.wikipedia.org/wiki/Algorithm Image References 1.http://upload.wikimedia.org/wikipedia/commons/3/3f/Needleman- Wunsch_pairwise_sequence_alignment.png 2.http://upload.wikimedia.org/math/b/9/9/b99e56fa40bb943637c6891a46b cfd26.png