• Sequence comparison is used to find the similarity, or alignment, shared by two sequences.
• The method used to find this similarity is called alignment. It can be of two types:
  o Global alignment – aligns the entire sequences, using as many characters as possible.
  o Local alignment – focuses only on regions of similarity within parts of the sequences.
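A minimal, score-only sketch of the two alignment types using dynamic programming: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment (standard textbook algorithms, chosen here for illustration). The match, mismatch and gap values are assumed toy scores, not the defaults of any real program.

```python
# Assumed toy scoring scheme with a linear gap penalty.
MATCH, MISMATCH, GAP = 1, -1, -2


def score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def global_score(s: str, t: str) -> int:
    """Needleman-Wunsch: best score aligning ALL of s against ALL of t."""
    prev = [j * GAP for j in range(len(t) + 1)]  # row 0: prefixes of t vs gaps
    for i in range(1, len(s) + 1):
        cur = [i * GAP] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(prev[j - 1] + score(s[i - 1], t[j - 1]),  # pair
                         prev[j] + GAP,                             # gap in t
                         cur[j - 1] + GAP)                          # gap in s
        prev = cur
    return prev[-1]


def local_score(s: str, t: str) -> int:
    """Smith-Waterman: best score of any pair of substrings (floored at 0)."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            cur[j] = max(0,
                         prev[j - 1] + score(s[i - 1], t[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best
```

For `"GGGGACGTCCCC"` against `"ACGT"`, the global score is forced to pay for eight gap columns (-12), while the local score only counts the shared `ACGT` region (4).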
• Alignment of two sequences is performed by the following methods:
  o Dot matrix analysis
  o Dynamic programming
  o Word or k-tuple method (FASTA and BLAST programs)
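Dot matrix analysis can be sketched as a grid in which cell (i, j) is marked when the residues at those positions agree. The `window` and `threshold` parameters are illustrative assumptions showing a common noise filter: a cell is marked only when at least `threshold` of the `window` residues along that diagonal match. With a window larger than 1, the grid shrinks to (len(s) - window + 1) by (len(t) - window + 1).

```python
def dot_matrix(s, t, window=1, threshold=1):
    """Return a grid of booleans; cell (i, j) is True when the window of
    `window` residues starting at s[i] and t[j] has >= `threshold` identities."""
    rows = len(s) - window + 1
    cols = len(t) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(s[i + k] == t[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a mark, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


print(render(dot_matrix("ACGT", "ACGT")))
```

Runs of `*` along a diagonal indicate regions of similarity; comparing a sequence with itself gives the main diagonal.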
• Word methods align two sequences very quickly by first searching for identical short stretches of sequence, called words or k-tuples.
• These words are then joined into an alignment by the dynamic programming method.
• BLAST and FASTA methods are heuristic: they are fast, but they are not guaranteed to find the optimal alignment.
• Basic Local Alignment Search Tool (BLAST) is a popular, user-friendly tool for searching all the major sequence databases.
• It is used to find homologs of a query sequence in order to predict its identity, function and 3D structure.
• It shows better results for protein sequences than for nucleotide sequences.
• Local alignment: BLAST tries to find patches of regional similarity, rather than a global fit between the query and the database sequence.
• BLAST works under the assumption that high-scoring alignments are likely to contain short stretches of identical or nearly identical letters, called words.
• BLAST is extremely fast; the program can be run locally, or queries can be e-mailed to the NCBI server.
• It is not guaranteed to find the best alignment between the query and the database; it may miss matches.
• This is because its strategy is designed to find most matches rather than all of them: it sacrifices complete sensitivity to gain speed.
• BLAST searches in two phases:
  o First, it looks for short subsequences (words) that are likely to have significant matches.
  o Then it tries to extend these matched regions on both sides to obtain the maximum sequence similarity.
• A substitution matrix is a scoring method used when aligning one residue against another.
• Margaret Dayhoff and her co-workers developed the first substitution matrix for comparing protein sequences in evolutionary terms.
• These matrices are commonly called PAM matrices.
• In contrast to PAM, Steve Henikoff and his co-workers developed the BLOSUM matrices.
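To show how a substitution matrix is used, this sketch scores an ungapped alignment by summing one matrix entry per aligned pair. The four-residue table `TOY_SCORES` is hypothetical and its numbers are made up; a real search would use a full PAM or BLOSUM matrix instead.

```python
# Hypothetical toy scores for four residues (NOT real PAM/BLOSUM values).
TOY_SCORES = {
    ("A", "A"): 5, ("C", "C"): 8, ("G", "G"): 6, ("W", "W"): 10,
    ("A", "C"): -1, ("A", "G"): 1, ("A", "W"): -4,
    ("C", "G"): -2, ("C", "W"): -3, ("G", "W"): -3,
}


def pair_score(a, b, matrix=TOY_SCORES):
    # Substitution matrices are symmetric, so only one order is stored.
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def ungapped_score(s, t, matrix=TOY_SCORES):
    """Sum the matrix score of each aligned residue pair."""
    if len(s) != len(t):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(s, t))


print(ungapped_score("ACGW", "AGGW"))
```

Here the single C/G substitution lowers the score relative to aligning the sequence with itself.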
Percent accepted mutation matrix (PAM) vs. BLOSUM
PAM:
• Based on global alignments of closely related proteins.
• The number accompanying PAM refers to evolutionary distance; larger numbers represent greater evolutionary distance.
• PAM 250 is widely used.
BLOSUM:
• Based on local alignments.
• Smaller numbers correspond to more distantly related sequences.
• BLOSUM 62 is widely used.
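As a compact summary of standard background (not stated on the slides): a PAM n matrix of mutation probabilities is extrapolated by multiplying the PAM 1 matrix, which corresponds to one accepted point mutation per 100 residues, by itself n times. The entries of both PAM and BLOSUM scoring matrices are log-odds scores, where q_ab is the frequency of the pair a/b in related alignments, p_a and p_b are background residue frequencies, and λ is a scale factor chosen when the matrix is built.

```latex
M^{(n)} = \left(M^{(1)}\right)^{n}

s(a,b) = \frac{1}{\lambda}\,\log\frac{q_{ab}}{p_a\,p_b}
```

A positive score means the pair is seen in related sequences more often than expected by chance.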
Preprocessing of the query:-
• Quickly locate ungapped similarity between the query sequence and sequences from the database.
• All words of length W in the query are compared with the database sequences.
Generation of hits:-
• A hit is made of one or several successive pairs of similar words, and is characterised by its position in each of the two sequences.
• All possible hits between the query and the database are calculated.
Extension of the hits:-
• Every hit is now extended, without gaps, to determine whether it may be part of a larger segment of similarity.
• Every extended segment pair that scores the same as or better than S (set as a parameter of the program) is kept and called an HSP (high-scoring segment pair).
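The preprocessing, hit-generation and extension steps can be sketched in Python under simplifying assumptions: hits are exact word matches (real BLAST also accepts similar words that score above a threshold), scoring is a toy +1/-1 scheme instead of a substitution matrix, and each extension stops by an X-drop rule, i.e. once the running score falls more than `x_drop` below the best score seen so far. `w`, `x_drop` and `s_threshold` are hypothetical parameter names standing in for W, the drop-off and S.

```python
MATCH, MISMATCH = 1, -1  # assumed toy scores; real BLAST uses a substitution matrix


def query_words(query, w):
    """Preprocessing: map every word of length w in the query to its start positions."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index


def find_hits(query, subject, w):
    """Generation of hits: (query_pos, subject_pos) for each shared word (exact matches only)."""
    index = query_words(query, w)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def pair(a, b):
    return MATCH if a == b else MISMATCH


def extend(query, subject, i, j, w, x_drop):
    """Ungapped extension of a hit in both directions with an X-drop stop.
    Returns (score, query_start, subject_start, length)."""
    seed = sum(pair(query[i + k], subject[j + k]) for k in range(w))
    # Extend to the right of the word.
    best_r = run = right_len = 0
    k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        run += pair(query[i + w + k], subject[j + w + k])
        k += 1
        if run > best_r:
            best_r, right_len = run, k
        elif best_r - run > x_drop:
            break
    # Extend to the left of the word.
    best_l = run = left_len = 0
    k = 1
    while i - k >= 0 and j - k >= 0:
        run += pair(query[i - k], subject[j - k])
        if run > best_l:
            best_l, left_len = run, k
        elif best_l - run > x_drop:
            break
        k += 1
    return (seed + best_l + best_r, i - left_len, j - left_len,
            left_len + w + right_len)


def find_hsps(query, subject, w=3, x_drop=2, s_threshold=5):
    """Keep extended segment pairs scoring >= s_threshold (the HSPs), best first."""
    hsps = {extend(query, subject, i, j, w, x_drop)
            for i, j in find_hits(query, subject, w)}
    return sorted((h for h in hsps if h[0] >= s_threshold), reverse=True)


print(find_hsps("ACGTACGT", "ACGTACGT"))
```

Several hits on the same diagonal extend to the same segment, which is why the results are collected in a set before filtering by the threshold.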
• Standard BLAST programs are of five types:
  o BLASTp
  o BLASTn
  o BLASTx
  o tBLASTn
  o tBLASTx
• Other classes include:
  o MegaBLAST
  o PSI-BLAST
  o PHI-BLAST
• BLASTp – compares an amino acid query sequence against a protein sequence database.
• BLASTn – compares a nucleotide query sequence against a nucleotide sequence database.
• BLASTx – searches the six-frame translation products of a nucleotide query sequence against a protein database.
• tBLASTn – searches a protein query sequence against a nucleotide sequence database translated in all six reading frames.
• tBLASTx – compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide database.
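The six-frame translation used by BLASTx and tBLASTx (and by tBLASTn on the database side) can be sketched with the standard genetic code: three frames are read from the given strand and three from its reverse complement. The frame labels `+1` to `-3` and the `X` placeholder for unreadable codons are conventions chosen for this sketch.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; an incomplete trailing codon is dropped
    and any codon containing a non-ACGT letter becomes 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCC"))
```

Each of the six protein strings would then be searched against the protein database as an ordinary protein query.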
• MegaBLAST – a program optimized for aligning long sequences. It works only with DNA sequences.
• PSI-BLAST – stands for Position-Specific Iterated BLAST. It is useful for protein similarity searches.
• PHI-BLAST – stands for Pattern Hit Initiated BLAST. It can be used to search for a specific pattern or motif.
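PHI-BLAST takes its pattern in a PROSITE-style syntax. As a sketch of only the pattern-location step (not the alignment search that PHI-BLAST builds around the matches), this helper converts a small subset of that syntax (`x`, `x(n)`, `x(n,m)`, `[...]`, `{...}`, elements joined by `-`) into a Python regular expression and reports where it matches. `prosite_to_regex` and `find_motif` are hypothetical helper names.

```python
import re

# One pattern element: any residue, a residue, [allowed], or {excluded},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("C-x(2,4)-[LIV]-{P}."))
print(find_motif("AACAACAAGC", "C-x(2)-C"))
```

Because `finditer` reports non-overlapping matches only, overlapping motif occurrences would need a lookahead-based search instead.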
• FASTA is a sequence analysis tool, similar to BLAST.
• It was developed by W. R. Pearson and D. J. Lipman, and the algorithm can be accessed from the EBI website.
• FASTA gives better results for nucleotide sequences than for protein sequences.
• FASTP is for protein sequences.
• FASTA finds regions of similarity by first breaking the sequence into short subsequences (words), then searching for the diagonals with the highest density of matching words.
• The alignment along these diagonals is then refined.
• It is fast, but it is not guaranteed to find the best alignment; it may miss matches.
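The diagonal search can be sketched by indexing the query's k-tuples and, for each database position that shares a word, counting the diagonal offset (subject position minus query position). The diagonals with the largest counts are the candidate regions to refine. As a simplification, this sketch indexes a word at every query position and uses `k` for the word length (FASTA calls this parameter ktup); `densest_diagonals` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def word_index(seq, k):
    """Map each k-tuple in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, subject, k):
    """Count k-tuple matches on each diagonal (subject_pos - query_pos)."""
    index = word_index(query, k)
    counts = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            counts[j - i] += 1
    return counts


def densest_diagonals(query, subject, k=2, top=3):
    """Return the `top` diagonals with the most word matches, densest first."""
    return diagonal_counts(query, subject, k).most_common(top)


print(densest_diagonals("ACGTACGT", "TTACGTACGT"))
```

Here the subject is the query shifted by two positions, so diagonal 2 collects by far the most word matches.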
• First, FASTA prepares a list of words from the pair of sequences to be matched. Words can be 3–6 nucleotides or 1–2 amino acids long.
• It uses non-overlapping words; it matches the words and counts the matches.
• It builds the word diagonals and finds the highest-scoring match. This score is labeled init1.
• Only if this score is sizable does it proceed to the second level.
• In the second level, for each best word hit, it looks for neighbouring approximate hits.
• If the score is good, it joins them into a larger dot-matrix diagonal region.
• The best score from this second level of scoring is called initn.
• The initn scores are saved for each comparison of the query sequence with a database sequence.
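The relationship between init1 and initn can be illustrated with a simplified chaining step: init1 is the best single ungapped region, while initn joins compatible regions (ordered the same way in both sequences and not overlapping) and pays a penalty for each join. The `Region` type and the flat `join_penalty` are assumptions for this sketch; FASTA's own joining rules are more detailed.

```python
from typing import NamedTuple


class Region(NamedTuple):
    q_start: int   # start in the query
    s_start: int   # start in the database sequence
    length: int
    score: int


def init1(regions):
    """Score of the single best ungapped region."""
    return max(r.score for r in regions)


def initn(regions, join_penalty):
    """Best total of a chain of co-linear, non-overlapping regions,
    subtracting join_penalty for every join between two regions."""
    regs = sorted(regions, key=lambda r: (r.q_start, r.s_start))
    best = []  # best[idx] = best chain score ending with regs[idx]
    for idx, r in enumerate(regs):
        total = r.score
        for prev, prev_total in zip(regs[:idx], best):
            if (prev.q_start + prev.length <= r.q_start and
                    prev.s_start + prev.length <= r.s_start):
                total = max(total, prev_total - join_penalty + r.score)
        best.append(total)
    return max(best)


regions = [Region(0, 0, 5, 20), Region(8, 10, 6, 25), Region(3, 20, 4, 12)]
print(init1(regions), initn(regions, join_penalty=10))
```

With a small penalty, joining the first two regions beats any single region, so initn exceeds init1; with a large penalty, initn falls back to init1.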
• Different programs in FASTA include:
  o FASTP (protein sequences).
  o TFASTA (compares a query protein sequence to a DNA sequence database).
  o FASTF (compares a set of ordered peptide fragments, obtained by cleaving a protein and sequencing protein bands resolved by electrophoresis, against a protein database).
  o TFASTF (compares a set of ordered peptide fragments against a DNA database).
BLAST and FASTA
