5. FASTA words or k-tuples
(1 or 2 for proteins, 4 to 6 for nucleic acids)
BLAST words or k-tuples
(3 for proteins, 11 for nucleic acids)
FASTA: all possible words of the same length
BLAST: only the most significant words
(e.g., as scored by BLOSUM62)
7. Step 1
Sequence filtering:
Programs that search for low-complexity regions or
sequence repeats are used to mask them in the query
e.g., SEG, PSEG for amino acid sequences
NSEG, DUST for nucleic acids
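The filtering idea in Step 1 can be illustrated with a much simpler stand-in than SEG or DUST. The sketch below masks any window whose Shannon entropy falls below a cutoff. The function names, the window length of 12, and the cutoff of 2.2 bits are illustrative assumptions, not the actual algorithm of any of the programs listed above.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue that lies in a low-entropy window with mask_char.

    Simplified illustration only: window and min_entropy are assumed values.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

A run of a single residue has zero entropy and is masked, while a window of twelve different residues is left unchanged.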
8. Step 2
A list of words of fixed length is prepared from the query
3 letters for proteins
11 letters for nucleic acids
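As a minimal sketch of Step 2, the overlapping words of a query can be listed with a sliding window. The function name `query_words` is hypothetical; the word lengths are the ones given above.

```python
def query_words(seq: str, k: int) -> list[tuple[int, str]]:
    """Return (position, word) for every overlapping word of length k."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


# Protein query: word length 3
protein_words = query_words("PQGEF", 3)

# Nucleic acid query: word length 11
dna_words = query_words("ACGTACGTACGTAC", 11)
```

A query of length L yields L - k + 1 words, so the five-residue protein above gives three words.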
Step 3
Each word is evaluated for matches with other words of
the same length using the BLOSUM62 substitution scores
9. Step 4
The number of matches is reduced by a cutoff score
called the neighborhood word score threshold (T)
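Steps 3 and 4 can be sketched as follows: every candidate word is scored against a query word, and only candidates scoring at least T are kept. To keep the example short, the alphabet is restricted to six residues, and the pair scores are intended to be the corresponding BLOSUM62 entries; they should be checked against a full copy of the matrix before being relied on. The names `SCORES` and `neighborhood` and the value T = 13 are assumptions made for illustration.

```python
from itertools import product

# Reduced alphabet (assumption: six residues only, to keep the sketch short).
ALPHABET = "PQGERK"

# Pair scores intended to match BLOSUM62 for these residues; verify before use.
_UPPER = {
    "PP": 7, "PQ": -1, "PG": -2, "PE": -1, "PR": -2, "PK": -1,
    "QQ": 5, "QG": -2, "QE": 2, "QR": 1, "QK": 1,
    "GG": 6, "GE": -2, "GR": -2, "GK": -2,
    "EE": 5, "ER": 0, "EK": 1,
    "RR": 5, "RK": 2,
    "KK": 5,
}
# Make the table symmetric so that SCORES[a, b] == SCORES[b, a].
SCORES = {}
for pair, value in _UPPER.items():
    SCORES[pair[0], pair[1]] = value
    SCORES[pair[1], pair[0]] = value


def neighborhood(word: str, threshold: int) -> dict[str, int]:
    """Return all words over ALPHABET that score >= threshold against word."""
    hits = {}
    for cand in product(ALPHABET, repeat=len(word)):
        s = sum(SCORES[a, b] for a, b in zip(word, cand))
        if s >= threshold:
            hits["".join(cand)] = s
    return hits


neighbors = neighborhood("PQG", 13)
```

With these scores the query word PQG keeps itself (18) plus PEG (15), PRG (14) and PKG (14); a word such as PPG scores 12 and is dropped by the threshold.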
Step 5
The above steps are repeated for each three-letter word in the query
Step 6
Remaining high-scoring words are organized into an
efficient search tree
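One possible form of the search tree in Step 6 is a trie keyed on residues, with each complete word pointing back to the query positions it came from. This is a sketch under that assumption; the slides do not say which tree structure is used, and the names `build_trie`, `lookup` and the end marker `"$"` are hypothetical.

```python
END = "$"  # marker key for "a word ends here"; not a residue symbol


def build_trie(word_positions):
    """Build a nested-dict trie from (word, query_position) pairs."""
    root = {}
    for word, pos in word_positions:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(pos)
    return root


def lookup(trie, word):
    """Return the query positions stored for word, or [] if it is absent."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get(END, [])


trie = build_trie([("PQG", 0), ("PEG", 0), ("QGE", 1)])
```

Words that share a prefix, such as PQG and PEG, share the first node, so a lookup can stop at the first residue that has no branch.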
10. Step 7
Each database sequence is scanned for an exact match
to one of the words corresponding to the first query
position, then for the words corresponding to the
second position, and so on
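The scan in Step 7 can be sketched with a plain dictionary standing in for the search tree. Instead of looping over query positions one at a time, this version slides once over the database sequence and looks up every word it sees, which finds the same exact matches. The index contents and the name `find_seeds` are illustrative assumptions.

```python
def find_seeds(index, subject, k):
    """Return (query_pos, subject_pos) for every exact word match.

    index maps each high-scoring word to the query positions it belongs to.
    """
    seeds = []
    for s_pos in range(len(subject) - k + 1):
        word = subject[s_pos:s_pos + k]
        for q_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


# Hypothetical index: words for query position 0 and query position 1.
index = {"PQG": [0], "PEG": [0], "QGE": [1]}
seeds = find_seeds(index, "MKPEGQGE", 3)
```

Each seed records where the word sits in both sequences, which is the starting point for the extension in the next step.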
Step 8
The alignment is extended in each direction from the
matched word. The result is an HSP (high-scoring
segment pair), which has a larger score than the
original word
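A minimal sketch of the extension in Step 8, assuming an ungapped extension that stops once the running score has dropped more than a fixed amount below the best score seen so far. The drop-off parameter `x_drop`, the function name `extend_seed`, and the match/mismatch scores in the usage lines are assumptions; a protein search would pass a BLOSUM62 lookup as `score` instead.

```python
def extend_seed(query, subject, q_pos, s_pos, k, score, x_drop):
    """Extend a word match without gaps in both directions.

    Returns (query_start, subject_start, length, hsp_score).
    """
    seed_score = sum(score(query[q_pos + i], subject[s_pos + i])
                     for i in range(k))

    def extend(q, s, step):
        # Walk outward, remembering the best cumulative score and its length.
        best = running = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += score(query[q], subject[s])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break  # score has fallen too far below the best; stop
            q += step
            s += step
        return best, best_len

    right_score, right_len = extend(q_pos + k, s_pos + k, +1)
    left_score, left_len = extend(q_pos - 1, s_pos - 1, -1)
    return (q_pos - left_len, s_pos - left_len,
            k + left_len + right_len,
            seed_score + left_score + right_score)


# Usage with an assumed nucleotide scoring of +2 for a match, -3 otherwise.
simple_score = lambda a, b: 2 if a == b else -3
hsp = extend_seed("TTACGTACGGA", "CCACGTACGCC", 4, 4, 3, simple_score, 5)
```

In this example the three-letter seed scores 6 and the extended segment covers seven matching positions with a score of 14. The extension never lowers the score, because only the best-scoring prefix in each direction is kept.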
12. Step 9
Identification of HSPs
Step 10
The statistical significance of each HSP score is determined
Step 11
Finding two or more HSP regions that can be made
into a longer alignment
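To illustrate Step 11, the sketch below picks the highest-scoring set of HSPs that can be placed in the same order, without overlap, in both the query and the database sequence. It uses simple dynamic programming over summed raw scores, which is an assumption made for illustration and not a description of how BLAST itself combines HSPs. The `HSP` type and `best_chain` are hypothetical names; end coordinates are exclusive.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    q_start: int
    q_end: int    # exclusive
    s_start: int
    s_end: int    # exclusive
    score: int


def best_chain(hsps):
    """Return (chain, total_score) for the best collinear set of HSPs."""
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    if not ordered:
        return [], 0
    best = [h.score for h in ordered]   # best chain score ending at each HSP
    prev = [None] * len(ordered)        # back-pointers for reconstruction
    for j, b in enumerate(ordered):
        for i in range(j):
            a = ordered[i]
            # a may precede b only if it ends before b starts in both sequences
            if (a.q_end <= b.q_start and a.s_end <= b.s_start
                    and best[i] + b.score > best[j]):
                best[j] = best[i] + b.score
                prev[j] = i
    j = max(range(len(ordered)), key=best.__getitem__)
    total = best[j]
    chain = []
    while j is not None:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1], total
```

Two HSPs that cross each other, for example one early in the query but late in the database sequence, cannot be part of the same chain.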
13. Step 12
The score of the alignment is obtained and the expect
value for that score is calculated
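The expect value in Step 12 is commonly computed from the raw score S with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The sketch below assumes that formula. The default values lambda = 0.318 and K = 0.134 are commonly quoted for ungapped BLOSUM62 scoring and are used here as assumptions; real searches use parameters that depend on the scoring system and gap costs.

```python
import math


def bit_score(raw_score, lam=0.318, k=0.134):
    """Normalised score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, m, n, lam=0.318, k=0.134):
    """Expected number of chance HSPs with a score of at least raw_score.

    m: query length, n: total database length (both in residues).
    """
    return k * m * n * math.exp(-lam * raw_score)


# Example: a raw score of 50 for a 250-residue query against 1,000,000 residues.
e = expect_value(50, 250, 1_000_000)
```

The same E can be written as m * n * 2 ** (-S') using the bit score, and a higher alignment score always gives a smaller expect value.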