FastA HOMOLOGY SEARCH ALGORITHM

“Discussing FastA homology search Algorithm.”
BY: MUUNDA MUDENDA, MSc Molecular Biology and Biotechnology
Email: muundamudenda@gmail.com
FastA stands for “Fast All”. It is a sequence alignment algorithm that was developed in 1985
by Lipman and Pearson. The FastA algorithm uses heuristics to find similarities between nucleotide
or protein sequences in local alignment searches. FastA programs operate on four main
parameters: Expect value (E-value), true homology, threshold and putative
conserved domains.
According to Sagar Aryal (2019), the FastA algorithm uses a hashing strategy to find
short stretches of identical residues. During a FastA search, the query is broken
into short sequence patterns, or words, known as Ktuples or Ktups, which are used to search the
target sequences. A Ktup is typically two residues long for proteins and six residues long for DNA
sequences. Ktups are the equivalent of words in the BLAST algorithm.
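The hashing idea can be illustrated with a minimal Python sketch. It is not the FASTA source code: the function names build_ktup_index and find_word_hits are hypothetical, and a plain dictionary stands in for whatever lookup table the real programs use.

```python
from collections import defaultdict

def build_ktup_index(query, ktup):
    """Map every ktup-length word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    return index

def find_word_hits(index, target, ktup):
    """Return (query_pos, target_pos) pairs where the same word occurs."""
    hits = []
    for j in range(len(target) - ktup + 1):
        # .get avoids adding new keys to the defaultdict during lookup
        for i in index.get(target[j:j + ktup], []):
            hits.append((i, j))
    return hits

# Illustrative DNA example with a Ktup of six
query = "ACGTACGT"
index = build_ktup_index(query, 6)
hits = find_word_hits(index, "TTACGTACGG", 6)
print(hits)
```

Because the query words are indexed once, each target sequence can be scanned in a single pass, which is what makes the word lookup fast compared with comparing every pair of positions.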
The programs in FastA include: FASTA, FASTX and FASTY, SSEARCH, GGSEARCH, and
GLSEARCH.
• FASTA: The most common of the programs. It compares protein sequences to protein
databases. It can also search nucleotide, genome and whole-genome shotgun databases.
• FASTX, FASTY: Used to compare translated nucleotide sequences to protein databases.
• SSEARCH: This program performs Smith-Waterman local alignment of protein to protein or
nucleotide to nucleotide sequences.
• GGSEARCH: This program compares a query sequence to database sequences using global
alignment.
• GLSEARCH: This program compares DNA or protein sequences to sequences in
databases. The alignment is global in the query and local in the database
sequence.
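To make the idea of a local (Smith-Waterman) alignment concrete, here is a minimal Python sketch that computes only the best local score. The match, mismatch and gap values are illustrative assumptions, and the linear gap penalty is a simplification; this is not how SSEARCH itself is parameterised.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment start anywhere: this is
            # what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# The shared core "ACGT" is found despite the unrelated flanks
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A global alignment would instead score both sequences from end to end, without the floor at zero, so unrelated flanking residues would lower the score.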
The following are the steps involved in the FastA algorithm. These steps are taken from Itshack
Pe’er (1999).
1. Specifying an integer parameter, Ktup, and looking for Ktup-length matching substrings of the
two strings. Each such match is called a hot spot. The standard recommended Ktup values
are six for DNA sequence matching and two for protein sequence matching.

2. Finding the 10 best diagonal runs of hot spots in the matrix. A diagonal run is a sequence
of nearby hot spots on the same diagonal. A run need not contain all the hot spots on its
diagonal, and a diagonal may contain more than one of the 10 best runs found.
3. Evaluating the runs using an amino acid (or nucleotide) substitution matrix, and picking
the best-scoring run. The single best sub-alignment found in this stage is called init1.
The runs are then filtered, and diagonal runs with relatively low scores are
discarded.
4. Constructing a directed weighted graph whose vertices are the sub-alignments found in
the previous stage, and the weight in each vertex is the score found in the previous stage
of the sub-alignment it represents. Essentially, FASTA then finds a maximum weight
path in this graph. The best alignment found in this stage is marked initn. The low-
scoring alignments are discarded.
5. FASTA computes an alternative local alignment score, in addition to initn, this time
allowing insertions and deletions within a band around init1. The best
local alignment computed in this stage is called opt.
6. In the last step, the database sequences are ranked according to initn scores or opt scores,
and the full dynamic programming algorithm is used to align the query sequence against
each of the highest ranking result sequences.
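The first three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the FASTA implementation: the function names, the max_gap parameter, ranking runs by length in step 2, and the match/mismatch scores used in place of a real substitution matrix are all hypothetical choices. The later steps (initn, opt and the final alignment) are not covered.

```python
from collections import defaultdict

def hot_spots(query, target, ktup):
    """Step 1: (i, j) start positions of every shared ktup-length word."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    return [(i, j)
            for j in range(len(target) - ktup + 1)
            for i in index.get(target[j:j + ktup], [])]

def diagonal_runs(spots, ktup, max_gap=4):
    """Step 2: join nearby hot spots on the same diagonal (i - j).

    Each run is (diagonal, query_start, query_end), end exclusive.
    """
    by_diag = defaultdict(list)
    for i, j in spots:
        by_diag[i - j].append(i)
    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        run_start, run_end = starts[0], starts[0] + ktup
        for i in starts[1:]:
            if i - run_end <= max_gap:      # close enough: extend the run
                run_end = max(run_end, i + ktup)
            else:                           # too far: start a new run
                runs.append((diag, run_start, run_end))
                run_start, run_end = i, i + ktup
        runs.append((diag, run_start, run_end))
    return runs

def score_run(query, target, run, match=2, mismatch=-1):
    """Step 3: rescore a gap-free run (stand-in for a substitution matrix)."""
    diag, start, end = run
    return sum(match if query[i] == target[i - diag] else mismatch
               for i in range(start, end))

def best_initial_region(query, target, ktup, keep=10):
    """Keep the `keep` longest runs, rescore them, return the best one."""
    runs = diagonal_runs(hot_spots(query, target, ktup), ktup)
    runs = sorted(runs, key=lambda r: r[2] - r[1], reverse=True)[:keep]
    scored = [(score_run(query, target, r), r) for r in runs]
    return max(scored) if scored else None

print(best_initial_region("ACGTACGT", "TTACGTACGG", 6))
```

In this example the two overlapping hot spots lie on the same diagonal and merge into a single run of seven matching residues, which becomes the init1 region under the assumed scoring.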
REFERENCES
EMBL-EBI. (2021). Sequence Similarity Searching. Link: ebi.ac.uk/tools/sss/
Itshack Pe’er. (1999). FASTA. Link: cs.tau.ac.il
Pearson, W. R. (2016). Finding Protein and Nucleotide Similarities with FASTA. Current
Protocols in Bioinformatics, 53, 3.9.1–3.9.25. https://doi.org/10.1002/0471250953.bi0309s53
Pearson, W. R., Lipman, D. J. (1988). FASTA Sequence Comparison. P.N.A.S., 85:2444-2448.
Link: gen.tcd.ie/molevol/fasta.html
Sagar Aryal. (2019). FASTA and BLAST. Microbe Notes. Link: microbenotes.com
