Heuristic Alignment
FASTA
A program for rapid alignment of pairs of protein and DNA sequences
Searches for matching sequence patterns or words, called k-tuples (runs of k consecutive residues)
The program then builds a local alignment based on these word matches
Lookup method for finding an alignment
Location of the 10 best matching regions by a rapid screen
i. Finding k consecutive matches
ii. Joining of matches within a certain distance
iii. Identification of regions with highest density
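
The rapid screen above can be sketched in Python. This is a simplified illustration, not the FASTA source code: it indexes the query's k-tuples in a lookup table, counts word matches per diagonal (query position minus subject position), and uses that count as a stand-in for region density. The distance-based joining of step ii is left out, and the names `ktuple_index`, `diagonal_hits` and `best_diagonals` are hypothetical.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Lookup table: each k-letter word in seq -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k):
    """Count k-tuple matches on each diagonal (query_pos - subject_pos)."""
    index = ktuple_index(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        # .get avoids adding empty entries to the lookup table
        for i in index.get(subject[j:j + k], ()):
            hits[i - j] += 1
    return hits


def best_diagonals(query, subject, k=2, top=10):
    """Return up to `top` (diagonal, hit_count) pairs, densest first."""
    hits = diagonal_hits(query, subject, k)
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


print(best_diagonals("ACGTACGT", "TTACGTAC", k=2))
# → [(-2, 5), (2, 4), (-6, 1)]
```

Matches that fall on the same diagonal belong to the same ungapped stretch, which is why counting hits per diagonal is a cheap way to find candidate regions.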
Evaluation of the highest-density regions with a scoring matrix
Identification of the highest-scoring region (INIT1)
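
A minimal sketch of this rescoring step, under stated assumptions: the region is taken to be a whole diagonal, and a simple match/mismatch function stands in for a real scoring matrix such as PAM or BLOSUM. The function name `init1_score` and the `score` parameter are hypothetical. It finds the best-scoring ungapped run along one diagonal by resetting the running score whenever it drops below zero.

```python
def init1_score(query, subject, diagonal, score):
    """Best ungapped run score on one diagonal (diagonal = query_pos - subject_pos)."""
    i = max(diagonal, 0)    # start position in the query
    j = max(-diagonal, 0)   # start position in the subject
    best = running = 0
    while i < len(query) and j < len(subject):
        # Drop a prefix as soon as it stops contributing a positive score.
        running = max(0, running + score(query[i], subject[j]))
        best = max(best, running)
        i += 1
        j += 1
    return best


def simple_score(a, b):
    """Stand-in for a scoring matrix: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(init1_score("ACGTACGT", "TTACGTAC", -2, simple_score))  # → 12
print(init1_score("ACGTACGT", "TTACGTAC", 2, simple_score))   # → 10
```

Taking the maximum over the candidate diagonals would give an INIT1-style score for the sequence pair.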
Generation of longer regions of identity, with score INITN, by joining initial regions
When INITN reaches a certain threshold, the score is recalculated to give the OPT score (by the Smith-Waterman algorithm)
Improving the score in this way increases sensitivity but decreases selectivity
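
For reference, a compact Smith-Waterman scorer is sketched below. It is an illustration only: it scans the full dynamic-programming matrix, whereas a practical implementation would typically restrict the computation to a band around the initial region. The match, mismatch and linear gap values are assumptions chosen for the example, not FASTA defaults.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + s,    # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))  # → 8
```

Unlike the ungapped diagonal scores, this score allows insertions and deletions, so it can bridge regions that lie on neighbouring diagonals.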
INITN and OPT scores are used to rank database matches
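Versions of FASTA:
FASTA – protein query against protein library / DNA query against DNA library
TFASTA – protein query against translated DNA library
FASTF/FASTS – peptide fragments against protein library
TFASTF/TFASTS – peptide fragments against translated DNA library
FASTX/FASTY – translated DNA query against protein library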
Sensitivity:
The ability of a search method to locate as many members of a protein family as possible, including distant members of limited sequence similarity
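
As a rough illustration, sensitivity can be expressed as the fraction of known family members that a search actually reports. The function name `sensitivity` and the identifiers in the example are hypothetical.

```python
def sensitivity(reported_hits, family_members):
    """Fraction of known family members that appear among the reported hits."""
    family = set(family_members)
    if not family:
        raise ValueError("family_members must not be empty")
    found = family & set(reported_hits)
    return len(found) / len(family)


# Two of four family members were found; "x9" is an unrelated hit.
print(sensitivity(["p1", "p2", "x9"], ["p1", "p2", "p3", "p4"]))  # → 0.5
```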
