FASTA

Amandeep Singh
Assistant Professor
Department of Biotechnology
GSSDGS Khalsa College Patiala

Introduction
FASTA is a program that uses a heuristic algorithm to search a biological
database for nucleotide or protein sequences similar to a query sequence.
Query: nucleotide or protein sequence → Database: nucleotide or protein sequences

FASTA Algorithm
It starts from a dot plot (dot matrix).
[Figure: dot plot of the first sequence (query) ABMDLF against the second sequence (database) ABCDEF.]
A dot plot shows regions of similarity between two sequences as diagonals.
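The dot plot on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function names `dot_plot` and `render` are hypothetical, and a cell is marked only for identical letters (no substitution scores or window filtering).

```python
def dot_plot(query: str, database: str) -> list[list[bool]]:
    """Matrix where cell [i][j] is True when query[i] == database[j]."""
    return [[q == d for d in database] for q in query]


def render(query: str, database: str) -> str:
    """Draw the dot plot as text: '*' marks a match, '.' a mismatch."""
    matrix = dot_plot(query, database)
    lines = ["  " + " ".join(database)]  # header row: the database sequence
    for residue, row in zip(query, matrix):  # one row per query residue
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("ABMDLF", "ABCDEF"))
```

For ABMDLF against ABCDEF, the matches A, B, D and F all fall on the main diagonal, which is broken at the mismatched pairs M/C and L/E.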

FASTA Algorithm
• FASTA goes one step beyond the dot plot.
• It calculates the sum of dots along each diagonal.
• It is a “word”-based method.
• It looks for matching “words”: short sequence patterns called k-tuples (ktup).
Tuple: a finite ordered list of elements
Word length: 1 or 2 amino acids, or 5 or 6 nucleotides
• Build local alignments from these words (k-tuples):
• Match identical words.
• Create diagonals by joining adjacent matches.
• Rescore the highest-scoring diagonal regions using a PAM or BLOSUM matrix.
• The best of these scores is called init1.
• Join segments, allowing gaps; the best score from this step is called initn.
• Use dynamic programming (the Smith-Waterman algorithm) to create the optimal alignment; its score is reported as opt.
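The word-matching and diagonal steps, and the final dynamic-programming step, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not FASTA's implementation: words are exact k-tuple matches found through a lookup table, diagonal scores are plain counts of word hits (no PAM/BLOSUM rescoring, so no init1 or initn), and `smith_waterman` fills the full matrix with hypothetical match/mismatch/gap scores, whereas FASTA restricts this step to a band around the best diagonal. All function names are hypothetical.

```python
from collections import defaultdict


def ktuple_diagonals(query: str, target: str, ktup: int) -> dict[int, int]:
    """Count identical k-tuple (word) hits on each diagonal d = j - i."""
    # Lookup table: every word of length ktup in the query -> its start positions.
    words = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        words[query[i:i + ktup]].append(i)

    # Each word shared with the target is one "dot"; sum the dots per diagonal.
    diagonals = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in words.get(target[j:j + ktup], []):
            diagonals[j - i] += 1
    return dict(diagonals)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (linear gap penalty, full matrix, two rows kept)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0, so alignments can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


query, target = "GATTACA", "CCGATTACACC"
diagonals = ktuple_diagonals(query, target, ktup=3)
print(diagonals)                          # {2: 5}
print(max(diagonals, key=diagonals.get))  # 2
print(smith_waterman(query, target))      # 14
```

All five 3-letter words of GATTACA reappear in the target shifted by two positions, so every hit lands on diagonal 2. On real data, the few diagonals with the most hits are the candidates for rescoring.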

FASTA Implementation
FASTA3 (https://www.ebi.ac.uk/Tools/sss/fasta/) at the EBI is one of
the most popular FASTA implementations.

FASTA Output
• The Histogram
• The Sequence listing
• The Local alignments

FASTA Output
The Histogram
• The first part of the FASTA output is the histogram.
• The number of sequences predicted by the extreme value distribution is marked with an asterisk (*).
• The number of sequences actually obtained is shown with equals signs (=).
• First column: z-opt score
• Second column: number of sequences with these z-opt scores
• Third column: expected number of alignments
The histogram is used to check whether the statistical theory (and hence the E-values) is valid:
• If the equals signs follow the predicted values (*) → valid
• If the equals signs do not follow the predicted values → invalid
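To make the histogram symbols concrete, the sketch below draws rows in a simplified FASTA-like style: one `=` per observed sequence, with `*` placed at the expected count. The bins and counts are invented for illustration, the function name `bar` is hypothetical, and real FASTA output scales the bars so that one `=` can stand for many sequences.

```python
def bar(observed: int, expected: int) -> str:
    """One '=' per observed sequence, with '*' in the cell of the expected count."""
    width = max(observed, expected)
    cells = ["=" if k < observed else " " for k in range(width)]
    if expected > 0:
        cells[expected - 1] = "*"  # the predicted (expected) count
    return "".join(cells)


# Invented z-opt bins with observed and expected counts, for illustration only.
rows = [(20, 2, 3), (22, 5, 5), (24, 4, 3), (26, 1, 1)]
for z, observed, expected in rows:
    print(f"{z:>4} {observed:>4} {expected:>4} {bar(observed, expected)}")
```

When the `=` bars end close to the `*` marks in most rows, as here, the observed scores follow the predicted distribution.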

FASTA Output: The Sequence listing
• A listing of the best-scoring sequences in the database.
• Best sequence: reported first
• Worst sequence: reported last
Columns, left to right:
• First column: database
• Second column: database accession number
• Database identifier
• Total length of the database sequence
• Opt column: final score
• Last column: E-value
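The ordering rule (best first, worst last) amounts to sorting the hits by E-value. A minimal sketch, assuming hypothetical `Hit` records whose fields mirror the listing's columns; the accessions, identifiers and scores are placeholders, not real database entries, and the tie-break on opt score is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row of a simplified sequence listing (hypothetical fields)."""
    database: str
    accession: str
    identifier: str
    length: int
    opt: int
    evalue: float


def sequence_listing(hits: list[Hit]) -> list[Hit]:
    """Best hits first: lowest E-value, ties broken by the higher opt score."""
    return sorted(hits, key=lambda h: (h.evalue, -h.opt))


# Placeholder data, not real database entries.
hits = [
    Hit("DB", "ACC002", "SEQ_B", 310, 420, 1e-5),
    Hit("DB", "ACC001", "SEQ_A", 295, 980, 2e-40),
    Hit("DB", "ACC003", "SEQ_C", 150, 95, 3.1),
]
for h in sequence_listing(hits):
    print(f"{h.database}:{h.accession} {h.identifier} ({h.length}) {h.opt} {h.evalue:.2g}")
```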

FASTA Output: The Local alignments
Display:
• The local alignment
• init1 and initn scores
• E-value
• opt score
• Z-score
• Percent identity
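Percent identity is the share of alignment columns in which the two residues are identical. The sketch below assumes the two aligned strings use `-` for gaps and counts gap columns in the denominator; FASTA's own convention for the denominator may differ, so treat this as an illustration. `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts as identical only if neither side is a gap.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


print(percent_identity("GATT-ACA", "GATTTACA"))  # 87.5
```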

Significance of E-Value
• The E-value (expected value) is the number of alignments with at least
this score expected to occur by chance in the database search.
• The smaller the E-value, the less likely it is that a given alignment
occurred by chance.
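The E-value follows from the probability p that one unrelated database sequence scores at least S: in a search of N sequences, E = N × p. The sketch assumes, as a simplification, that scores follow a Gumbel (extreme value) distribution with parameters `lam` and `mu`; the parameter values and database size are made up, whereas FASTA estimates its statistics from the scores of the search itself. The function names are hypothetical.

```python
import math


def gumbel_pvalue(score: float, lam: float, mu: float) -> float:
    """P(S >= score) for one comparison under a Gumbel extreme value distribution."""
    # 1 - exp(-x) computed as -expm1(-x) to stay accurate when x is tiny.
    return -math.expm1(-math.exp(-lam * (score - mu)))


def evalue(score: float, n_sequences: int, lam: float, mu: float) -> float:
    """Expected number of database sequences scoring >= score by chance."""
    return n_sequences * gumbel_pvalue(score, lam, mu)


# Hypothetical parameters and database size, chosen only for illustration.
LAM, MU, N = 0.3, 30.0, 100_000
for s in (40, 60, 80):
    print(s, f"{evalue(s, N, LAM, MU):.3g}")
```

Higher scores give smaller E-values, which is why a small E-value marks an alignment that is unlikely to be a chance hit.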

Variants of FASTA
• FastA - Compares a DNA query sequence to a DNA database, or a
protein query to a protein database, detecting the sequence type
automatically.
• FASTX - Compares a DNA query to a protein database. It may
introduce gaps only between codons.
• FASTY - Compares a DNA query to a protein database, optimizing
gap location, even within codons.
• TFASTA - Compares a protein query to a DNA database.
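FASTX, FASTY and TFASTA all compare DNA with protein, which depends on translating DNA codons into amino acids. The sketch below shows only that translation step, using the standard genetic code in the three forward and three reverse-complement frames; it does not reproduce how these programs place gaps or handle frameshifts. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code: codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for k in range(frame, len(dna) - 2, 3):
        codon = dna[k:k + 3]
        if any(base not in BASES for base in codon):
            protein.append("X")
            continue
        i, j, m = (BASES.index(base) for base in codon)
        protein.append(AMINO_ACIDS[16 * i + 4 * j + m])
    return "".join(protein)


def six_frames(dna: str) -> list[str]:
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna, f) for f in range(3)] + [translate(reverse, f) for f in range(3)]


print(translate("ATGGCCTAA"))  # MA*
print(six_frames("ATGGCCTAA"))
```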
