FASTA


Thapar Institute of Engineering & Technology, Patiala, Punjab, India

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences.


FASTA
Amandeep Singh
Assistant Professor
Department of Biotechnology
GSSDGS Khalsa College Patiala

Introduction
FASTA is a sequence-similarity search program: it compares a nucleotide or
protein query sequence against the sequences in a biological database.
Query: nucleotide sequence or protein sequence
Database: nucleotide sequences or protein sequences

FASTA Algorithm
It starts from a dot plot (dot matrix).
[Figure: dot matrix comparing the sequences A B C D E F and A B M D L F; axes labeled First Sequence (Query) and Second Sequence (Database)]
Regions of similarity between the two sequences appear as diagonals.
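
As a minimal sketch of the dot-matrix idea, the Python snippet below marks a dot wherever a residue of one sequence matches a residue of the other. The function names `dot_matrix` and `render`, and the choice of which sequence runs along which axis, are illustrative assumptions; the two six-letter sequences are the ones shown on the slide.

```python
def dot_matrix(query, subject):
    """Return a grid where grid[i][j] is True when query[i] == subject[j]."""
    return [[q == s for s in subject] for q in query]


def render(query, subject):
    """Draw the matrix as text: '*' marks a match, '.' a mismatch."""
    grid = dot_matrix(query, subject)
    lines = ["  " + " ".join(subject)]  # header row: subject residues
    for q, row in zip(query, grid):
        lines.append(q + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Sequences taken from the slide; the axis assignment is an assumption.
    print(render("ABMDLF", "ABCDEF"))
```

For these two sequences every match falls on the main diagonal (positions 1, 2, 4 and 6), which is the kind of diagonal run that FASTA goes on to score.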

FASTA Algorithm
• FASTA goes a step beyond the dot plot: it calculates the sum of dots along each diagonal.
• It is a “word”-based method.
• It looks for matching “words”, short sequence patterns called “k-tuples”.
Tuple: a finite ordered list of elements
Word length: 1 or 2 amino acids, or 5 or 6 nucleotides
• Local alignments are built from these “words” or “k-tuples”:
• Match identical “words”.
• Create diagonals by joining adjacent matches.
• Rescore the highest-scoring diagonal regions using a PAM or BLOSUM matrix.
• The best of these scores is called init1.
• Join segments using gaps; the best score from this step is called initn.
• Use dynamic programming (the Smith-Waterman algorithm) to create the optimal alignment.
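
The first steps of the list above (word lookup and summing matches along each diagonal) can be sketched as follows. This is only an illustration under stated assumptions, not the FASTA program's actual code: the function names, the example sequences, and the use of a plain hit count as the diagonal score are all hypothetical, and the later steps (PAM/BLOSUM rescoring, init1, initn, Smith-Waterman) are not implemented.

```python
from collections import defaultdict


def index_words(seq, ktup):
    """Map every k-tuple (word) in seq to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return table


def diagonal_hits(query, subject, ktup=2):
    """Count identical word matches on each diagonal.

    A diagonal is identified by (subject position - query position), so
    matches that line up without gaps share the same diagonal number.
    """
    table = index_words(query, ktup)
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return dict(hits)


def best_diagonals(query, subject, ktup=2, top=10):
    """Return up to `top` (diagonal, hit count) pairs, highest count first."""
    counts = diagonal_hits(query, subject, ktup)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical protein-like sequences, chosen only for illustration.
    print(best_diagonals("HEAGAWGHEE", "PAWHEAE", ktup=2))
```

Indexing the query words once in a lookup table means each database sequence is scanned a single time, which is what makes the word-based approach faster than filling in a full dot matrix.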

FASTA Implementation
FASTA3 (https://www.ebi.ac.uk/Tools/sss/fasta/) at the EBI is one of
the most popular FASTA implementations.

FASTA Output
• The Histogram
• The Sequence listing
• The Local alignments

FASTA Output: The Histogram
• The first part of the FASTA output is the histogram.
• The number predicted by the extreme value distribution is marked with an asterisk (*).
• The number actually observed is drawn with equals signs (=).
• First column: z-opt score
• Second column: number of sequences with that z-opt score
• Third column: expected number of alignments
The histogram is used to judge whether the statistical theory is valid for the search:
• If the equals signs follow the predicted values → valid
• If the equals signs do not follow the predicted values → invalid

FASTA Output: The Sequence listing
• Listing of the best scoring sequences in the database.
• Best sequence: reported first
• Worst sequence: reported last
• First column: database, database accession number, and database identifier
• Second column: total length of the database sequence
• Opt column: final score
• Last column: E-value

FASTA Output: The Local alignments
Display:
• The local alignment
• init1 and initn scores
• E-value
• Opt score
• Z-score
• Percent identity

Significance of E-Value
• The E-value (expected value) is the number of alignments with an equal or
better score that would be expected to occur by chance in the search.
• The smaller the E-value, the less likely it is that a given alignment
occurred by chance.
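
To make the relationship between score, database size and E-value concrete, here is a hedged sketch. It assumes, beyond what the slides state, that z-scores are scaled to a mean of 50 and a standard deviation of 10 for unrelated sequences, and that those scores follow an extreme value (Gumbel) distribution; the function names `p_value` and `e_value` are hypothetical. The E-value is then the chance probability of one comparison multiplied by the number of database sequences searched.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def p_value(z_score, mean=50.0, sd=10.0):
    """P(score >= z_score) for one comparison under a standardized Gumbel model.

    The mean/sd scaling is an assumption of this sketch.
    """
    zs = (z_score - mean) / sd
    x = math.exp(-math.pi / math.sqrt(6.0) * zs - EULER_GAMMA)
    return -math.expm1(-x)  # 1 - exp(-x), computed accurately for small x


def e_value(z_score, n_sequences):
    """Expected number of chance hits at or above z_score in the search."""
    return n_sequences * p_value(z_score)


if __name__ == "__main__":
    for z in (80, 100, 120):
        print(z, e_value(z, n_sequences=10_000))
```

Under this model the E-value falls steeply as the z-score rises, and it grows in proportion to the database size, so the same score is less convincing in a larger database.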

Variants of FASTA
• FASTA - Compares a DNA query sequence to a DNA database, or a
protein query to a protein database, detecting the sequence type
automatically.
• FASTX - Compares a DNA query to a protein database. It may
introduce gaps only between codons.
• FASTY - Compares a DNA query to a protein database, optimizing
gap location, even within codons.
• TFASTA - Compares a protein query to a DNA database.
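
The choice among these variants depends only on the type of the query and the type of the database, which can be written as a small lookup table. The sketch below is illustrative: the names `PROGRAMS` and `choose_program` are hypothetical, and the table covers only the four programs listed above.

```python
# (query type, database type) -> programs from the list above
PROGRAMS = {
    ("dna", "dna"): ["FASTA"],
    ("protein", "protein"): ["FASTA"],
    ("dna", "protein"): ["FASTX", "FASTY"],
    ("protein", "dna"): ["TFASTA"],
}


def choose_program(query_type, database_type):
    """Return the program names suited to a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return PROGRAMS[key]


if __name__ == "__main__":
    print(choose_program("DNA", "protein"))
```

When both FASTX and FASTY apply, the difference lies in where gaps may be placed: only between codons for FASTX, and also within codons for FASTY.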


