FASTA
(FAST-All)
• FASTA stands for “FAST-All” (also written “FastA”), because it works with any alphabet: nucleotides as well as amino acids.
• It was one of the first database similarity search tools, and it was developed before BLAST.
• FASTA is a sequence alignment tool used to search for similarities between DNA sequences and between protein sequences.
• FASTA uses a “hashing” strategy to find short stretches of identical residues of length k. These strings of residues are called k-tuples (ktups); they are equivalent to the words in BLAST, but are normally shorter.
• Typically, a ktup is two residues long for protein sequences and six residues long for DNA sequences.
• The query sequence is thus broken down into k-tuples, and the target sequences are searched for these k-tuples to find similarities between the two.
• FASTA is well suited to database similarity searches.
• Such heuristic methods are not guaranteed to find the optimal alignment or all true homologs, but they are 50–100 times faster than full dynamic programming.
• FastA - Compares a DNA query sequence with a DNA database, or a protein query with a protein database, detecting the sequence type automatically. Versions 2 and 3 are in common use; version 3 has a greatly improved score normalization method, which significantly reduces the overlap between the score distributions of related and unrelated sequences.
• FASTX - Compares a DNA query with a protein database. It may introduce gaps only between codons.
• FASTY - Compares a DNA query with a protein database, optimizing gap location, even within codons.
• TFASTA - Compares a protein query with a DNA database.
• Identification of species
• Establishment of phylogeny
• DNA mapping
• Understanding the biochemical functions of proteins
• Studying the evolution of species: where a species evolved from, or which species are its ancestors
• Calculation of molecular weight
• Identification of mutations, by comparing sequences with reference sequences
Basic steps of FASTA
• Step 1 (Hashing): Set a word size (ktup), usually 6 for DNA and 2 for protein. FASTA locates regions of the query sequence and matching regions in the database sequences that have high densities of exact word matches (without gaps). The length of the matched word is the ktup parameter.
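The hashing idea in Step 1 can be sketched in Python as follows. This is a simplified illustration, not FASTA's own code: it builds a lookup table of the query's k-tuples, then counts exact matches on each diagonal. The diagonal convention (`target position - query position`) and the function names are assumptions for this example, and real FASTA also weighs how close hits lie to each other when scoring a region, whereas this sketch only counts them.

```python
from collections import defaultdict


def ktup_index(seq: str, ktup: int) -> dict[str, list[int]]:
    """Hash table: every ktup-long word of seq -> positions where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return dict(table)


def diagonal_hits(query: str, target: str, ktup: int) -> dict[int, int]:
    """Count exact ktup matches on each diagonal d = target_pos - query_pos."""
    table = ktup_index(query, ktup)
    hits: dict[int, int] = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        # One hash lookup per target word instead of comparing every pair.
        for i in table.get(target[j:j + ktup], []):
            hits[j - i] += 1
    return dict(hits)
```

For example, `diagonal_hits("GATTACA", "TTGATTACA", 2)` reports six hits on diagonal 2 and one on diagonal -2, so diagonal 2 is the dense region worth rescoring.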
• Step 2 (Scoring): The ten highest-scoring regions are rescored using the BLOSUM50 scoring matrix, and each is trimmed to its highest-scoring subregion. The best of these scores is saved as the init1 score.
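A sketch of the Step 2 rescoring: along one diagonal, find the contiguous (ungapped) segment with the highest score, then take the best over the candidate diagonals as init1. To keep the example self-contained, a hypothetical DNA scoring scheme (+5 for a match, -4 for a mismatch) stands in for BLOSUM50; the function names are illustrative only.

```python
def diagonal_segment_score(query: str, target: str, diag: int,
                           match: int = 5, mismatch: int = -4):
    """Best-scoring ungapped segment on one diagonal.

    Diagonal `diag` pairs query[i] with target[i + diag]. Returns
    (score, (q_start, q_end)) with q_end exclusive, or (0, None) if no
    segment scores above zero.
    """
    best, best_span = 0, None
    run, start = 0, 0
    for i in range(max(0, -diag), min(len(query), len(target) - diag)):
        s = match if query[i] == target[i + diag] else mismatch
        if run <= 0:  # a non-positive prefix never helps: restart here
            run, start = s, i
        else:
            run += s
        if run > best:
            best, best_span = run, (start, i + 1)
    return best, best_span


def init1(query: str, target: str, diagonals) -> int:
    """init1: the best rescored region among the candidate diagonals."""
    return max((diagonal_segment_score(query, target, d)[0] for d in diagonals),
               default=0)
```

The trimming falls out of the running-sum rule: a segment is only extended while its running score stays positive, so low-scoring ends are dropped automatically.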
• Step 3 (Introduction of gaps): FASTA determines whether any of the initial regions from different diagonals can be joined to form an approximate alignment with gaps. Only non-overlapping regions may be joined. The score of a joined combination is the sum of the scores of its initial regions minus a joining penalty for each gap. The highest score obtained at the end of this step is saved as the initn score.
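Joining regions in Step 3 can be expressed as a small dynamic program: process regions in query order, and let each region either start a new chain or follow an earlier region that ends before it in both sequences, paying the joining penalty. This is an illustrative sketch; the `Region` type and the default `join_penalty` of 20 are hypothetical choices, not FASTA's actual parameters.

```python
from typing import NamedTuple


class Region(NamedTuple):
    q_start: int  # first query position (inclusive)
    q_end: int    # last query position (exclusive)
    t_start: int  # first target position (inclusive)
    t_end: int    # last target position (exclusive)
    score: int    # rescored (init1-style) score of this region


def initn(regions: list[Region], join_penalty: int = 20) -> int:
    """Best score of a chain of non-overlapping regions, minus a penalty per join."""
    regions = sorted(regions, key=lambda r: r.q_start)
    best = [r.score for r in regions]  # best chain ending at each region
    for j, rj in enumerate(regions):
        for i in range(j):
            ri = regions[i]
            # ri may precede rj only if it ends before rj starts in BOTH sequences.
            if ri.q_end <= rj.q_start and ri.t_end <= rj.t_start:
                best[j] = max(best[j], best[i] + rj.score - join_penalty)
    return max(best, default=0)
```

With a large enough penalty, joining no longer pays off and initn falls back to the best single region, i.e. init1.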
• Step 4 (Alignment): After computing the initial scores, FASTA determines the best segment of similarity between the query sequence and the database sequence, using a variation of the Smith-Waterman algorithm. The score of this alignment is the opt score.
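A minimal sketch of the Smith-Waterman score computation behind Step 4, optionally limited to a band around one diagonal (the band around init1 mentioned among the modifications). Assumptions: a linear gap penalty and a hypothetical DNA scheme (+5 match, -4 mismatch, -8 per gap); FASTA itself uses substitution matrices and affine gap penalties, so this is not its exact variant.

```python
def smith_waterman(query: str, target: str, match: int = 5, mismatch: int = -4,
                   gap: int = -8, diag=None, band=None) -> int:
    """Local alignment score with a linear gap penalty.

    If diag and band are given, only cells whose diagonal (j - i) lies within
    `band` of `diag` are filled; cells outside the band stay at 0, which for a
    local alignment never raises a neighbouring score.
    """
    n, m = len(query), len(target)
    prev = [0] * (m + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if diag is not None and abs((j - i) - diag) > band:
                continue  # outside the band
            s = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment
                         prev[j - 1] + s,    # match / mismatch
                         prev[j] + gap,      # gap in target
                         cur[j - 1] + gap)   # gap in query
            best = max(best, cur[j])
        prev = cur
    return best
```

Banding trades completeness for speed: with `diag=2, band=0` only one diagonal is examined, which is enough when the best alignment lies on it but misses alignments elsewhere.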
• Step 5 (Random sequence simulation): To evaluate the significance of an alignment, FASTA empirically estimates the score distribution from alignments of many random pairs of sequences. More precisely, the characters of the query sequence are shuffled (which preserves the bias due to its length and residue composition) and the shuffled sequences are searched against a random subset of the database. This empirical distribution is extrapolated on the assumption that it is an extreme value distribution, and each alignment of the real query is assigned a Z-score and an E-value.
• Modification: in Step 4, the alignment is restricted to a band around the init1 region.
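The shuffling idea in Step 5 can be sketched as a Z-score computation: score the real pair, score many shuffled copies of the query, and measure how many standard deviations the real score lies above the shuffled mean. This is a simplified sketch under stated assumptions: it shuffles against a single target rather than a database subset, and it uses the sample mean and standard deviation instead of fitting an extreme value distribution as FASTA does. The scoring function is passed in by the caller.

```python
import random
import statistics
from typing import Callable


def shuffle_z_score(query: str, target: str,
                    score_fn: Callable[[str, str], float],
                    n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of score_fn(query, target) against scores of shuffled queries.

    Shuffling keeps the query's length and residue composition, so the null
    scores carry the same compositional bias as the real one.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    real = score_fn(query, target)
    letters = list(query)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same residues, random order
        null_scores.append(score_fn("".join(letters), target))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)  # needs n_shuffles >= 2
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; z-score undefined")
    return (real - mean) / sd
```

Any pairwise scorer can be plugged in; a local alignment score is the natural choice, but even a simple count of identical positions shows a self-match standing far above its shuffled background.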
• FASTA calculates significance “on the fly”, from the scores produced during the search itself. This can be problematic if the dataset searched is small. To identify an unknown protein sequence, use any of FastA3, Ssearch3 or tFastX3. FASTA3 has improved methods for aligning sequences and for calculating the statistical significance of alignments.
• There is no standard filename extension for a text file containing FASTA-formatted sequences. The table below shows each extension and its respective meaning.
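Whatever the extension, a FASTA-formatted file has the same layout: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. A minimal reader might look like the sketch below; it is illustrative rather than a full-featured parser (blank lines are skipped and any text before the first header is ignored, both assumptions of this example).

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; sequence lines are concatenated."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(">seq1 first\nACGT\nACGT\n\n>seq2\nMKV\n")` returns `[("seq1 first", "ACGTACGT"), ("seq2", "MKV")]`.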
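BLAST
(Basic Local Alignment Search Tool)
• Developed in 1990 by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers and David Lipman; its statistics rest on work by Samuel Karlin and Stephen Altschul.
• Compares nucleotide or amino acid sequences.
• Is a heuristic method: fast, but approximate.
• Locates local alignments by starting from short matching stretches called words.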
blastp: compares a protein sequence against a protein sequence database.
blastn: compares a nucleotide sequence against a nucleotide sequence database.
blastx: compares the six-frame translation of a nucleotide sequence against a protein database.
tblastn: compares a protein sequence against the six-frame translations of a nucleotide database.
tblastx: compares the six-frame translation of a nucleotide sequence against the six-frame translations of a nucleotide database.
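The "six-frame translation" used by blastx, tblastn and tblastx means translating a DNA sequence starting at offsets 0, 1 and 2 on the given strand and on its reverse complement. The sketch below assumes the standard genetic code (NCBI translation table 1); the frame labels `+1` to `-3` are an assumed convention for this example, and codons containing letters other than A, C, G, T are shown as `X`.

```python
# Standard genetic code, codons ordered by bases T, C, A, G at each of the
# three positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translations in frames +1..+3 (given strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For `ATGGCC`, frame +1 gives `MA` and frame -1 (reading `GGCCAT`) gives `GH`; BLAST then searches all six protein strings.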
• BLAST searches begin with a query sequence that is matched against sequence databases specified by the user.
• BLAST begins by breaking the query sequence into a series of short, overlapping “words”.
• The default word size for BLASTN is 28 nucleotides (the megablast task, which BLAST+ uses by default; the classic blastn task uses 11).
• The default word size for BLASTP is 3 amino acids.
• Results depend on the scoring matrix used.
• BLOSUM62 is the default scoring matrix for BLASTP.
Basic steps of BLAST
• Step 1: Set a word size, usually 11 for DNA and 3 for protein. Filter out low-complexity regions of the query, then compile the list of words that can form high-scoring word pairs with it: all overlapping query words (e.g. with a word size of 2, qlnfsagw -> ql, ln, nf, fs, sa, ag, gw), extended with similar words that score at least a threshold T against a query word.
• Step 2: Scan the database for exact matches to the words compiled in Step 1.
• Step 3: Whenever a word from the list is found, try to extend the hit in both directions (without gaps) to reach a score above a threshold S. A parameter L defines how long an extension is tried in an attempt to raise the score above S.
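The word-list compilation of Step 1 can be sketched as follows: split the query into overlapping words, then collect every word of the same length that scores at least T against a query word. The scoring scheme here is a hypothetical toy, NOT BLOSUM62 (identical residues +5, residues in the same assumed similarity group +2, anything else -2), chosen so the neighbourhood is small and easy to check by hand.

```python
from itertools import product

# Hypothetical toy scoring scheme, NOT BLOSUM62.
GROUPS = ["ILV", "KR", "DE", "FYW", "ST", "NQ"]  # assumed similarity groups
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a: str, b: str) -> int:
    if a == b:
        return 5
    if any(a in g and b in g for g in GROUPS):
        return 2
    return -2


def query_words(query: str, w: int) -> list[str]:
    """Overlapping words of length w, e.g. 'qlnf' with w=2 -> ql, ln, nf."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word: str, threshold: int) -> set[str]:
    """All words of the same length scoring >= threshold against `word`."""
    hits = set()
    # Brute force over 20**len(word) candidates; fine for w = 3 (8000 words).
    for cand in product(AMINO_ACIDS, repeat=len(word)):
        if sum(toy_score(a, b) for a, b in zip(word, cand)) >= threshold:
            hits.add("".join(cand))
    return hits
```

With this toy scheme, `neighborhood("KLV", 11)` keeps `KLV` itself (score 15) plus the five words with one conservative substitution (score 12), while any word with a non-similar substitution (score 8) falls below T. Lowering T enlarges the list and makes the search more sensitive but slower.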
• Modification of Step 3:
- Original BLAST: extension continues while the score keeps increasing, and stops once the score has fallen a set amount below the best score reached.
- BLAST2 (gapped BLAST): a lower value of T is used; after extension, BLAST tries to combine hits (allowing gaps) to find the maximal-scoring segment.
The BLAST 2 Sequences program uses the BLASTP or BLASTN algorithm to align two sequences with each other.
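The ungapped extension of Step 3, with the stopping rule of the original BLAST, can be sketched as an "X-drop" extension: extend the word hit one residue at a time in each direction, remember the best score seen, and stop once the running score falls more than `xdrop` below it. The function name, its parameters and the caller-supplied scoring function are assumptions for this illustration.

```python
def xdrop_extend(query: str, target: str, qi: int, tj: int, w: int,
                 score, xdrop: int):
    """Ungapped extension of the word hit query[qi:qi+w] / target[tj:tj+w].

    Returns (total_score, q_start, q_end) of the best extended segment,
    with q_end exclusive.
    """
    seed = sum(score(query[qi + k], target[tj + k]) for k in range(w))

    # Extend to the right of the word.
    best_right = run = right_len = 0
    k = 0
    while qi + w + k < len(query) and tj + w + k < len(target):
        run += score(query[qi + w + k], target[tj + w + k])
        k += 1
        if run > best_right:
            best_right, right_len = run, k
        elif best_right - run > xdrop:
            break  # fell too far below the best score seen

    # Extend to the left of the word.
    best_left = run = left_len = 0
    k = 1
    while qi - k >= 0 and tj - k >= 0:
        run += score(query[qi - k], target[tj - k])
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > xdrop:
            break
        k += 1

    return seed + best_left + best_right, qi - left_len, qi + w + right_len
```

Keeping only the best prefix in each direction means trailing mismatches explored before the drop-off are discarded from the reported segment; a larger `xdrop` tolerates longer poor stretches before giving up.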
• BLAST calculates probabilities for its matches, and these can be wrong if the statistical assumptions behind them do not hold for a particular search. There are versions of BLAST for searching nucleic acid and protein databases, including versions that translate DNA sequences before comparing them with protein sequence databases. In 1997, two improvements were introduced: Gapped BLAST (about three times faster than the original BLAST) and PSI-BLAST (Position-Specific Iterated BLAST). Gapped BLAST allows gaps to be introduced into alignments, so similar regions are no longer broken into several separate segments as in older versions. Its alignments therefore reflect biological relationships better than those of the original, ungapped BLAST.
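The probabilities BLAST reports rest on Karlin-Altschul statistics: for a raw score S, query length m and database length n, the expected number of chance hits is E = K·m·n·e^(-λS), and the normalized bit score is S' = (λS - ln K) / ln 2, so that E = m·n·2^(-S'). The sketch below implements just these two formulas; λ and K depend on the scoring matrix, gap costs and residue composition, so the values in the example are illustrative assumptions, and real BLAST additionally corrects m and n for edge effects.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring >= S: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)


# Illustrative (assumed) parameters: lambda = 0.267, K = 0.041,
# a 300-residue query and a 10**6-residue database.
E = e_value(40, 300, 10**6, lam=0.267, k=0.041)
```

Because E grows linearly with m·n, the same raw score is less significant when searching a larger database, which is why BLAST reports E-values rather than raw scores alone.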
BLAST AND FASTA.pptx