FASTA
(FAST-All)
• FASTA stands for “fast-all” or “FastA”.
• It was the first database similarity search tool developed, preceding the development of BLAST.
• FASTA is a sequence alignment tool used to search for similarities between sequences of DNA or proteins.
• FASTA uses a “hashing” strategy to find matches for a short stretch of identical residues of length k. These strings of residues are known as k-tuples or ktups. They are equivalent to words in BLAST, but are normally shorter than BLAST words.
• Typically, a ktup is composed of two residues for protein sequences and six residues for DNA sequences.
• The query sequence is broken down into k-tuples, and the target sequences are searched for these k-tuples in order to find similarities between the two.
• FASTA is a heuristic: like other word-based methods, it is not guaranteed to find the optimal alignment or all true homologs, but it is 50–100 times faster than dynamic programming.
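
The k-tuple lookup described above can be sketched in a few lines of Python. This is only an illustration of the hashing idea, not the FASTA implementation: the function names `build_ktup_index` and `diagonal_hits` and the toy sequences are assumptions made for this example. It indexes every k-tuple of the query, then counts exact k-tuple matches per diagonal (target position minus query position), since a diagonal with many hits marks a candidate region of similarity.

```python
from collections import defaultdict


def build_ktup_index(query, k):
    """Map every k-tuple in the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k):
    """Count exact k-tuple matches on each diagonal (target_pos - query_pos)."""
    index = build_ktup_index(query, k)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        # .get avoids creating empty entries for k-tuples absent from the query
        for i in index.get(target[j:j + k], ()):
            hits[j - i] += 1
    return dict(hits)


# Toy DNA example with k = 6 (hypothetical sequences)
hits = diagonal_hits("ACGTACGT", "TTACGTACGTTT", k=6)
best = max(hits, key=hits.get)
print(best, hits[best])  # prints: 2 3
```

Because the index is built once per query, each target is scanned in a single pass, which is where the speed advantage over full dynamic programming comes from.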
• FASTA - Compares a DNA query sequence to a DNA database, or a protein query to a protein database, detecting the sequence type automatically. Versions 2 and 3 are in common use, version 3 having a highly improved score normalization method. It significantly reduces the overlap between the score distributions.
• FASTX - Compares a DNA query to a protein database. It may introduce gaps only between codons.
• FASTY - Compares a DNA query to a protein database, optimizing gap location, even within codons.
• TFASTA - Compares a protein query to a DNA database.
• It is used for the identification of the species.
• Used for the establishment of the phylogeny
• For DNA mapping
• FASTA is also used for understanding the
biochemical functions of the protein.
• Study the evolution of the species, from where
that specific species evolved, or identify the
ancestors.
• Calculation of the molecular weight
• Identification of mutations in the sequences by
comparing those sequences with the reference
sequences.
• Basic steps
• Step 1: Hashing. Set a word size, usually 6 for DNA and 2 for protein. FASTA locates regions of the query sequence and matching regions in the database sequences that have high densities of exact word matches (without gaps). The length of the matched word is called the k-tuple parameter.
• Step 2: Scoring. The ten highest-scoring regions are rescored using the BLOSUM50 scoring matrix. The score of the best of these regions is saved as the init1 score.
• Step 3: Introduction of gaps. FASTA determines whether any of the initial regions from different diagonals may be joined together to form an approximate alignment with gaps. Only non-overlapping regions may be joined. The score for the joined regions is the sum of the scores of the initial regions minus a joining penalty for each gap. The score of the highest-scoring joined region at the end of this step is saved as the initn score.
• Step 4: Alignment. After computing the initial scores, FASTA determines the best segment of similarity between the query sequence and the search set sequence, using a variation of the Smith-Waterman algorithm. The score for this alignment is the opt score. Modification: in Step 4, the alignment is restricted to a band around the init1 region.
• Step 5: Random sequence simulation. To evaluate the significance of such an alignment, FASTA empirically estimates the score distribution from the alignment of many random pairs of sequences. More precisely, the characters of the query sequence are reshuffled (which preserves its length and character composition) and searched against a random subset of the database. This empirical distribution is extrapolated, assuming it is an extreme value distribution, and each alignment to the real query is assigned a Z-score and an E-value.
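
The shuffling idea in Step 5 can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not FASTA's statistics code: `sw_score` and `shuffle_z_score` are hypothetical names, the match/mismatch/gap values stand in for a real substitution matrix, and the Z-score is taken directly from the mean and standard deviation of the shuffled scores rather than from a fitted extreme value distribution.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_z_score(query, target, n_shuffles=100, seed=0):
    """Z-score of the real score against scores of shuffled copies of the query."""
    rng = random.Random(seed)
    real = sw_score(query, target)
    letters = list(query)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps length and composition, destroys order
        scores.append(sw_score("".join(letters), target))
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0  # every shuffle scored the same; no basis for a Z-score
    return (real - statistics.mean(scores)) / spread


print(shuffle_z_score("MKVLAAGIW", "MKVLAAGIW"))
```

A large positive Z-score means the real alignment scores far above what sequences of the same composition achieve by chance.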
• FASTA calculates significance “on the fly”. This can be problematic if the dataset is small. To identify an unknown protein sequence, use any of these: FASTA3, SSEARCH3 or TFASTX3. FASTA3 has improved methods of aligning sequences and of calculating the statistical significance of alignments.
• There is no standard filename extension for a text file containing FASTA-formatted sequences. The table below shows each extension and its respective meaning.
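
Whatever the extension, a FASTA-formatted file has the same layout: each record starts with a header line beginning with ">", followed by one or more lines of sequence. A minimal reader might look like the sketch below; `parse_fasta` is a hypothetical helper written for this example, and it assumes that record headers are unique.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 demo\nACGT\nACGT\n>seq2\nMKV\n"
print(parse_fasta(example))
```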
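• BLAST (Basic Local Alignment Search Tool) was introduced in 1990 by Stephen Altschul and colleagues; its statistics build on the work of Samuel Karlin and Altschul.
• Compares nucleotide or amino acid sequences.
• Is a heuristic method.
• Is a fast but approximate method of alignment.
• Locates local alignments, starting from short matches called words.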
blastp: compares a protein sequence against a protein sequence database.
blastn: compares a nucleotide sequence against a nucleotide sequence database.
blastx: compares a six-frame translation of a nucleotide sequence against a protein database.
tblastn: compares a protein sequence against a six-frame translation of a nucleotide database.
tblastx: compares a six-frame translation of a nucleotide sequence against a six-frame translation of a nucleotide database.
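
The "six-frame translation" used by blastx, tblastn and tblastx means translating a nucleotide sequence in three reading frames on each strand. The sketch below illustrates this with the standard genetic code; the function names are chosen for this example, and incomplete trailing codons are simply dropped.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translate frames +1..+3 on the given strand and -1..-3 on the reverse."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frames("ATGGCC"))  # frame +1 is "MA", frame -1 is "GH"
```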
• BLAST searches begin with a query sequence that will be matched against sequence databases specified by the user.
• BLAST begins by breaking down the query sequence into a series of short overlapping “words”.
• Default word size for nucleotide searches is 28 with megablast and 11 with the classic blastn algorithm.
• Default word size for BLASTP is 3 amino acids.
• Results obtained depend on the scoring matrix used.
• The BLOSUM62 matrix is the default scoring matrix for BLASTP.
• Basic steps
• Step 1: Set a word size, usually 11 for DNA and 3 for protein. Filter out low-complexity regions, then break the query sequence into overlapping words, e.g. with a word size of 2, qlnfsagw -> (ql, ln, nf, fs, sa, ag, gw). Extend the list with similar words that form high-scoring word pairs with the query words (using some threshold T).
• Step 2: Scan the database for exact matches with the list of words compiled in Step 1.
• Step 3: Whenever a word in the list is found, try to extend the match in both directions (no gaps) to reach a score beyond a threshold S. While extending, a parameter L defines how long an extension will be tried to raise the score over S.
• Modifications of Step 3:
- Original BLAST: extension is continued as long as the score continues to increase.
- BLAST2 (gapped BLAST): a lower value of T is used. After extension, hits are combined (allowing gaps) to find the maximal scoring segment. This program uses the BLASTP or BLASTN algorithms for aligning two sequences.
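
The word-splitting and ungapped extension steps above can be sketched as follows. This is a toy illustration, not NCBI's implementation: the function names, the +1/-1 scoring (in place of a substitution matrix), and `max_stall` (one possible reading of the parameter L, namely how many residues an extension may continue without reaching a new best score) are all assumptions made for this example.

```python
def overlapping_words(query, w):
    """Break the query into overlapping words of length w."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def _extend(query, subject, qi, si, step, max_stall, match, mismatch):
    """Walk in one direction; return (best score gain, residues kept)."""
    score = best = 0
    length = best_len = stall = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject) and stall < max_stall:
        score += match if query[qi] == subject[si] else mismatch
        length += 1
        if score > best:
            best, best_len, stall = score, length, 0
        else:
            stall += 1  # no improvement at this residue
        qi += step
        si += step
    return best, best_len


def extend_hit(query, subject, q_pos, s_pos, w, max_stall=2, match=1, mismatch=-1):
    """Extend a word hit without gaps; return (score, query_start, query_end)."""
    seed = sum(match if query[q_pos + i] == subject[s_pos + i] else mismatch
               for i in range(w))
    right, r_len = _extend(query, subject, q_pos + w, s_pos + w, 1,
                           max_stall, match, mismatch)
    left, l_len = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                          max_stall, match, mismatch)
    return seed + left + right, q_pos - l_len, q_pos + w + r_len


query = "qlnfsagw"
print(overlapping_words(query, 2))  # ['ql', 'ln', 'nf', 'fs', 'sa', 'ag', 'gw']
score, start, end = extend_hit(query, "xxlnfsagyy", q_pos=2, s_pos=3, w=2)
print(score, query[start:end])  # prints: 6 lnfsag
```

The caller would then keep the extended segment only if its score exceeds the threshold S.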
• BLAST calculates probabilities, and this can fail if some assumptions are invalid for that search. There are versions of BLAST for searching nucleic acid and protein databases, and some versions translate DNA sequences before comparing them to protein sequence databases. Improvements introduced in 1997 are GAPPED-BLAST (three times faster than the original BLAST) and PSI-BLAST (position-specific iterated BLAST). The GAPPED-BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several segments (as in the older versions). This method reflects biological relationships much better than ordinary BLAST.
BLAST AND FASTA.pptx