BLAST
Basic Definitions
 BLAST: Basic Local Alignment Search Tool.
 Definition: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify members of gene families.
 Query Sequence: The DNA or protein sequence submitted to a database or a computational
tool for analysis (alignment, comparison, etc.).
 Sequence Alignment: A way of arranging DNA, RNA, or protein sequences to identify
regions of similarity.
Why do we need BLAST?
 Beginning in the 1970s, scientists started accumulating DNA and protein sequence data at an exponential
rate.
 But how do investigators make sense of this massive amount of data? How can they identify the functions 
of newly cloned genes? And is it possible to estimate the evolutionary relationships between genes or 
proteins just by examining their nucleotide or amino acid sequences? To address these important issues, 
researchers must first tease out the relationships between different species that are descended from a 
common ancestor. Any sequence similarity can then be used to infer function and evolutionary 
relationships. In fact, one common method for examining and comparing genes is to search for similarities 
between newly sequenced DNA and databases of gene sequences that have already been described. By 
identifying related genes or gene families with known functions, scientists can infer the functions and 
evolutionary relationships of newly cloned genes or even whole genomes.
Why was BLAST developed?
 The Smith-Waterman algorithm was an early method for finding similar sequences. It computes the
optimal local alignment between two sequences, but running it against every entry in a large
database is slow and computationally expensive.
 This created the need for a faster tool, which led to BLAST.
 BLAST uses a heuristic algorithm. It retrieves similar sequences from a large database much
faster, at the cost of not guaranteeing the optimal alignment.
 It was developed by Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ at NCBI.
 Version 2.2.29+, released on 6 January 2014, was the latest release when these slides were prepared.
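To make the cost of Smith-Waterman concrete, here is a minimal sketch of its scoring step. The function name and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not parameters from the slides. It fills one cell for every pair of positions, which is why comparing a query against a whole database this way is slow.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so a cell never drops below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # → 8
```

The work grows with len(a) * len(b) for every database sequence, which is the cost that BLAST's heuristic avoids.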
BLAST
 BLAST is a heuristic method: it uses shortcuts that find a good solution quickly rather than
guaranteeing the best one. It scans the database and retrieves similar sequences faster than an
exhaustive alignment.
 The query can be a DNA (nucleotide) sequence, a protein sequence, or a translated nucleotide sequence.
 Input is accepted in FASTA or GenBank format.
 Types of BLAST
 blastn (nucleotide BLAST), blastp (protein BLAST), blastx, tblastn, tblastx.
Variants of BLAST
 blastn: searches a nucleotide database using a nucleotide query.
 blastp: searches a protein database using a protein query.
 blastx: searches a protein database using a translated nucleotide query.
 tblastn: searches a translated nucleotide database using a protein query.
 tblastx: searches a translated nucleotide database using a translated nucleotide query.
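The choice between these programs depends only on the type of the query and the type of the database. A small sketch of that decision follows; the function name choose_program and its translated flag are hypothetical helpers for illustration, not part of BLAST itself.

```python
def choose_program(query_type, db_type, translated=False):
    """Pick a BLAST program name from the query and database types.

    query_type and db_type are "nucleotide" or "protein". The translated
    flag (an assumption of this sketch) selects tblastx, where both the
    nucleotide query and the nucleotide database are translated.
    """
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translated else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # query is translated before the search
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"  # database is translated before the search
    raise ValueError("types must be 'nucleotide' or 'protein'")


print(choose_program("nucleotide", "protein"))  # → blastx
```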
BLAST algorithm
 Query input.
 Compile a list of words (seeds) from the query (seeding).
 Scan the database with the list of words.
 Database words that match a query word, or one of its neighbourhood words, are kept as hits.
 Each hit is scored by summing the per-residue match scores from the BLOSUM62 matrix.
 For each word hit that scores above the threshold, find the matching database sequence and extend
the alignment in both directions until the alignment score falls below the threshold because of
mismatches (22 for protein and 20 for nucleotide).
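The steps above can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: it uses exact word matches instead of neighbourhood words, identity scoring (match +1, mismatch -1) instead of BLOSUM62, and an assumed drop-off value x_drop to decide when extension stops. All function names are hypothetical.

```python
def seed_words(query, w):
    """List every overlapping word of length w in the query (L - w + 1 words)."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def find_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a query word occurs exactly."""
    index = {}
    for i, word in enumerate(seed_words(query, w)):
        index.setdefault(word, []).append(i)
    return [(i, j) for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def _extend(query, subject, i, j, step, match, mismatch, x_drop):
    """Walk in one direction; return (best gain, length that achieved it)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
        i += step
        j += step
    return best, best_len


def extend_hit(query, subject, qpos, spos, w, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a word hit in both directions."""
    right, right_len = _extend(query, subject, qpos + w, spos + w, 1,
                               match, mismatch, x_drop)
    left, left_len = _extend(query, subject, qpos - 1, spos - 1, -1,
                             match, mismatch, x_drop)
    q_start, q_end = qpos - left_len, qpos + w + right_len
    s_start = spos - left_len
    length = q_end - q_start
    return (w * match + left + right,
            query[q_start:q_end], subject[s_start:s_start + length])


print(seed_words("MRDPYNKLS", 3))
# → ['MRD', 'RDP', 'DPY', 'PYN', 'YNK', 'NKL', 'KLS']
print(extend_hit("MRDPYNKLS", "MRDPYNALS", 3, 3, 3))
# → (7, 'MRDPYNKLS', 'MRDPYNALS')
```

In the second call the hit PYN is extended across one mismatch (K against A) because the score recovers before dropping more than x_drop below its best value.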
Step 1: Query sequence: MRDPYNKLS
Step 2: Compile the seeds. A query of length L gives L - W + 1 words of length W
(W = 3 for protein, W = 11 for nucleotide). For this query the words are:
MRD, RDP, DPY, PYN, YNK, NKL, KLS
Step 3: Scan the database, which gives the matching neighbourhood words.
Step 4: Scores are given based on the BLOSUM62 matrix.
For example, take the seed PYN (P-P = 7, Y-Y = 7, N-N = 6):
Seed      PYN  PYN  PYN  PYN
Db word   PYN  PFN  PFQ  PFE
Score     20   16   10   10
Step 5: Extend the HSP (high-scoring segment pair) in both directions, scoring with the BLOSUM62 matrix.
Extension continues until the score falls below the threshold.
In gapped BLAST, gaps are introduced in the alignment.
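The word scores in this example (20, 16, 10, 10) can be reproduced by summing BLOSUM62 entries position by position. The sketch below hard-codes only the six BLOSUM62 entries this example needs; the function names and the threshold value of 11 are assumptions made for illustration.

```python
# Only the BLOSUM62 entries needed for the PYN example; the real matrix
# covers every pair of amino acids.
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Y", "Y"): 7, ("N", "N"): 6,
    ("Y", "F"): 3, ("N", "Q"): 0, ("N", "E"): 0,
}


def pair_score(a, b):
    """Look up a residue pair; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(seed, word):
    """Sum the per-position scores of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(seed, word))


def neighbourhood(seed, candidates, threshold):
    """Keep the candidate words scoring at least the threshold against the seed."""
    return [w for w in candidates if word_score(seed, w) >= threshold]


db_words = ["PYN", "PFN", "PFQ", "PFE"]
print([word_score("PYN", w) for w in db_words])  # → [20, 16, 10, 10]
print(neighbourhood("PYN", db_words, threshold=11))  # → ['PYN', 'PFN']
```

With the assumed threshold of 11, only PYN and PFN would count as neighbourhood words of the seed and go on to the extension step.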
How to run BLAST?
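The slides do not record how the search was run. As one possible illustration, the sketch below assembles a command line for the standalone BLAST+ programs using their -query, -db, -out, -evalue and -outfmt options. The helper name, the file names and the E-value cut-off are hypothetical; actually running the command assumes BLAST+ is installed and the database has already been formatted.

```python
import shlex

PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_command(program, query_path, database, out_path, evalue=1e-5):
    """Return the argument list for a BLAST+ search (does not run it)."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    return [
        program,
        "-query", query_path,      # FASTA file holding the query sequence
        "-db", database,           # name of a formatted BLAST database
        "-out", out_path,          # where the results are written
        "-evalue", f"{evalue:g}",  # report only hits at or below this E-value
        "-outfmt", "6",            # tabular output
    ]


# Hypothetical file and database names, for illustration only.
cmd = build_blast_command("blastp", "query.fasta", "my_proteins", "hits.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the BLAST+ tools are available.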
PAM vs BLOSUM
 PAM: Point Accepted Mutation scoring matrix (Margaret Dayhoff).
 PAM matrices are used for closely related sequences, to identify the mutational differences
between the two sequences.
 Based on global alignments.
 BLOSUM: BLOcks SUbstitution Matrix (Steven Henikoff and Jorja Henikoff).
 BLOSUM matrices are used for evolutionarily distant sequences, to identify their
evolutionary relationship.
 Based on local alignments.
Thanks for Listening
