Blast gp assignment

Basic Definitions
• BLAST: Basic Local Alignment Search Tool.
• Definition: The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify members of gene families.
• Query sequence: the DNA or protein sequence submitted to a database or a computational
tool for analysis (alignment, comparison, etc.).
• Sequence alignment: a way of arranging DNA, RNA, or protein sequences to identify
regions of similarity.

Why do we need BLAST?
• Beginning in the 1970s, scientists started accumulating DNA and protein sequence data at an exponential
rate.
• But how do investigators make sense of this massive amount of data? How can they identify the functions
of newly cloned genes? And is it possible to estimate the evolutionary relationships between genes or
proteins just by examining their nucleotide or amino acid sequences? To address these important issues,
researchers must first tease out the relationships between different species that are descended from a
common ancestor. Any sequence similarity can then be used to infer function and evolutionary
relationships. In fact, one common method for examining and comparing genes is to search for similarities
between newly sequenced DNA and databases of gene sequences that have already been described. By
identifying related genes or gene families with known functions, scientists can infer the functions and
evolutionary relationships of newly cloned genes or even whole genomes.

Why was BLAST developed?
• The Smith-Waterman algorithm was one of the earliest algorithms used for finding similar
sequences. It aligns two sequences to find the optimal local alignment, but it is
time-consuming and needs substantial computing power when run against a large database.
• This created the need for BLAST.
• BLAST uses a heuristic algorithm. It is time-efficient at retrieving similar sequences
from a large database.
• It was developed by Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ (NCBI).
• Version 2.2.29+, released on 6 January 2014, was the latest release at the time of writing.

BLAST
• BLAST is a heuristic method (heuristics are experience-based techniques for problem solving,
learning, and discovery that find a solution): it scans the database and retrieves similar sequences quickly.
• The query can be a DNA (nucleotide), protein, or translated nucleotide sequence.
• Input is given in FASTA or GenBank format.
• Types of BLAST:
• nucleotide BLAST (blastn), protein BLAST (blastp), blastx, tblastn, tblastx.

Variants of BLAST
• blastn (nucleotide BLAST): searches a nucleotide database using a nucleotide query.
• blastp (protein BLAST): searches a protein database using a protein query.
• blastx: searches a protein database using a translated nucleotide query.
• tblastn: searches a translated nucleotide database using a protein query.
• tblastx: searches a translated nucleotide database using a translated nucleotide query.
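
The choice of variant depends only on the type of the query and the type of the database. A small Python sketch of that lookup follows; the table layout and the function name candidate_programs are hypothetical helpers for illustration and are not part of any BLAST package.

```python
# program name -> (query type, database type, query translated?, database translated?)
BLAST_VARIANTS = {
    "blastn":  ("nucleotide", "nucleotide", False, False),
    "blastp":  ("protein",    "protein",    False, False),
    "blastx":  ("nucleotide", "protein",    True,  False),
    "tblastn": ("protein",    "nucleotide", False, True),
    "tblastx": ("nucleotide", "nucleotide", True,  True),
}


def candidate_programs(query_type, db_type):
    """List the variants that accept this query/database combination."""
    return [
        name
        for name, (q, d, _q_translated, _d_translated) in BLAST_VARIANTS.items()
        if q == query_type and d == db_type
    ]


print(candidate_programs("nucleotide", "protein"))     # ['blastx']
print(candidate_programs("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
```

For a nucleotide query against a nucleotide database, both blastn and tblastx apply; tblastx compares the two at the protein level after translating both.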

BLAST algorithm
• Query input.
• Compile a list of words (seeds) from the query (seeding).
• Scan the database with the list of words.
• Suppose the query words match neighbourhood words in the database.
• Each match is scored by summing the per-residue match scores from the BLOSUM62 matrix.
• The database sequence with the best word match (above threshold) is found, and the alignment is
extended in both directions until the alignment score falls below the threshold because of mismatches
(22 for protein and 20 for nucleotide).
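
A rough Python sketch of the seed-and-extend idea, under simplifying assumptions: exact word matches only (no neighbourhood words), a plain match/mismatch score instead of BLOSUM62, and an X-drop rule (stop once the running score falls more than x_drop below the best score seen) standing in for the threshold described above. All function names and parameter values are hypothetical.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, step, match, mismatch, x_drop):
    """Ungapped extension in one direction (step is +1 or -1).

    Returns (best score gained, number of residues added).
    """
    score = best = 0
    added = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        added += 1
        if score > best:
            best, best_len = score, added
        elif best - score > x_drop:
            break  # score dropped too far below the best seen
        q += step
        s += step
    return best, best_len


def seed_and_extend(query, subject, w=4, match=1, mismatch=-3, x_drop=5):
    """Return (score, query start, subject start, length) of the best ungapped hit."""
    index = index_words(subject, w)
    hits = []
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], []):
            right, r_len = extend(query, subject, qi + w, si + w, 1,
                                  match, mismatch, x_drop)
            left, l_len = extend(query, subject, qi - 1, si - 1, -1,
                                 match, mismatch, x_drop)
            hits.append((w * match + left + right,
                         qi - l_len, si - l_len, w + l_len + r_len))
    return max(hits) if hits else None


print(seed_and_extend("GATTACA", "CCGATTACACC"))  # (7, 0, 2, 7)
```

Indexing the words first means only positions that share a word with the query are examined, which is where the speed-up over a full alignment comes from.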

Step 1: Query sequence: MRDPYNKLS
Step 2: Compile the seeds (words). The number of words is L - W + 1, where L is the query length
and W is the word size (3 for protein, 11 for nucleotide). Here L = 9 and W = 3, giving 7 words:
MRD  RDP  DPY  PYN  YNK  NKL  KLS
Step 3: Scan the database; it returns the matching neighbourhood words.
Step 4: Scores are given based on the BLOSUM62 matrix. For example, take the seed PYN:
Seed       PYN  PYN  PYN  PYN
DB words   PYN  PFN  PFQ  PFE
Score      20   16   10   10
(PYN against PYN scores 7 + 7 + 6 = 20.)
Step 5: The HSP is extended in both directions, with scores given as per the BLOSUM62 matrix.
Extension continues until the score falls below the threshold.
In gapped BLAST, gaps are introduced in the alignment.
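
The word list and the PYN scores from this example can be reproduced with a short Python sketch. Only the six BLOSUM62 entries needed for the example are included, and the helper names and the threshold value of 11 are illustrative assumptions rather than part of the slides.

```python
# Only the BLOSUM62 entries needed for this example (the matrix is symmetric).
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Y", "Y"): 7, ("N", "N"): 6,
    ("Y", "F"): 3, ("N", "Q"): 0, ("N", "E"): 0,
}


def pair_score(a, b):
    """Look up a substitution score in either orientation."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def words(seq, w=3):
    """All overlapping words of length w: there are L - W + 1 of them."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def word_score(seed, db_word):
    """Sum the per-residue substitution scores of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(seed, db_word))


query = "MRDPYNKLS"
print(words(query))  # ['MRD', 'RDP', 'DPY', 'PYN', 'YNK', 'NKL', 'KLS']

db_words = ["PYN", "PFN", "PFQ", "PFE"]
scores = [word_score("PYN", w) for w in db_words]
print(scores)  # [20, 16, 10, 10]

threshold = 11  # illustrative cut-off, assumed for this sketch
print([w for w, s in zip(db_words, scores) if s >= threshold])  # ['PYN', 'PFN']
```

With this assumed cut-off, only PYN and PFN would be kept as neighbourhood words for the seed PYN.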

PAM vs BLOSUM
• PAM: Point Accepted Mutation scoring matrix (Margaret Dayhoff).
• The PAM matrix is used for closely related species, to identify the mutational differences
between two sequences.
• Based on global alignment.
• BLOSUM: Blocks Substitution Matrix (Steven Henikoff and Jorja Henikoff).
• The BLOSUM matrix is used for evolutionarily distant sequences, to identify their
evolutionary relationship.
• Based on local alignment.
