Blast

PRESENTED BY :
ARUNDHATI MEHTA
BLAST
( A Bioinformatics tool )
© Arundhati Mehta 2016

BIOINFORMATICS
It is the science of managing and
analyzing biological data (information
associated with biomolecules such as DNA,
RNA and protein) using advanced
computing techniques.
Software tools for bioinformatics range
from simple command-line tools to more
complex graphical programs and standalone
web services available from various
bioinformatics companies or public
institutions.

Introduction to BLAST
It is a sequence similarity search program for
comparing biological sequences, such as the amino acid
sequences of proteins or the nucleotide sequences of
DNA, against a sequence database or library of
sequences.
It is an in silico hybridisation experiment used
to identify significant similarities between a query
sequence and the library sequences.
BLAST stands for :
B - Basic
L - Local
A - Alignment
S - Search
T - Tool

BLAST was designed by Eugene Myers, Samuel Karlin, Stephen Altschul,
Warren Gish, David J. Lipman and Webb Miller (1990, 1994, 1997) at the
National Institutes of Health and was published in the Journal of Molecular Biology
in 1990.
It was originally developed by NCBI, which also maintains it.
Link: http://www.ncbi.nlm.nih.gov/BLAST/

BLAST - Input & Output
Input: FASTA format, GenBank format
Output: HTML format, XML format, plain text format
The default database is the non-redundant (nr) database maintained by NCBI.
BLAST programs use a substitution scoring matrix (BLOSUM or PAM for
protein comparisons) to determine pairwise raw alignment scores.

BLAST PROCESS
BLAST works through the use of a heuristic algorithm: an
algorithm that produces an acceptable solution to a
problem in many practical scenarios and is faster than
classical methods. Heuristics are typically used when no
known method can find an optimal solution under
the given constraints.
Using this approach, BLAST finds homologous sequences not by
comparing the two sequences in their entirety, but by
locating short matches between them.
These short sets of common letters are known as WORDS.
SEEDING: Find similar words between the query and each database sequence
EXTENSION: Extend such words to obtain high-scoring segment pairs (HSPs)
EVALUATION: Calculate statistics analytically
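The seeding and extension steps can be illustrated with a minimal sketch. It is not NCBI's implementation: it assumes exact word matches and a simple match/mismatch score, whereas protein BLAST also accepts similar ("neighbourhood") words scored with a substitution matrix. The function names, the word size and the `x_drop` cut-off are hypothetical choices made for illustration.

```python
def find_seeds(query, subject, word_size=4):
    """SEEDING: return (query_pos, subject_pos) pairs sharing an identical word."""
    # Index every word of the query by its start position.
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    seeds = []
    # Scan the subject and look each of its words up in the query index.
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, qpos, spos, word_size=4,
                match=1, mismatch=-1, x_drop=2):
    """EXTENSION: ungapped extension of one seed in both directions.

    Extension in a direction stops once the running score has fallen
    x_drop or more below the best score seen so far.
    Returns (best_score, query_start, query_end) with query_end exclusive.
    """
    best = run = word_size * match
    left, right = qpos, qpos + word_size
    # Extend to the right of the seed.
    i, j = qpos + word_size, spos + word_size
    while i < len(query) and j < len(subject) and run > best - x_drop:
        run += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if run > best:
            best, right = run, i
    # Extend to the left of the seed, starting again from the best score.
    run = best
    i, j = qpos - 1, spos - 1
    while i >= 0 and j >= 0 and run > best - x_drop:
        run += match if query[i] == subject[j] else mismatch
        if run > best:
            best, left = run, i
        i -= 1
        j -= 1
    return best, left, right


seeds = find_seeds("ACGTTGCA", "TTACGTTGCATT")
hsp = extend_seed("ACGTTGCA", "TTACGTTGCATT", *seeds[0])
```

Here every seed lies on the same diagonal, and extending the first one recovers the whole query as a single high-scoring segment pair. The EVALUATION step would then assign a statistical significance to that score.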

BLAST Types
Amino acid sequence query: Blastp, tBlastn
DNA sequence query: Blastn, Blastx, tBlastx
• Blastp : compares a protein query
against a protein sequence database.
• tBlastn : compares a protein query
against all six reading frames of a
translated nucleotide sequence database.
• Blastn : compares a nucleotide query
against a nucleotide sequence database.
• Blastx : compares the six-frame conceptual
translation products of a nucleotide
query sequence (both strands) against a
protein sequence database.
• tBlastx : compares the six-frame translations
of a nucleotide query against the six-frame
translations of a nucleotide sequence
database.
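The six-frame conceptual translation used by Blastx, tBlastn and tBlastx can be sketched as follows. This is an illustrative sketch that assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names are hypothetical and are not part of any BLAST distribution.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
```

Each of the six protein strings would then be compared against a protein database (Blastx) or against other six-frame translations (tBlastx); `*` marks a stop codon.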

Graphic summary
• The query sequence is at the top, with a
colour key for alignment scores.
• Each bar represents the portion of
another sequence that is similar to
your query sequence:
Red bars - most similar sequences (score 200 or more)
Pink bars - less good matches (80 to 200)
Green bars - unimpressive matches (50 to 80)
Blue bars - poor matches (40 to 50)
Black bars - worst scores (below 40)
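The colour key can be expressed as a small lookup. The score thresholds below follow the colour key shown on the NCBI graphic summary (below 40, 40 to 50, 50 to 80, 80 to 200, 200 or more); they are an assumption added here, as is the hypothetical function name, and the slide itself gives no numbers.

```python
def bar_colour(score):
    """Map an alignment score to the colour of its bar in the graphic summary."""
    # Thresholds assumed from the NCBI colour key, highest band first.
    if score >= 200:
        return "red"
    if score >= 80:
        return "pink"
    if score >= 50:
        return "green"
    if score >= 40:
        return "blue"
    return "black"


colours = [bar_colour(s) for s in (350, 120, 65, 42, 30)]
```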

1 - This portion of each description links to the sequence record for a particular hit.
2 - Score, or bit score, is a value calculated from the number of gaps and substitutions
associated with each aligned sequence. The higher the score, the more significant the
alignment. Each score links to the corresponding pairwise alignment between the query sequence
and the hit sequence (also referred to as the subject sequence).
3 - E Value (Expect Value) describes the number of hits with a similar or better score that
are expected to occur in the database by chance. The smaller the E Value, the more significant
the alignment.
4 - These links provide the user with direct access from BLAST results to related entries in
other databases. 'L' links to LocusLink records and 'S' links to structure records in NCBI's
Molecular Modeling Database (MMDB).
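The relationship between score and E Value can be made concrete with the Karlin-Altschul formulas: a raw score S is normalised to a bit score S' = (lambda * S - ln K) / ln 2, and the expected number of chance hits is E = m * n * 2^(-S'), where m is the query length and n the total database length. The sketch below assumes these textbook formulas with plain (not effective) lengths; the function names and the numbers in the example are illustrative only.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw score using the statistical parameters lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical example: a 100-residue query against 1,000,000 database residues.
weak = e_value(30, 100, 1_000_000)
strong = e_value(60, 100, 1_000_000)
```

Because the bit score sits in the exponent, each additional bit halves the E Value, which is why a higher score means a more significant alignment.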

Percentage of identity : This gives you a concrete alternative to the E-value. An
identity of more than 25 percent is good news. (The identity is the number of identical
residues divided by the number of matched residues; gaps are simply ignored.)
The Positives field gives you a measure of the fraction of residues that are either identical or
similar; similar residues are represented with a + on the actual alignment.
The Gaps field shows the residues that were not aligned, that is, positions where a gap was inserted.
Length : the length of the alignment produced by BLAST.
Top sequence : query sequence
Bottom sequence : hit (referred to as the subject sequence)
Line between sequences : letter (identical residues)
+ sign (similar amino acids)
space (mismatch)
XXXX region : low-complexity segments
Numbers : those on the right-hand side indicate the coordinates of the match on the query and
on the hit sequence.
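The Identities, Positives and Gaps fields can be recomputed from the three lines of a pairwise alignment. The sketch below assumes the layout described above (query on top, match line in the middle, subject at the bottom, `-` for a gap); the function name and the example alignment are made up for illustration.

```python
def alignment_stats(query_line, mid_line, subject_line):
    """Count identities, positives and gaps in one aligned block."""
    if not (len(query_line) == len(mid_line) == len(subject_line)):
        raise ValueError("all three alignment lines must have the same length")
    identities = positives = gaps = 0
    for q, m, s in zip(query_line, mid_line, subject_line):
        if q == "-" or s == "-":
            gaps += 1            # a residue facing a gap character
        elif m == "+":
            positives += 1       # similar but not identical
        elif m != " ":
            identities += 1      # a letter on the match line means identical
            positives += 1       # identical residues also count as positives
    return {"length": len(query_line), "identities": identities,
            "positives": positives, "gaps": gaps}


stats = alignment_stats("MKT-AYIAK",
                        "MK+ AY+AK",
                        "MKSGAYLAK")
```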

BLAST Statistics
Raw scores
R = aI + bX - cO - dG
Percentage of Identities
% I = (No. of identical residues / No. of matched residues) x 100
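A short worked example of the two formulas on this slide. The slide does not define its symbols, so the reading used here is an assumption: I is the number of identities, X the number of mismatches, O the number of gaps opened and G the number of gap positions, with a the match reward, b the (negative) mismatch score, c the gap-opening penalty and d the gap-extension penalty. The default parameter values are illustrative, not taken from the slide.

```python
def raw_score(identities, mismatches, gap_opens, gap_positions,
              a=1, b=-3, c=5, d=2):
    """R = aI + bX - cO - dG, with the symbols read as described above."""
    return a * identities + b * mismatches - c * gap_opens - d * gap_positions


def percent_identity(identical, matched):
    """% I = identical residues / matched residues x 100."""
    return 100.0 * identical / matched


r = raw_score(identities=20, mismatches=2, gap_opens=1, gap_positions=3)
pid = percent_identity(identical=6, matched=8)
```

With these assumed values, R = 20 - 6 - 5 - 6 = 3 and % I = 75.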

Applications of BLAST
BLAST can be used for several purposes.
These include:
• Identifying Species:
With the use of BLAST, you can often
identify a species correctly and/or find
homologous species. This can be useful, for
example, when working with a DNA
sequence from an unknown species.
• Establishing Phylogeny:
Using the results received through BLAST,
one can create a phylogenetic tree using
the BLAST web page.

Applications of BLAST
• DNA Mapping:
When working with a known species and looking
to sequence a gene at an unknown location, BLAST
can compare the chromosomal position of the
sequence of interest to relevant sequences in the
database(s).
• Locating Domains:
When working with a protein sequence, you can
input it into BLAST to locate known domains
within the sequence of interest.
• Comparison:
When working with genes, BLAST can locate
common genes in two related species and can be
used to map annotations from one organism to
another.
