3. TYPES OF BLAST
NATURE | PROGRAM | QUERY | DATABASE
Nucleotide BLAST | Blastn | Nucleotide (DNA/RNA) | Nucleotide (DNA/RNA)
Protein BLAST | Blastp | Protein | Protein
Mixed BLAST | Blastx | Translated Nucleotide | Protein
Mixed BLAST | tBlastn | Protein | Translated Nucleotide
Mixed BLAST | tBlastx | Translated Nucleotide | Translated Nucleotide
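The program table can be read as a lookup from (query type, database type) to program name. The sketch below is only an illustration of that reading; the dictionary `BLAST_PROGRAMS` and the helper `choose_program` are hypothetical names, not part of any BLAST distribution.

```python
# Lookup built from the TYPES OF BLAST table.
# Keys are (query type, database type); values are program names.
# BLAST_PROGRAMS and choose_program are illustrative names only.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_program(query_type, database_type):
    """Return the program for a query/database pairing, or raise KeyError."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise KeyError(f"no BLAST program for pairing {key}")
    return BLAST_PROGRAMS[key]
```

For example, `choose_program("protein", "translated nucleotide")` selects `tblastn`, the case where a protein query is searched against a nucleotide database translated on the fly.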
4. NBLAST
(1) Standard Nucleotide BLAST
• BLASTN: the query is a nucleotide sequence and the
database is a nucleotide database. No conversion is
done on the query or the database.
5. INTRODUCTION
• The BLAST algorithm compares biological sequences
to one another in order to determine shared motifs
and common ancestry. However, the comparison of all
non-redundant (NR) sequences against all other NR
sequences is a computationally intensive task.
• We developed NBLAST as a cluster computer
implementation of the BLAST family of sequence
comparison programs for the purpose of generating
pre-computed BLAST alignments and neighbour lists
of NR sequences.
6. NBLAST performs the heuristic BLAST algorithm and generates
an exhaustive database of alignments, but it computes only the
upper triangle of the comparison matrix (roughly half of the
possible N² alignments), where N is the number of sequences to
be compared. A task-partitioning algorithm distributes the work
across all cluster nodes, and the NBLAST master process produces
a BLAST sequence alignment database and a list of sequence
neighbours for each sequence record. The resulting sequence
alignment and neighbour databases serve the SeqHound query
system through a C/C++ and Perl Application Programming
Interface (API).
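To make the upper-triangle idea concrete, here is a minimal sketch of enumerating each unordered sequence pair once and dealing the pairs out to cluster nodes. It is not NBLAST's actual task-partitioning algorithm, which the slide does not describe: the round-robin scheme, the exclusion of self-comparisons, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def upper_triangle_pairs(n):
    """Yield index pairs (i, j) with i < j, so each unordered pair appears once.

    Assumption: self-comparisons (i == j) are skipped.
    """
    return combinations(range(n), 2)


def partition_pairs(n, num_nodes):
    """Deal the pairs round-robin to num_nodes task lists.

    Hypothetical scheme for illustration; not the NBLAST partitioner.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    tasks = [[] for _ in range(num_nodes)]
    for k, pair in enumerate(upper_triangle_pairs(n)):
        tasks[k % num_nodes].append(pair)
    return tasks
```

With 4 sequences and 3 nodes, `partition_pairs(4, 3)` spreads 6 pairs over the nodes instead of the 16 cells of the full 4 × 4 matrix, which is the saving the upper-triangle restriction buys.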
16. BLASTN Applications
The blastn application searches a nucleotide query against
nucleotide subject sequences or a nucleotide database. An option
of type “flag” takes no argument; its presence alone sets it to
true. Four different tasks are supported:
1.) “megablast”, for very similar sequences (e.g., sequencing errors),
2.) “dc-megablast”, typically used for inter-species comparisons,
3.) “blastn”, the traditional program used for inter-species
comparisons,
4.) “blastn-short”, optimized for sequences less than 30 nucleotides.
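As a rough illustration of how a task and a flag-type option appear on a command line, the sketch below assembles a `blastn` argument list using the BLAST+ options `-task`, `-query`, `-db`, and the flag `-lcase_masking`. The helper name `build_blastn_command` and the file and database names in the usage note are hypothetical; the function only builds the argument list and does not run BLAST.

```python
# The four tasks listed on the slide.
BLASTN_TASKS = ("megablast", "dc-megablast", "blastn", "blastn-short")


def build_blastn_command(query_path, db_name, task="megablast",
                         lcase_masking=False):
    """Return an argument list for blastn (hypothetical helper).

    lcase_masking is a flag-type option: it takes no value and is
    simply present or absent on the command line.
    """
    if task not in BLASTN_TASKS:
        raise ValueError(f"unknown blastn task: {task!r}")
    cmd = ["blastn", "-task", task, "-query", query_path, "-db", db_name]
    if lcase_masking:
        cmd.append("-lcase_masking")  # flag: no argument follows
    return cmd
```

For a query file of short primers (a hypothetical `primers.fa` searched against a database named `nt`), `build_blastn_command("primers.fa", "nt", task="blastn-short")` gives a list that could be handed to `subprocess.run` on a machine where BLAST+ is installed.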
17. tBLASTn
(2) Translated BLAST
TBLASTN is a mode of operation for BLAST that aligns protein
sequences to a nucleotide database translated in all six frames.
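The phrase "translated in all six frames" can be illustrated with a short sketch: three reading frames on the given strand plus three on its reverse complement. This uses the standard genetic code and is not TBLASTN's own implementation; the function names and the frame labels (`+1` to `-3`) are choices made for this example, and any codon containing a character other than A, C, G, T is rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

A protein query is then compared against each of the six translated strings, which is why a translated search does roughly six times the work of a plain protein-protein search over the same database.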
18. tBLASTn
• We present the first description of the modern implementation
of TBLASTN, focusing on new techniques that were used to
implement composition-based statistics for translated
nucleotide searches.
• Composition-based statistics use the composition of the
sequences being aligned to generate more accurate E-values,
which allows for a more accurate distinction between true and
false matches.
• Until recently, composition-based statistics were available only
for protein-protein searches. They are now available as a
command line option for recent versions of TBLASTN and as an
option for TBLASTN on the NCBI BLAST web server.
19. VERSIONS of TBLASTN
• B-TBLASTN: provides baseline behavior; it ignores the
composition of the sequences and merely scales the BLOSUM62
matrix to have five more bits of accuracy before rounding.
• S-TBLASTN: performs compositional scaling.
• C-TBLASTN: performs compositional matrix adjustment
conditionally.
20. Comparison
• Statistical accuracy of
three variants of TBLASTN.
One thousand queries were
randomly selected from
mouse proteins, permuted,
and aligned to human
nuclear DNA.
• P = 1 - e^{-E}, where E is the E-value of the alignment.
• A P-value represents the
probability that an
alignment of equal or
greater quality will be found
when the query and
database sequences are
unrelated.
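The relation between the E-value and the P-value is simple enough to check numerically. The sketch below is an illustration only; the function name `p_value_from_e_value` is hypothetical, and `math.expm1` is used so that very small E-values do not lose precision.

```python
import math


def p_value_from_e_value(e_value):
    """Convert an E-value to a P-value using P = 1 - e^{-E}."""
    if e_value < 0:
        raise ValueError("E-value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-e_value)
```

For small E the two quantities are nearly equal (P is approximately E), while for large E the P-value saturates towards 1, so E-values are the more informative number when many chance hits are expected.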