Definition
The Basic Local Alignment Search Tool
(BLAST) is a program for comparing gene and
protein sequences against others in public
databases.
BLAST is a set of sequence comparison
algorithms used to search databases for
optimal local alignments to a query.
Definition
It breaks the query and database
sequences into fragments ("words") and
seeks matches between them.
Before BLAST, nucleic acid and protein
alignments were time-consuming because they
were computed as full alignments using
dynamic programming.
BLAST is roughly 50 times faster than full
dynamic programming.
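To make the cost of full dynamic programming concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. The function name and the match, mismatch, and gap values are illustrative assumptions, not BLAST defaults. Every cell of a len(a) x len(b) table is filled, which is the work BLAST avoids by looking at word matches first.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # prev is the previous row of the scoring table; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "TTACGTAA"))
```

The two nested loops visit every pair of positions, so the running time grows with the product of the two sequence lengths. That is manageable for one pair of sequences but slow against a whole database.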
Background
Since the 1970s, scientists have been
accumulating DNA and protein sequence data
at an exponential rate; researchers
currently have approximately 97 billion
bases sequenced and over 93 million
records.
Amazingly, this sequence data doubles every
18 months!
Background
Today, one of the most commonly used
tools to examine DNA and protein
sequences is the Basic Local Alignment
Search Tool, also known as BLAST.
BLAST is a computer algorithm that is
available for use online at the National
Center for Biotechnology Information
(NCBI) website and many other sites.
Types of BLAST
Nucleotide-nucleotide BLAST (blastn)
-This program, given a DNA query, returns the
most similar DNA sequences from the DNA database that
the user specifies.
Protein-protein BLAST (blastp)
-This program, given a protein query, returns the
most similar protein sequences from the protein
database that the user specifies.
Position-Specific Iterative BLAST (PSI-BLAST)
(blastpgp)
-This program is used to find distant relatives of a
protein.
Types of BLAST
Nucleotide 6-frame translation-protein
(blastx)
-This program compares the six-frame
conceptual translation products of a nucleotide
query sequence (both strands) against a protein
sequence database.
Nucleotide 6-frame translation-nucleotide
6-frame translation (tblastx)
-The purpose of tblastx is to find very distant
relationships between nucleotide sequences.
Types of BLAST
Protein-nucleotide 6-frame translation
(tblastn)
-This program compares a protein query
against all six reading frames of a nucleotide
sequence database.
Large numbers of query sequences
(megablast)
-When comparing large numbers of input
sequences via the command-line BLAST, "megablast"
is much faster than running BLAST multiple times.
Types of BLAST
Of these programs, BLASTn and BLASTp are the
most commonly used because they use direct
comparisons and do not require translations.
However, since protein sequences are better
conserved evolutionarily than nucleotide sequences,
tBLASTn, tBLASTx, and BLASTx produce more
reliable and accurate results when dealing with
coding DNA.
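The six-frame conceptual translation used by blastx, tblastn, and tblastx can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and an unambiguous ACGT alphabet; the function names are hypothetical and are not part of any BLAST package.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six resulting protein strings is then compared at the protein level, which is why the translated searches cost more than blastn or blastp.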
BLAST Algorithm
The BLAST algorithm is fast, accurate, and
web-accessible.
It is faster than most other sequence
similarity search tools.
The algorithm is complex: it requires
multiple steps and many parameters.
BLAST Algorithm
An overview of the BLAST algorithm (a
protein-to-protein search) is as follows:
1. Remove low-complexity regions and sequence
repeats from the query sequence.
2. Make a k-letter word list of the query
sequence. Taking k=3 as an example, list every
word of length 3 in the query protein sequence
sequentially, until the last letter of the
query sequence is included (k is usually 11
for a DNA sequence).
BLAST Algorithm
3. List the possible matching words: score each query word
against all possible k-letter words and keep only those
scoring above a threshold.
4. Organize the remaining high-scoring words into an
efficient search tree.
5. Repeat steps 3 to 4 for each k-letter word in the query
sequence.
6. Scan the database sequences for exact matches with the
remaining high-scoring words.
7. Extend the exact matches to high-scoring segment pairs
(HSPs).
BLAST Algorithm
8. List all of the HSPs in the database whose score is high
enough to be considered.
9. Evaluate the significance of the HSP score.
10. Combine two or more HSP regions into a longer
alignment.
11. Show the gapped Smith-Waterman local alignments of
the query and each of the matched database
sequences.
12. Report every match whose expect score is lower than
a threshold parameter E.
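The word list, database scan, and ungapped extension steps above can be sketched as follows. This is a simplified illustration, not the actual BLAST implementation. It skips low-complexity filtering and the scoring of similar (non-identical) words, uses plain identity scoring instead of a substitution matrix, and stops an extension once the score falls a fixed amount below the best seen. All function names and parameter values (k, x_drop, min_score) are assumptions chosen for the example.

```python
def query_words(query, k):
    """Map each k-letter word of the query to its start positions."""
    index = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return index


def extend_one_way(query, subject, q_pos, s_pos, step, match, mismatch, x_drop):
    """Extend without gaps from (q_pos, s_pos) in direction step (+1 or -1).

    Returns (best score gained, number of letters kept).
    """
    score = best = kept = walked = 0
    i, j = q_pos, s_pos
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        walked += 1
        if score > best:
            best, kept = score, walked
        elif best - score >= x_drop:
            break  # score dropped too far below the best seen
        i += step
        j += step
    return best, kept


def find_hsps(query, subject, k=3, match=1, mismatch=-1, x_drop=3, min_score=5):
    """Return sorted (score, query_start, subject_start, length) tuples."""
    words = query_words(query, k)
    hsps = set()
    # Scan the subject for exact matches to query words (the seeds).
    for s_pos in range(len(subject) - k + 1):
        for q_pos in words.get(subject[s_pos:s_pos + k], []):
            right, r_len = extend_one_way(
                query, subject, q_pos + k, s_pos + k, 1, match, mismatch, x_drop)
            left, l_len = extend_one_way(
                query, subject, q_pos - 1, s_pos - 1, -1, match, mismatch, x_drop)
            score = k * match + left + right
            if score >= min_score:
                hsps.add((score, q_pos - l_len, s_pos - l_len, k + l_len + r_len))
    return sorted(hsps, reverse=True)


print(find_hsps("ACDEFGHIK", "WWWCDEFGHWWW"))
```

Several seeds inside the same region extend to the same segment pair, so the results are collected in a set to avoid duplicates.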
BLAST Input-Output
Input
Input sequences are given in FASTA or GenBank format.
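As an illustration of the FASTA input format, the sketch below reads records in which a header line starting with ">" is followed by one or more sequence lines. The function name and the example record are made up for illustration. Real pipelines usually rely on an established parser instead.

```python
def parse_fasta(text):
    """Return {header: sequence} parsed from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Sequence lines of one record are joined into a single string.
    return {name: "".join(parts) for name, parts in records.items()}


example = """>example_query hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""
print(parse_fasta(example))
```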
Output
BLAST output can be delivered in a variety of
formats, including HTML, plain text, and XML.
On NCBI's web page, the default output format
is HTML.
A BLAST report includes:
An introduction that tells where the search occurred
and what database and query were compared.
BLAST Output
A list of the sequences in the
database containing segment
pairs whose scores were least
likely to occur by chance.
Alignments of the high-scoring
segment pairs showing identical
and similar residues.
A complete list of the
parameter settings used for
the search.
BLAST Output
E-value (expectation value)
The Expect value (E) is a parameter that
describes the number of hits one can "expect"
to see by chance when searching a database of
a particular size.
It decreases exponentially as the Score (S) of
the match increases.
Essentially, the E value describes the random
background noise.
In general terms, the smaller E is, the more
likely the match is significant.
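The exponential relationship between E and S can be written as E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and K and lambda are statistical parameters of the scoring system. The sketch below evaluates this formula. The default K and lambda values are illustrative assumptions, and real BLAST uses effective (edge-corrected) lengths in place of the raw m and n.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are illustrative values, not universal constants.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# E shrinks exponentially as the score grows, for a fixed search space.
for score in (30, 50, 70):
    print(score, expect_value(score, query_len=250, db_len=1_000_000))
```

Doubling the database size doubles E for the same score, which is why the same alignment can look less significant in a larger database.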
BLAST Output
The default E value for blastn, blastp, blastx, and
tblastn is 10.
At this setting, 10 hits with scores equal to or
better than the defined alignment score, S, are
expected to occur by chance. The E value can
be increased or decreased to alter the
stringency of the search.
Increase the E value when searching with a
short query, since it is likely to be found many
times by chance in a given database.
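The effect of the E value threshold can be illustrated by filtering a list of hits. The sketch below assumes hits in the 12-column tab-separated layout that command-line BLAST+ writes with -outfmt 6. The two hit lines are invented for the example and do not come from a real search, and the function names are hypothetical.

```python
# Default column order of the BLAST+ tabular format (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(text):
    """Parse tab-separated hit lines into dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def filter_by_evalue(hits, max_evalue=10.0):
    """Keep hits whose E value does not exceed the threshold."""
    return [hit for hit in hits if hit["evalue"] <= max_evalue]


# Invented example lines, for illustration only.
SAMPLE = (
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t370\n"
    "query1\tsubjB\t45.0\t40\t22\t0\t5\t44\t90\t129\t4.2\t25.1\n"
)

hits = parse_hits(SAMPLE)
print(len(filter_by_evalue(hits, max_evalue=10.0)))
print(len(filter_by_evalue(hits, max_evalue=1e-5)))
```

A strict threshold keeps only the strong hit, while the default of 10 also lets through the weak one that could easily be a chance match.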
BLAST Output
Bit Score
A bit score is another prominent statistical
indicator used in addition to the E value in a
BLAST output.
The bit score measures sequence similarity
independent of query sequence length and
database size and is normalized based on the
raw pairwise alignment score.
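The normalization can be sketched with the standard relationship S' = (lambda * S - ln K) / ln 2, where S is the raw score and S' the bit score. With bit scores, the expected number of chance hits becomes m * n * 2^(-S'), with no further reference to K or lambda. The parameter values below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam=0.267, k=0.041):
    """Convert a raw alignment score to a bit score.

    lam and k depend on the scoring system; these are example values.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """E value computed from a bit score and the search space size."""
    return query_len * db_len * 2 ** (-bits)


bits = bit_score(100)
print(bits, expect_from_bits(bits, query_len=250, db_len=1_000_000))
```

Because the scoring-system parameters are folded into the bit score, bit scores from searches that used different scoring matrices can be compared directly.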
BLAST Search
• Go to http://www.ncbi.nlm.nih.gov/
• Select a BLAST program
BLAST Function
BLAST can be used for several purposes.
These include identifying species, locating
domains, establishing phylogeny, DNA
mapping, and comparison.
Identifying species
-With BLAST, we can often identify the species
a sequence comes from, or find homologous
species. This can be useful, for example, when we
are working with a DNA sequence from an
unknown species.
BLAST Function
Locating domains
-When working with a protein sequence,
you can input it into BLAST to locate known
domains within the sequence of interest.
Establishing phylogeny
-Using the results returned by BLAST,
we can create a phylogenetic tree on the BLAST
web page.
BLAST Function
DNA mapping
-When working with a known species and
looking to sequence a gene at an unknown location,
BLAST can compare the chromosomal position of the
sequence of interest to relevant sequences in the
database.
Comparison
-When working with genes, BLAST can locate
common genes in two related species, and can be
used to map annotations from one organism to
another.
Objectives of BLAST
It is one of the most popular programs for
sequence analysis.
It enables a researcher to compare a query
sequence with a library or database of
sequences.
It identifies library sequences that resemble the
query sequence above a certain threshold.
The objective is to find high-scoring ungapped
segments among related sequences.
Objectives of BLAST
Alignments of the high-scoring segment pairs
showing identical and similar residues.
A complete list of the parameter settings used
for the search.