The document provides an introduction to BLAST (Basic Local Alignment Search Tool), which is an algorithm used to compare gene and protein sequences to those in public databases. It discusses the types of BLAST programs, the BLAST algorithm, input/output, how to perform a BLAST search, and the functions and objectives of BLAST. Specifically, BLAST is faster than previous sequence comparison methods and outputs alignments and statistical values to evaluate matches; its main objectives are to identify related sequences and locate domains through local alignments.
2. Introduction
Group Members:
Shahida khatun
MD. Firoz Ahmed
MD. Shariful Islam
Chandrima Das
Shantonu Kumar Roy
Merina Junaki
Ikhtina Afroz
Shanjida Afrin
MST. Shahinur Akter
MD. Ariful Islam Sagar
5. Definition
The Basic Local Alignment Search Tool
(BLAST) is a tool for comparing gene and
protein sequences against others in public
databases.
BLAST is a set of sequence comparison
algorithms used to search databases for
optimal local alignments to a query.
6. Definition
It breaks the query and database
sequences into fragments and seeks
matches between them.
Before BLAST, nucleic acid and protein
alignments were time consuming because
full alignments were computed with
dynamic programming. BLAST is about
50 times faster than dynamic programming.
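To make the cost of the dynamic-programming approach concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not settings taken from the slides. Every cell of a query-by-subject matrix has to be filled, so the work grows with the product of the two sequence lengths, which is the cost BLAST avoids by only examining short matching fragments.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only one previous row in memory.
    Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Comparing one query against every sequence in a large database this way repeats the full matrix computation for each database entry.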
7. Background
Since the 1970s, scientists have
accumulated DNA and protein sequence
data at an exponential rate; at the time
of this presentation, researchers had
approximately 97 billion bases
sequenced and over 93 million records.
This sequence data doubles roughly
every 18 months.
8. Background
Today, one of the most commonly used
tools to examine DNA and protein
sequences is the Basic Local Alignment
Search Tool, also known as BLAST.
BLAST is a computer algorithm that is
available for use online at the National
Center for Biotechnology Information
(NCBI) website and many other sites.
9. Types of BLAST
Nucleotide-nucleotide BLAST (blastn)
- This program, given a DNA query,
returns the most similar DNA sequences from
the DNA database that the user specifies.
Protein-protein BLAST (blastp)
- This program, given a protein query,
returns the most similar protein sequences from
the protein database that the user specifies.
Position-Specific Iterative BLAST
(PSI-BLAST) (blastpgp)
- This program is used to find distant
relatives of a protein by iteratively building
a position-specific scoring profile from the
hits of each search round.
10. Types of BLAST
Nucleotide 6-frame translation-protein
(blastx)
-This program compares the six-frame
conceptual translation products of a nucleotide
query sequence (both strands) against a protein
sequence database.
Nucleotide 6-frame translation-nucleotide
6-frame translation (tblastx)
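-This program translates both the query
and the database sequences in all six frames
and compares the translations. Its purpose is
to find very distant relationships between
nucleotide sequences.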
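The six-frame conceptual translation used by blastx and tblastx can be sketched as follows. This is a minimal illustration assuming the standard genetic code and plain A/C/G/T input; the function names and the frame labels ("+1" to "-3") are hypothetical choices made for this example, not identifiers from the BLAST programs.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; codons with unknown letters become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict of the three forward and three reverse frame translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six protein strings would then be compared against the protein database (blastx) or against the translated database (tblastx).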
11. Types of BLAST
Protein-nucleotide 6-frame translation
(tblastn)
-This program compares a protein query
against all six reading frames of a
nucleotide sequence database.
Large numbers of query sequences
(megablast)
-When comparing large numbers of input
sequences via the command-line BLAST,
"megablast" is much faster than running BLAST
multiple times.
12. Types of BLAST
Of these programs, blastn and blastp are
the most commonly used because they use
direct comparisons and do not require
translations.
However, since protein sequences are better
conserved evolutionarily than nucleotide
sequences, tblastn, tblastx, and blastx
produce more reliable and accurate results
when dealing with coding DNA.
13. BLAST Algorithm
The BLAST algorithm is fast, accurate and
web-accessible.
It is faster than other sequence
similarity search tools.
The algorithm is complex: it requires
multiple steps and many parameters.
14. BLAST Algorithm
An overview of the BLAST algorithm (a
protein to protein search) is as follows:
Remove low-complexity regions or
sequence repeats in the query sequence.
Make a k-letter word list of the query
sequence. Taking k = 3 as an example, list
the words of length 3 in the query protein
sequence sequentially (k is usually 11 for a
DNA sequence) until the last letter of the
query sequence is included.
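The word-list step can be illustrated with a short sketch. The function name and the example peptide are assumptions made for illustration; the point is only that a query of length n yields n - k + 1 overlapping words.

```python
def query_words(sequence, k=3):
    """Return every overlapping k-letter word with its 0-based start position."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    # Hypothetical six-residue query; with k = 3 it yields four words.
    for position, word in query_words("PQGEFG"):
        print(position, word)
```

A query shorter than k produces an empty list, since no complete word fits.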
15. BLAST Algorithm
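List the possible matching words: score each query word
against all possible k-letter words with a substitution
matrix and keep only those scoring above a threshold.
Organize the remaining high-scoring words into an
efficient search tree.
Repeat the previous two steps (listing matching words and
organizing them) for each k-letter word in the query
sequence.
Scan the database sequences for exact matches
with the remaining high-scoring words.
Extend the exact matches to high-scoring segment
pairs (HSPs).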
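The scan-and-extend steps can be sketched as below. This is a deliberately simplified illustration, not the NCBI implementation: it seeds only on exact copies of the query words (no substitution-matrix neighborhood), uses a dictionary instead of a search tree, and scores with assumed match/mismatch values. The `x_drop` parameter, which stops an extension once the score falls too far below its best value, and all function names are choices made for this example.

```python
from collections import defaultdict


def index_words(query, k):
    """Map each k-letter word in the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def extend_one_way(query, subject, qi, si, step, match, mismatch, x_drop):
    """Extend without gaps in one direction; return (score gain, residues added)."""
    score = best = best_len = 0
    length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score dropped too far below the best seen so far
        qi += step
        si += step
    return best, best_len


def find_hsps(query, subject, k=3, match=1, mismatch=-1, x_drop=2):
    """Return a set of (score, query_start, subject_start, length) ungapped HSPs."""
    index = index_words(query, k)
    hsps = set()
    for sj in range(len(subject) - k + 1):
        for qi in index.get(subject[sj:sj + k], ()):
            right, r_len = extend_one_way(
                query, subject, qi + k, sj + k, 1, match, mismatch, x_drop)
            left, l_len = extend_one_way(
                query, subject, qi - 1, sj - 1, -1, match, mismatch, x_drop)
            hsps.add((k * match + left + right,
                      qi - l_len, sj - l_len, k + l_len + r_len))
    return hsps


if __name__ == "__main__":
    for hsp in sorted(find_hsps("GATTACA", "CCGATTACATT"), reverse=True):
        print(hsp)
```

Seeds that lie on the same diagonal extend to the same segment pair, so storing the results in a set collapses them into a single HSP.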
16. BLAST Algorithm
List all of the HSPs in the database whose score
is high enough to be considered.
Evaluate the significance of the HSP score.
Combine two or more HSP regions into a longer
alignment.
Show the gapped Smith-Waterman local
alignments of the query and each of the matched
database sequences.
Report every match whose expect score is lower
than a threshold parameter E.
17. BLAST Input-Output
Input
Input sequences are given
in FASTA or GenBank format.
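As an illustration of the FASTA input format, the sketch below parses FASTA text into header and sequence pairs. In FASTA, each record starts with a ">" header line followed by one or more lines of sequence. The function name and the sample records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">seq1 example\nACGT\nTTGA\n>seq2\nMKV\n"
    for name, sequence in parse_fasta(sample):
        print(name, sequence)
```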
Output
BLAST output can be delivered in a variety of
formats, including HTML, plain text, and XML.
On NCBI's web page, the default output
format is HTML.
The output begins with an introduction that
states where the search occurred and which
database and query were compared.
18. BLAST Output
A list of the
sequences in the
database containing
segment pairs whose
scores were least
likely to occur by
chance
Alignments of the
high-scoring segment
pairs showing identical
and similar residues
A complete list of the
parameter settings
used for the search.
19. BLAST Output
E-value (expectation value)
The Expect value (E) is a parameter that
describes the number of hits one can
"expect" to see by chance when searching a
database of a particular size.
It decreases exponentially as the Score (S) of
the match increases.
Essentially, the E value describes the random
background noise.
In general terms, the smaller the E value, the
more likely the match is significant.
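The relationship between E and S described above is usually written as E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and K and lambda are statistical parameters that depend on the scoring system. The sketch below evaluates that formula; the default values of `lam` and `k` are placeholder assumptions for illustration, and real values depend on the scoring matrix and gap penalties in use.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are placeholder assumptions; real values depend on the
    scoring system used for the search.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    for score in (30, 40, 50, 60):
        print(score, expect_value(score, query_len=100, db_len=1_000_000))
```

The formula shows both behaviours named on the slide: E falls exponentially as the score rises, and it scales linearly with the size of the database searched.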
20. BLAST Output
The default E value for blastn, blastp, blastx,
and tblastn is 10.
At this setting, 10 hits with scores equal to
or better than the defined alignment score,
S, are expected to occur by chance. The E-
value can be increased or decreased to
alter the stringency of the search.
Increase the E value when searching with
a short query, since it is likely to be found
many times by chance in a given database.
21. BLAST Output
Bit Score
A bit score is another prominent statistical
indicator used in addition to the E value in
a BLAST output.
The bit score measures sequence
similarity independent of query sequence
length and database size and is
normalized based on the raw pairwise
alignment score.
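The normalization behind the bit score can be sketched as S' = (lambda * S - ln K) / ln 2, after which the expect value depends only on the bit score and the search space: E = m * n * 2^(-S'). The code below is a minimal illustration; the default `lam` and `k` values are placeholder assumptions, since the real parameters depend on the scoring system.

```python
import math


def bit_score(raw_score, lam=0.318, k=0.134):
    """Normalize a raw alignment score to bits (lam and k are assumed values)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """Expect value from a bit score and the search-space size."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    bits = bit_score(50)
    print(bits, expect_from_bits(bits, query_len=100, db_len=1_000_000))
```

Because the scoring-system parameters are folded into the bit score, bit scores from searches that used different scoring matrices can be compared directly.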
22. BLAST Search
• Go to http://www.ncbi.nlm.nih.gov/
• Select BLAST program
26. BLAST Function
BLAST can be used for several purposes.
These include identifying species,
locating domains, establishing phylogeny,
DNA mapping, and comparison.
Identifying species
-With BLAST, we can often identify a
species or find homologous species. This is
useful, for example, when working with a DNA
sequence from an unknown species.
27. BLAST Function
Locating domains
- When working with a protein
sequence you can input it into BLAST, to
locate known domains within the sequence of
interest.
Establishing phylogeny
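-Using the results received through
BLAST, we can create a phylogenetic tree
on the BLAST web page. Such trees are best
treated as a first-pass analysis, since
purpose-built phylogenetic methods are more
reliable.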
28. BLAST Function
DNA mapping
-When working with a known species
and looking to sequence a gene at an
unknown location, BLAST can compare the
chromosomal position of the sequence of
interest to relevant sequences in the
database.
Comparison
-When working with genes, BLAST
can locate common genes in two related
species, and can be used to map
annotations from one organism to another.
29. Objectives of BLAST
It is one of the most popular programs for
sequence analysis.
It enables a researcher to compare a
query sequence with a library or database
of sequences.
It identifies library sequences that resemble
the query sequence above a certain
threshold.
The objective is to find high-scoring
ungapped segments among related
sequences.
30. Objectives of BLAST
Alignments of the high-scoring segment
pairs showing identical and similar
residues.
A complete list of the parameter settings
used for the search.
That’s all from our presentation