2. BLAST
• Basic Local Alignment Search Tool (BLAST) - a program
that detects sequence similarity between a query
sequence and sequences within a database.
• It uses a robust statistical framework that can determine if
the alignment between two sequences is statistically
significant.
Uses
– Helps identify known genes in a novel
sequence.
– Helps determine how a particular gene or protein is
related to other genes or proteins.
3. • Navigate to the BLAST main page at
http://blast.ncbi.nlm.nih.gov/Blast.cgi.
4. • All NCBI BLAST pages have the same header with
four tabs:
Tab                Explanation
Home               Link to the BLAST home page
Recent Results     Link to results of the BLAST searches you have performed in your current browser session
Saved Strategies   BLAST input forms with the parameters you have saved to your MyNCBI account
Help               List of all BLAST help documentation
5. Two other sections:
• “Basic BLAST” section contains links to common
BLAST programs.
The different BLAST programs available on the NCBI web server
6. • “Specialized BLAST” section of the NCBI BLAST main
page includes ‘blast 2 sequences’, which can be used to
align two or more sequences.
7. • Three main criteria
– BLAST program we wish to use
– query sequence we want to interpret
– database we want to search
Other optional parameters such as expect threshold and
other scoring parameters can be used to modify the
behaviour of BLAST.
BLAST search
It has four components:
query,
database,
program,
search purpose/goal.
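The choice of program depends on the type of the query and of the database. As a minimal sketch, the table below lists the five classic BLAST programs with the query and database types each one expects; the dictionary layout and the helper name `programs_for` are illustrative assumptions, not part of any NCBI tool.

```python
# Query and database types expected by the five classic BLAST programs.
# "compared_as" says whether the comparison is done at the nucleotide
# level or at the protein level (after translation where needed).
PROGRAMS = {
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "compared_as": "nucleotide"},
    "blastp":  {"query": "protein",    "database": "protein",    "compared_as": "protein"},
    "blastx":  {"query": "nucleotide", "database": "protein",    "compared_as": "protein"},
    "tblastn": {"query": "protein",    "database": "nucleotide", "compared_as": "protein"},
    "tblastx": {"query": "nucleotide", "database": "nucleotide", "compared_as": "protein"},
}


def programs_for(query_type, database_type):
    """Return the program names that accept this query/database combination."""
    return sorted(
        name
        for name, spec in PROGRAMS.items()
        if spec["query"] == query_type and spec["database"] == database_type
    )


if __name__ == "__main__":
    for q in ("nucleotide", "protein"):
        for d in ("nucleotide", "protein"):
            print(q, "query vs", d, "database:", programs_for(q, d))
```

A nucleotide query against a nucleotide database gives two candidates; the search goal (direct nucleotide comparison or comparison of translated sequences) decides between them.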
8. 1. Prepare the query sequence (the sequence to be searched with BLAST)
as raw sequence or in FASTA format.
2. Go to the NCBI home page at http://www.ncbi.nlm.nih.gov/
and choose BLAST.
9. 3. Choose the nucleotide BLAST program.
4. Copy and paste the sequence into the sequence box, or browse and
upload the sequence in FASTA format from your computer (Fig. 5).
5. Select the database you want to search against, preferably ‘Others’
(nr - non-redundant).
6. If you wish to search against a specific organism, under
‘Organism’, enter the first few characters of the organism’s name and
select the organism from the dropdown list.
11. 7. Choose the appropriate program under ‘Program Selection’.
MEGABLAST - to identify identical or highly similar nucleotide sequences.
Discontiguous MEGABLAST – to identify nucleotide sequences that are
similar, but not identical.
Blastn - to identify related nucleotide sequences from other
organisms (distantly related).
13. 8. Adjust any parameter, if required. Any parameter that has been
changed is highlighted in yellow.
9. Click the ‘BLAST’ button at the bottom of the page.
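The same steps can be expressed as a scripted request instead of a web form. The sketch below only prepares a FASTA record and the form parameters; it does not contact NCBI. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `MEGABLAST`, `ENTREZ_QUERY`, `EXPECT`) follow the NCBI BLAST URL API as far as I know it, and the database name `nt`, the helper names, and the example sequence are assumptions made for illustration; check the current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def to_fasta(name, sequence, width=60):
    """Format a raw sequence as a FASTA record (steps 1 and 4)."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def build_submission(fasta, database="nt", organism=None,
                     megablast=True, expect=None):
    """Collect the form fields for a nucleotide BLAST search (steps 3 to 8).

    Nothing is sent; the caller decides whether to POST the result.
    """
    params = {
        "CMD": "Put",            # submit a new search
        "PROGRAM": "blastn",     # nucleotide BLAST (step 3)
        "DATABASE": database,    # database to search (step 5), assumed name
        "QUERY": fasta,          # query sequence (step 4)
    }
    if megablast:
        params["MEGABLAST"] = "on"                        # program selection (step 7)
    if organism:
        params["ENTREZ_QUERY"] = organism + "[Organism]"  # organism limit (step 6)
    if expect is not None:
        params["EXPECT"] = str(expect)                    # optional parameter (step 8)
    return params


if __name__ == "__main__":
    record = to_fasta("query1", "atgcatgcat gcatgcatgc")  # made-up sequence
    params = build_submission(record, organism="Homo sapiens", expect=1e-4)
    print(BLAST_URL + "?" + urlencode(params))
```

Keeping the parameter building separate from any network call makes it easy to inspect what would be submitted before sending it.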
14. Interpreting the BLAST result
The BLAST result consists of three major sections:
(1) The Header - containing information about the
query sequence, the database searched, BLAST version
and its release date. It also provides a graphical overview
of the query coverage;
(2) The Descriptions – showing the one-line
descriptions of each hit (sequence found to match in the
database) of the query sequence; provides a quick
overview for browsing;
(3) The Alignments – displaying the pairwise
alignments of the query sequence against each database
sequence matched.
16. The header
1. The header text is followed by the graphical output. The query sequence is represented
by the numbered red bar at the top of the Figure.
2. The hits from the database are shown below the red bar; the most similar are
shown closest to the query. Sometimes a thin black line joins two bars. This
indicates that the two high-scoring segment pairs (HSPs) come from the same database sequence.
Perpendicular line (arrow) - the two HSPs are close together.
Horizontal line (arrow) - the two HSPs are farther apart.
3. The Figure also provides a colour key for the alignment scores of the hits. The
length of the colour bar indicates the query coverage. Moving the mouse cursor
over a bar displays the definition line for that sequence above the graphic.
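The colour key can be read as a simple lookup from alignment score to colour. The sketch below assumes the classic NCBI bins (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more); these thresholds and the function name `colour_for_score` are assumptions, so compare them with the key printed on the actual result page.

```python
def colour_for_score(score):
    """Map an alignment score to a bar colour (assumed classic NCBI bins)."""
    if score < 40:
        return "black"
    if score < 50:
        return "blue"
    if score < 80:
        return "green"
    if score < 200:
        return "magenta"
    return "red"


if __name__ == "__main__":
    for s in (35, 45, 75, 150, 420):
        print(s, colour_for_score(s))
```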
17. The Descriptions
4. Each one-line description is composed of (a) a brief textual
description, (b) the max score, (c) the total score, (d) the query
coverage, (e) the E value, (f) the maximum identity and (g) the accession
number. Clicking a hyperlink opens the corresponding GenBank
record.
5. The one-line descriptions give a quick overview of the search
results.
18. Description: Information about the sequence record for a particular hit.
Hit: Matched sequence in the database.
Score: A numerical value that describes the overall quality of the
alignment.
Max score: Highest alignment score between query and database sequence
segment.
Total score: Sum of alignment scores of all segments from the same
database sequence that match the query sequence (calculated over all
segments). This score is different from the max score if several parts of the
database sequence match different parts of the query sequence.
Query coverage: Percent of query length that is included in the aligned
segments. This coverage is calculated over all segments.
E value: Number of alignments expected by chance with a particular score
or better. It is related to the P value (P = 1 - e^-E). An E value below
10^-4 is usually taken as evidence of homology.
Max identity: BLAST calculates the percentage identity between the query
and the hit in a nucleotide-to-nucleotide alignment. If there are multiple
alignments with a single hit, then only the highest percent identity is shown.
Accession: A unique identifier of a specific GenBank sequence
record.
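The quantities defined above can be computed with a few lines of arithmetic. The sketch below is illustrative only: it uses the standard relation E = m * n * 2^(-S') between a bit score S' and the E value, but with the raw query and database lengths rather than the effective lengths BLAST uses internally, so the numbers will not match a real report exactly. All function names and example values are assumptions.

```python
import math


def evalue_from_bit_score(bit_score, query_len, db_len):
    """E = m * n * 2^(-S'), using raw (not effective) lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e):
    """P = 1 - exp(-E); for small E the two are nearly equal."""
    return -math.expm1(-e)


def total_and_max_score(hsp_scores):
    """Total score sums all segments of one hit; max score is the best one."""
    return sum(hsp_scores), max(hsp_scores)


def query_coverage(segments, query_len):
    """Percent of the query covered by aligned segments.

    segments: (start, end) pairs on the query, 1-based and inclusive.
    Overlapping segments are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(segments):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_len


def percent_identity(identical, alignment_len):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identical / alignment_len


if __name__ == "__main__":
    e = evalue_from_bit_score(40.0, 100, 1_000_000)
    print("E value:", e, "P value:", pvalue_from_evalue(e))
    print("total, max:", total_and_max_score([200, 150]))
    print("coverage %:", query_coverage([(1, 50), (41, 100)], 200))
    print("identity %:", percent_identity(95, 100))
```

With two segments scoring 200 and 150 from the same database sequence, the total score (350) differs from the max score (200), which is the situation described in the Total score entry.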
19. The alignments
6. The pairwise sequence alignment is preceded by the sequence identifier, the
full definition line, and the length of the matched sequence.
‘Range 1:’ marks the first matched segment, and the values that follow
give the range of the subject sequence that aligns with our query sequence.
20. 7. Next comes the bit score (the raw score is in parentheses), then the
E value, the percentage of identities, the gaps and finally the strand.
Plus/Plus indicates that the query and the subject are both in the forward
strand. Plus/Minus indicates that the query matches the reverse complement
of the subject, that is, the query lies on the opposite strand relative to
the database record. This happens because sequences in the database are
usually stored only as the plus strand.
8. The ‘Query’ refers to our query sequence and the ‘Sbjct’ refers to the
subject sequence (the match) from the database.
9. By default, a maximum of 100 sequence matches are displayed; this limit
can be changed under ‘Algorithm parameters’ on the BLAST page.
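The strand label can be illustrated with a toy example. The sketch below uses exact substring matching rather than a real alignment, purely to show what Plus/Plus and Plus/Minus mean; the function names and sequences are made up for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_of_match(query, subject):
    """Report the strand label for an exact match of query in subject.

    Returns "Plus/Plus" if the query occurs as given, "Plus/Minus" if only
    its reverse complement occurs, and None if neither is found.
    """
    if query in subject:
        return "Plus/Plus"
    if reverse_complement(query) in subject:
        return "Plus/Minus"
    return None


if __name__ == "__main__":
    subject = "TTGCATTT"  # stored as the plus strand, as in the database
    print(strand_of_match("GCAT", subject))  # query found as given
    print(strand_of_match("ATGC", subject))  # only the reverse complement is found
```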