Basic Local Alignment Search Tool (BLAST) bioinformatics

BLAST
•Basic Local Alignment Search Tool
•Algorithm for comparing a given sequence against sequences in a
database
•A match between two sequences is an alignment.
•Many BLAST databases and web services are available.
•Set of programs that search sequence databases for statistically
significant similarities
•Heuristic method for local alignment
•Based on the same assumption as FASTA that good alignments
contain short lengths of exact matches

• BLAST is a widely used bioinformatics program that
was first described in 1990 by Stephen Altschul, Warren
Gish, Webb Miller, Eugene Myers, and David Lipman, and
has since become one of the most popular tools for
sequence similarity search.
• BLAST is a powerful tool for analyzing biological
sequence data. Since its initial release in 1990, it has
undergone continuous updates to improve its speed and
accuracy, and it is now considered a crucial tool in
the field of bioinformatics.
• It has played a vital role in numerous research studies
and has paved the way for the development of other
sequence comparison tools.

Three heuristic layers: seeding, extension, and
evaluation
•Seeding – identify where to start alignment
•Extension – extending alignment from seeds
•Evaluation – Determine which alignments are
statistically significant
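
The three layers can be illustrated with a toy sketch. This is not the actual BLAST implementation: it assumes exact-match words of length w (real protein BLAST also accepts similar "neighborhood" words), ungapped extension with a simple drop-off rule, and a fixed score cutoff in place of real E-value statistics. All function names and parameter values are hypothetical.

```python
from collections import defaultdict


def find_seeds(query, subject, w=3):
    """Seeding: index every length-w word of the subject, then look up query words."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, drop=2):
    """Extension: grow the seed in both directions without gaps.

    Each direction stops when the running score falls more than `drop`
    below the best score seen so far (a simplified drop-off rule).
    """
    def walk(qi, sj, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    score = w * match + left_score + right_score
    # Return the score and the query span [start, end) of the extended hit.
    return score, i - left_len, i + w + right_len


def best_hsp(query, subject, w=3, min_score=5):
    """Evaluation (simplified): keep the best extension above a fixed cutoff."""
    hits = [extend_seed(query, subject, i, j, w)
            for i, j in find_seeds(query, subject, w)]
    hits = [h for h in hits if h[0] >= min_score]
    return max(hits, default=None)


print(best_hsp("ACGTACGT", "TTACGTACGTTT"))
```

In a real search the fixed cutoff would be replaced by the statistical evaluation described under E values.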

Characteristics of BLAST
•Several key features of BLAST make it a widely used tool in
bioinformatics. Some of these are:
•BLAST is fast and efficient, making it possible to handle large
databases of sequences.
•It is a flexible and versatile tool as it can be used to search for
similarities in both nucleotide and protein sequences.
•It is highly sensitive, which allows it to detect even short or weak
similarities between sequences.
•It aims to identify regions of local similarity between the query
sequence and the database sequence, rather than attempting to align the
entire sequences.
•It has a user-friendly interface that makes it easy to input query
sequences and interpret the results.

Applications of BLAST
•BLAST has a wide range of applications. Some of the
most common applications are:
•BLAST can be used to identify unknown sequences by
comparing them with known sequences in a database
which helps in predicting the functions of proteins or
genes.
•BLAST can also be used in phylogenetic analysis which
is important for understanding the evolutionary
relationships between different species.
•BLAST can also be used to identify functionally
conserved domains within proteins which is important for
predicting the functions of proteins.

Applications include
• identifying orthologs and paralogs (Orthologs
are genes in different species that evolved from a common
ancestral gene. Paralogs are gene copies created by a
duplication event within the same genome. Orthologous
genes usually retain the same function, whereas paralogous
genes often develop different functions because selective
pressure on one copy of the duplicated gene is relaxed.)
• discovering new genes or proteins
• discovering variants of genes or proteins
• investigating expressed sequence tags (ESTs)
• exploring protein structure and function

E values
E (expect) value: the number of alignments with scores
equal to or better than S that are expected to occur by
chance in a database search. The lower the E value, the
more significant the score.
– The E value decreases exponentially as the Score (S) that is assigned to
a match between two sequences increases.
– The E value depends on the size of database and the scoring system in
use.
– When the Expect value threshold is increased from the default value of
10, more hits can be reported.
Bit score: The bit score is calculated from the raw score by
normalizing with the statistical variables that define a given
scoring system. Therefore, bit scores from different
alignments, even those employing different scoring matrices
can be compared.
$E = k\,m\,N\,e^{-\lambda S}$
m = query size, N = database size
k = minor constant, λ = constant to adjust for the scoring matrix
S = score of the high-scoring segment pair (HSP)
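
As a rough numerical sketch of the formula above, the following computes an E value from a raw score and checks it against the equivalent bit-score form. The bit-score conversion S' = (λS − ln k) / ln 2 and the relation E = m·N·2^(−S') are the standard Karlin-Altschul expressions and are not spelled out in the slides. The values of λ, k, m, and N below are illustrative assumptions, not parameters taken from any particular search.

```python
import math


def e_value(raw_score, m, n, lam, k):
    """E = k * m * N * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalize a raw score: S' = (lambda * S - ln k) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """Same E value expressed through the bit score: E = m * N * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative (assumed) parameters: a 250-residue query against a
# database of 1e9 residues, with example values for lambda and k.
lam, k = 0.267, 0.041
m, n = 250, 1_000_000_000

for s in (50, 100, 150):
    bits = bit_score(s, lam, k)
    print(f"S={s:4d}  bit score={bits:7.1f}  E={e_value(s, m, n, lam, k):.3g}")
```

Because λ and k are absorbed into the bit score, two alignments scored with different matrices can be compared through their bit scores, while the E value still depends on m and N.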

Four components to a BLAST search
1.Choose the sequence (query)
2.Select the BLAST program
3.Choose the database to search
4.Choose optional parameters
Then click “BLAST”

Example of the FASTA format for a BLAST query
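
A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The sketch below parses such text into (header, sequence) pairs. The record shown is a made-up placeholder, not a real database entry, and `parse_fasta` is a hypothetical helper written for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example record (hypothetical identifier and sequence).
example = """>query1 hypothetical protein (made-up example)
MKTAYIAKQR
QISFVKSHFS
"""

for name, seq in parse_fasta(example):
    print(name, len(seq), seq)
```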

Step 2: Choose the BLAST program

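Program   Input     Database   Comparisons
blastn    DNA       DNA        1
blastp    protein   protein    1
blastx    DNA       protein    6
tblastn   protein   DNA        6
tblastx   DNA       DNA        36
(Comparisons = number of reading-frame combinations searched;
a translated DNA sequence is read in all six frames.)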
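
The counts 1, 1, 6, 6, and 36 listed with the programs follow from how many sides of the comparison are translated: a DNA sequence translated to protein is read in six frames (three per strand). A small sketch of that arithmetic, with a hypothetical lookup table describing which side each program translates:

```python
# For each program: (query is translated, database is translated).
# The table is a hypothetical helper built from the program list above.
TRANSLATED = {
    "blastn": (False, False),   # DNA vs DNA, no translation
    "blastp": (False, False),   # protein vs protein
    "blastx": (True, False),    # translated DNA query vs protein database
    "tblastn": (False, True),   # protein query vs translated DNA database
    "tblastx": (True, True),    # translated DNA vs translated DNA
}

FRAMES = 6  # three reading frames on each of the two strands


def frame_comparisons(program):
    """Number of reading-frame combinations a program has to search."""
    query_translated, db_translated = TRANSLATED[program]
    query_frames = FRAMES if query_translated else 1
    db_frames = FRAMES if db_translated else 1
    return query_frames * db_frames


for name in TRANSLATED:
    print(f"{name:8s} {frame_comparisons(name)}")
```

This is why tblastx is the most expensive of the five programs: every one of the six query frames is compared against every one of the six database frames.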

Step 3: choose the database
nr = non-redundant (most general database)
dbest = database of expressed sequence tags
dbsts = database of sequence tag sites
gss = genomic survey sequences
protein databases
nucleotide databases

Step 4a: Select optional search parameters
Entrez query
Algorithm
Organism

Step 4a: optional blastp search parameters
Filter, mask
Scoring matrix
Word size
Expect

Step 4a: optional blastn search parameters
Filter, mask
Match/mismatch scores
Word size
Expect
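
The same optional parameters exist in the command-line NCBI BLAST+ programs. The sketch below only assembles an argument list and does not run anything. It assumes a local BLAST+ installation would be used to execute the result, and the file name, database name, and the `build_blast_command` helper are hypothetical. The flags shown (`-query`, `-db`, `-evalue`, `-word_size`, `-matrix`, `-reward`, `-penalty`, `-seg`, `-dust`) are BLAST+ options, with `-matrix` and `-seg` applying to protein searches and `-reward`, `-penalty`, and `-dust` to blastn.

```python
def build_blast_command(program, query_path, db, evalue=10, word_size=None,
                        matrix=None, reward=None, penalty=None,
                        low_complexity_filter=None):
    """Assemble a BLAST+ argument list from the optional search parameters."""
    cmd = [program, "-query", query_path, "-db", db, "-evalue", str(evalue)]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if program == "blastp" and matrix is not None:
        cmd += ["-matrix", matrix]            # scoring matrix (protein)
    if program == "blastn":
        if reward is not None:
            cmd += ["-reward", str(reward)]   # match score (nucleotide)
        if penalty is not None:
            cmd += ["-penalty", str(penalty)]  # mismatch score (nucleotide)
    if low_complexity_filter is not None:
        # Low-complexity filter: SEG for protein queries, DUST for blastn.
        flag = "-dust" if program == "blastn" else "-seg"
        cmd += [flag, "yes" if low_complexity_filter else "no"]
    return cmd


# Hypothetical file and database names, for illustration only.
print(build_blast_command("blastp", "query.fasta", "nr",
                          word_size=3, matrix="BLOSUM62"))
print(build_blast_command("blastn", "query.fasta", "nt",
                          word_size=11, reward=1, penalty=-2,
                          low_complexity_filter=True))
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ and the named database are available.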
