BLAST
•Basic Local Alignment Search Tool
•Algorithm for comparing a given sequence against sequences in a
database
•A match between two sequences is an alignment.
•Many BLAST databases and web services available.
•Set of programs that search sequence databases for statistically
significant similarities
•Heuristic method for local alignment
•Based on the same assumption as FASTA: good alignments
contain short stretches of exact matches (words)
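The "short stretches of exact matches" idea can be illustrated by indexing every length-k word of the query and scanning a database sequence for identical words. This is a minimal sketch that assumes exact nucleotide words only. For proteins, real BLAST also accepts "neighborhood" words that score above a threshold, and its word sizes are configurable. The function name `find_word_hits` is hypothetical.

```python
def find_word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    # Index every length-k word in the query by its start positions.
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)

    # Scan the subject and look each of its words up in the query index.
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


print(find_word_hits("ACGTACGT", "TTACGTT", k=4))  # [(3, 1), (0, 2), (4, 2)]
```

Each hit marks a possible starting point (seed) for an alignment.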
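• BLAST is a widely used bioinformatics program that
was first introduced in 1990 by Stephen Altschul, Warren
Gish, Webb Miller, Eugene Myers, and David Lipman, and
has since become one of the most popular tools for
sequence similarity search. Its statistics build on work
by Samuel Karlin and Stephen Altschul.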
• BLAST is a powerful tool for analyzing biological
sequence data. Since its initial release in 1990, it has
been updated repeatedly (for example, gapped BLAST and
PSI-BLAST in 1997 and the BLAST+ suite in 2009) to
improve its speed and accuracy.
• BLAST is now considered a crucial and widely used
tool in the field of bioinformatics.
• It has played a vital role in numerous research studies
and has paved the way for the development of other
sequence comparison tools.
Three heuristic layers: seeding, extension, and
evaluation
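•Seeding – find short word matches (seeds) between the query and
database sequences, which mark where an alignment may start
•Extension – extend the alignment from each seed in both directions
•Evaluation – determine which alignments are
statistically significant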
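The extension and evaluation layers can be sketched as an ungapped "X-drop" extension. The alignment grows outward from a seed and stops once the running score falls more than `x_drop` below the best score seen so far. The result is then kept only if it passes a cutoff. This is a simplified sketch under stated assumptions. The match/mismatch scores, `x_drop`, and `MIN_SCORE` are illustrative values, and `extend_seed` is a hypothetical name. Since 1997, BLAST has also used a two-hit seeding rule and gapped extension, and it evaluates hits by E value rather than by a raw cutoff.

```python
def extend_seed(query, subject, q_pos, s_pos, k,
                match=1, mismatch=-3, x_drop=5):
    """Ungapped X-drop extension of an exact word hit of length k.

    Returns (score, q_start, q_end, s_start); q_end is exclusive.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    def extend(step_q, step_s, limit):
        # Walk away from the seed, tracking the best running score; stop once
        # the running score falls more than x_drop below that best.
        best, running, best_len = 0, 0, 0
        for i in range(limit):
            running += pair_score(query[step_q(i)], subject[step_s(i)])
            if running > best:
                best, best_len = running, i + 1
            elif best - running > x_drop:
                break
        return best, best_len

    right_limit = min(len(query) - (q_pos + k), len(subject) - (s_pos + k))
    right, r_len = extend(lambda i: q_pos + k + i, lambda i: s_pos + k + i, right_limit)
    left_limit = min(q_pos, s_pos)
    left, l_len = extend(lambda i: q_pos - 1 - i, lambda i: s_pos - 1 - i, left_limit)

    score = k * match + left + right  # the seed itself contributes k matches
    return score, q_pos - l_len, q_pos + k + r_len, s_pos - l_len


MIN_SCORE = 6  # hypothetical cutoff standing in for the statistical evaluation step
query, subject = "GGGACGTACGTCCC", "TTTACGTACGTAAA"
score, q_start, q_end, s_start = extend_seed(query, subject, q_pos=3, s_pos=3, k=4)
if score >= MIN_SCORE:
    print(score, query[q_start:q_end])  # 8 ACGTACGT
```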
Characteristics of BLAST
•Several key features of BLAST make it a widely used tool in
bioinformatics. Some of these are:
•BLAST is fast and efficient, making it possible to handle large
databases of sequences.
•It is a flexible and versatile tool as it can be used to search for
similarities in both nucleotide and protein sequences.
•It is sensitive enough to detect weak but statistically significant
similarities, although as a heuristic it can miss some alignments that
an exhaustive Smith-Waterman search would find.
•It aims to identify regions of local similarity between the query
sequence and the database sequence, rather than attempting to align the
entire sequences.
•It has a user-friendly interface that makes it easy to input query
sequences and interpret the results.
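A local alignment scores only the best-matching region and ignores the rest of each sequence. The exact method that BLAST approximates is Smith-Waterman dynamic programming. In this minimal sketch, the scores (match +2, mismatch -1, linear gap -2) are illustrative assumptions, and the function returns only the best local score. Each cell is floored at 0, which lets an alignment start anywhere. Without that floor the recurrence would compute a global score.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a local alignment begin at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared core "ACGTACG" contributes; the flanks are ignored.
print(smith_waterman_score("TTTTACGTACGTTTTT", "GGGACGTACGGG"))  # 14
```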
Applications of BLAST
•BLAST has a wide range of applications. Some of the
most common applications are:
•BLAST can be used to identify unknown sequences by
comparing them with known sequences in a database,
which helps in predicting the functions of proteins or
genes.
•BLAST can also be used to collect homologous sequences
for phylogenetic analysis, which is important for
understanding the evolutionary relationships between
different species.
•BLAST can also be used to identify functionally
conserved domains within proteins, which is important for
predicting the functions of proteins.
Applications include
• identifying orthologs and paralogs (Orthologs
are genes in different species that evolved from a common
ancestral gene by speciation. Paralogs are gene copies created
by a duplication event within the same genome. Orthologs often
retain the same function, whereas paralogs often develop
different functions because, after duplication, one copy is
released from selective pressure)
• discovering new genes or proteins
• discovering variants of genes or proteins
• investigating expressed sequence tags (ESTs)
• exploring protein structure and function
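A common BLAST-based heuristic for finding candidate orthologs is the reciprocal best hit (RBH). Two genes from different genomes form a pair if each is the other's top-scoring hit when genome A is searched against B and B against A. This is a sketch, not a definitive ortholog caller, because RBH can mistake paralogs for orthologs. The input is assumed to be (query, subject, bit score) tuples, and the gene IDs below are hypothetical.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (x, y) pairs where y is x's best hit and x is y's best hit.

    Each argument is a list of (query_id, subject_id, bitscore) tuples.
    """
    def best_hits(rows):
        best = {}
        for q, s, bits in rows:
            if q not in best or bits > best[q][1]:
                best[q] = (s, bits)
        return {q: s for q, (s, _) in best.items()}

    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((x, y) for x, y in ab.items() if ba.get(y) == x)


# Hypothetical search results between genome A (hA*) and genome B (mB*).
a_vs_b = [("hA1", "mB1", 250.0), ("hA1", "mB2", 120.0),
          ("hA2", "mB2", 90.0), ("hA3", "mB3", 60.0)]
b_vs_a = [("mB1", "hA1", 248.0), ("mB2", "hA1", 130.0),
          ("mB2", "hA2", 85.0), ("mB3", "hA3", 61.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('hA1', 'mB1'), ('hA3', 'mB3')]
```

In this example, hA2 is not paired with mB2 because mB2's best hit is hA1. That is the kind of case where a paralog can interfere.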
E values
E (expect) value: Expectation value. The number of alignments
with scores equivalent to or better than S that are expected to
occur by chance in a database search. The lower
the E value, the more significant the score.
– The E value decreases exponentially as the Score (S) that is assigned to
a match between two sequences increases.
– The E value depends on the size of the database and the scoring system in
use.
– When the Expect value threshold is increased from the default value of
10 (the BLAST+ command-line default), more hits can be reported, including
less significant ones.
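The effect of the Expect threshold can be seen by filtering BLAST's tabular output. In the standard 12-column tabular format (`-outfmt 6`), the E value is the 11th column. The rows below are hypothetical, and `hits_below` is a hypothetical helper that applies the threshold after the search. The search itself applies the threshold through its Expect setting.

```python
def hits_below(tabular_text: str, evalue_cutoff: float) -> list[str]:
    """Return subject IDs whose E value is <= evalue_cutoff (-outfmt 6 text)."""
    kept = []
    for line in tabular_text.strip().splitlines():
        fields = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= evalue_cutoff:
            kept.append(subject_id)
    return kept


# Hypothetical tabular results for one query.
rows = "\n".join("\t".join(r) for r in [
    ["q1", "seqA", "98.5", "200", "3", "0", "1", "200", "1", "200", "1e-50", "380"],
    ["q1", "seqB", "45.0", "120", "60", "2", "10", "130", "5", "125", "0.002", "45.1"],
    ["q1", "seqC", "30.0", "40", "28", "0", "50", "90", "8", "48", "3.5", "28.9"],
])
print(hits_below(rows, 0.05))  # ['seqA', 'seqB']
print(hits_below(rows, 10))    # ['seqA', 'seqB', 'seqC']
```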
Bit score: The bit score is calculated from the raw score by
normalizing with the statistical variables (λ and k) that define a given
scoring system. Therefore, bit scores from different
alignments, even those employing different scoring matrices,
can be compared.
$E = k\,m\,N\,e^{-\lambda S}$
m = query length, N = database length (total number of residues)
k = minor constant, λ = constant that adjusts for the scoring matrix
S = score of the high-scoring segment pair (HSP)
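The formula above ties together the raw score, the bit score, and the E value. With the bit score defined as S' = (λS − ln k) / ln 2, the same E value can be written as E = m N 2^(−S'). The sketch below checks that equivalence numerically. The values λ = 0.267 and k = 0.041 are assumptions, roughly those reported for gapped BLOSUM62 searches. Real BLAST also corrects m and N to "effective" lengths, so its numbers will differ somewhat.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda*S - ln k) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: float, n: float, lam: float, k: float) -> float:
    """E = k * m * N * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def e_value_from_bits(bits: float, m: float, n: float) -> float:
    """E = m * N * 2^(-S'), which needs no scoring-system constants."""
    return m * n * 2 ** (-bits)


LAM, K = 0.267, 0.041  # assumed, illustrative constants
m, n, s = 300, 1e9, 50  # query length, database length, raw HSP score
bits = bit_score(s, LAM, K)
print(bits, e_value(s, m, n, LAM, K), e_value_from_bits(bits, m, n))
```

The last two printed values agree. Doubling `n` doubles E, and raising `s` shrinks E exponentially, which matches the bullets above.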
Four components to a BLAST search
1. Choose the sequence (query)
2. Select the BLAST program
3. Choose the database to search
4. Choose optional parameters
Then click “BLAST”
Example of the FASTA format for a BLAST query
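A FASTA record is a header line that starts with ">" followed by the sequence, which is usually wrapped over several lines. This is a minimal sketch for writing and reading such records, assuming that the identifier is the first word after ">" and that lines are wrapped at 60 characters. The function names and the record below are hypothetical.

```python
def to_fasta(seq_id: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    lines = [f">{seq_id}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word after '>') to its joined sequence."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


query = to_fasta("query1", "ACGT" * 20)  # hypothetical 80-nt query
print(query)  # header line, then a 60-nt line and a 20-nt line
```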
Step 2: Choose the BLAST program
Program   Input     Database   Comparisons
blastn    DNA       DNA        1
blastp    protein   protein    1
blastx    DNA       protein    6    (query translated in 6 frames)
tblastn   protein   DNA        6    (database translated in 6 frames)
tblastx   DNA       DNA        36   (both translated: 6 x 6 frames)
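The program table reduces to a lookup on the query type and the database type. tblastx is the one extra case, used when DNA is compared with DNA at the protein level. This is a sketch. `choose_program` and its `translate_both` flag are hypothetical names, and the comparison counts come from the table above.

```python
PROGRAMS = {
    ("DNA", "DNA"): ("blastn", 1),
    ("protein", "protein"): ("blastp", 1),
    ("DNA", "protein"): ("blastx", 6),       # query translated in 6 frames
    ("protein", "DNA"): ("tblastn", 6),      # database translated in 6 frames
}


def choose_program(query_type: str, db_type: str, translate_both: bool = False):
    """Return (program, number of comparisons) for a query/database pairing."""
    if translate_both:
        if (query_type, db_type) != ("DNA", "DNA"):
            raise ValueError("tblastx compares DNA queries against DNA databases")
        return ("tblastx", 36)               # 6 query frames x 6 database frames
    return PROGRAMS[(query_type, db_type)]


print(choose_program("DNA", "protein"))                    # ('blastx', 6)
print(choose_program("DNA", "DNA", translate_both=True))   # ('tblastx', 36)
```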
Step 3: Choose the database
nr = non-redundant (most general database)
dbest = database of expressed sequence tags
dbsts = database of sequence-tagged sites
gss = genomic survey sequences
protein databases
nucleotide databases
Step 4a: Select optional search parameters
Entrez query
algorithm
organism
Step 4a: optional blastp search parameters
Filter, mask
Scoring matrix
Word size
Expect
Step 4a: optional blastn search parameters
Filter, mask
Match/mismatch scores
Word size
Expect
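The web-form parameters above have counterparts in the BLAST+ command-line programs: `-evalue` (Expect), `-word_size`, `-matrix` (blastp scoring matrix), `-reward`/`-penalty` (blastn match/mismatch), `-seg`/`-dust` (low-complexity filters), and `-entrez_query` (works only together with `-remote`). The sketch below assembles such a command without running it. `blast_command` and the file names are hypothetical. The choice of `-task blastn` is an assumption made so that the reward/penalty pair applies to classic blastn rather than megablast. Check each flag against the installed BLAST+ version.

```python
def blast_command(program, query_file, db, evalue=10.0, word_size=None,
                  matrix=None, reward=None, penalty=None,
                  low_complexity_filter=True, entrez_query=None, remote=False):
    """Build a BLAST+ argument list (tabular output) from search parameters."""
    args = [program, "-query", query_file, "-db", db,
            "-evalue", str(evalue), "-outfmt", "6"]
    if word_size is not None:
        args += ["-word_size", str(word_size)]
    if program == "blastp":
        if matrix:
            args += ["-matrix", matrix]                      # scoring matrix
        args += ["-seg", "yes" if low_complexity_filter else "no"]
    elif program == "blastn":
        args += ["-task", "blastn"]                          # assumed task choice
        if reward is not None and penalty is not None:
            args += ["-reward", str(reward), "-penalty", str(penalty)]
        args += ["-dust", "yes" if low_complexity_filter else "no"]
    if entrez_query:
        args += ["-entrez_query", entrez_query]              # requires -remote
    if remote:
        args.append("-remote")
    return args


cmd = blast_command("blastp", "query.fasta", "nr", evalue=0.001, matrix="BLOSUM62",
                    entrez_query="Homo sapiens[Organism]", remote=True)
print(" ".join(cmd))
```

If BLAST+ is installed, the list could be passed to `subprocess.run(cmd, check=True)`.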