Asian University For Women
BINF 2000: Introduction to
Bioinformatics (Lab#03)
Basic BLAST (BLASTn)
Fall 2022
This content is prepared by A.K.M Moniruzzaman Mollah (Ph.D., Head, Science & Math Program, Associate Professor
of Life Sciences), Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences), Nabila
Ishaque (Adjunct Faculty, Life Sciences), and Fatema-Tuz-Zohora (Adjunct Faculty, Bioinformatics & Environmental
Sciences), Asian University For Women, Chittagong, Bangladesh.
References are cited in between the contents.
Lab#3: Basic BLAST (blastn)
One of the most important bioinformatic strategies used for the functional annotation of
genes and genomes is to predict the function of uncharacterized genes or proteins
based on their similarity to sequences with better functional annotations. BLAST is
perhaps the single most important tool for finding database sequences that are
similar to a query sequence of interest.
The Basic Local Alignment Search Tool (BLAST; Altschul et al., 1997) is a very
powerful approach to identifying database sequences that share local similarity to a
query sequence (see below for definitions). There is a very important chain of
assumptions used in biological research that is generally followed when using BLAST:
● Homologous genes share sequence similarity
○ Orthologous genes have the highest similarity among multiple species
■ Orthologous genes most likely have similar functions
● Consequently, sequences that are most similar between
multiple species share similar functions.
Note, it is very important to understand that these are only assumptions, and there are
many reasons and instances where these assumptions prove to be false. Nevertheless,
they are a reasonable starting place.
3.1. Definitions:
● Similar sequences – sequences that share a significant number of residues
(nucleotides or amino acids). Sequences can be similar due to homology or
simply by chance. The higher the similarity between sequences, the more likely
they are to be homologous.
● Homologous sequences – sequences that are related through common
ancestry. Homology is qualitative – two sequences either are, or are not related
through common ancestry. Homologous sequences can vary greatly in their level
of similarity – from 100% to 0%.
● Orthologous sequences – sequences that are related through a past speciation
event. Orthologous sequences are assumed to share common functions.
● Paralogous sequences – sequences that are related through a past gene
duplication event. Genes often diverge in function after duplicating; therefore,
paralogous sequences are not assumed to share a common function.
● Query sequence – your sequence; the sequence you are interested in finding
more about.
● High Scoring Segment Pair (HSP) – ‘hits’ to the database. A subsequence
match between your query sequence and a database sequence returned by
BLAST.
● Local alignment – a sequence alignment that extends only across part of the
sequence.
● Global alignment – a sequence alignment that extends across the entire
sequence (from end to end).
● HSP: abbreviation for High Scoring Segment Pair (defined above).
● FASTA format: a plain-text format used to represent either nucleotide or peptide
sequences. The first line is a header (description) line, beginning with “>” and
describing the sequence. All the following lines are the sequence itself.
○ Example DNA sequence in FASTA format:
■ >gi|23423|ref|NM_23542.0| Homo sapiens protein
ATGAATCGATACGATAGCTAGCTATCGATGCAGATCAGAGAGGG
GCTTTAGCTAGCTAAGCTAG
○ Example protein sequence in FASTA format:
■ >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTE
AELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFR
VFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQ
VNYEEFVQMMTAK*
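The FASTA layout described above (one “>” header line followed by one or more sequence lines) can be read with a few lines of Python. This is only a minimal sketch: the function name parse_fasta is made up for this handout, and it assumes well-formed input with no special handling of unusual characters.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one continuous string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The example DNA record from the definition above.
example = """>gi|23423|ref|NM_23542.0| Homo sapiens protein
ATGAATCGATACGATAGCTAGCTATCGATGCAGATCAGAGAGGG
GCTTTAGCTAGCTAAGCTAG
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header)
    print(len(sequence), "residues")
```

The sequence lines are concatenated, so line breaks inside a record do not affect the sequence that is returned.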
3.2. How does BLAST work?
BLAST identifies candidate homologous sequences using a heuristic method which initially
finds short matches between two sequences; thus, the method does not take the entire
sequence space into account. After the initial match, BLAST attempts to build local
alignments outward from these initial matches. The four major steps of BLAST are:
1. Seeding:
a. Break sequence(s) into words, search for similar words in
database
b. Calculate E-value to select HSPs
c. Pass selected number of HSPs to extension step
2. Extension: Extend each seed match in both directions to generate an alignment
3. Evaluation: Determine which alignments are best (perform
additional evaluations: %identity, %positives, %gaps)
4. Output: Write the results out to disk
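The seeding and extension steps above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it assumes exact word matches only, and it extends a seed only while the residues keep matching exactly, whereas real BLAST scores matches and mismatches and can extend through some mismatches. The function names and the toy sequences are made up for this example.

```python
def build_word_index(subject, word_size):
    """Map every word of length word_size in subject to its start positions."""
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)
    return index


def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an identical word starts."""
    index = build_word_index(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


def extend_seed(query, subject, q, s, word_size):
    """Extend a seed left and right while residues match exactly (no gaps).

    Returns (q_start, q_end, s_start, s_end) as Python slice boundaries.
    """
    q_start, s_start = q, s
    while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
        q_start -= 1
        s_start -= 1
    q_end, s_end = q + word_size, s + word_size
    while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
        q_end += 1
        s_end += 1
    return q_start, q_end, s_start, s_end


# Toy sequences (hypothetical, chosen only to keep the example small).
query = "GATTACA"
subject = "TTGATTACAGG"

seeds = find_seeds(query, subject, word_size=4)
first_q, first_s = seeds[0]
q_start, q_end, s_start, s_end = extend_seed(query, subject, first_q, first_s, 4)
print(seeds)
print(query[q_start:q_end], subject[s_start:s_end])
```

Indexing the words of the subject first is what makes the search fast: each query word is looked up directly instead of being compared against every position in the database sequence.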
3.3. Types of BLAST
Method            Option    Query Type    DB Type       Comparison
Natural BLAST     BLASTn    Nucleotide    Nucleotide    Nucleotide-Nucleotide
Natural BLAST     BLASTp    Protein       Protein       Protein-Protein
Translated BLAST  BLASTx    Nucleotide    Protein       Protein-Protein
Translated BLAST  tBLASTn   Protein       Nucleotide    Protein-Protein
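The table above can be written as a small lookup in Python. This is only a sketch for illustration: the names BLAST_PROGRAMS, guess_sequence_type, and choose_blast_program are made up for this handout, and the sequence-type guess is a crude heuristic (a sequence made up only of the letters A, C, G, T, U, and N is assumed to be nucleotide).

```python
# (query type, database type) -> BLAST program, following the table above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Crude heuristic: only A/C/G/T/U/N letters means nucleotide, otherwise protein."""
    letters = set(sequence.upper())
    return "nucleotide" if letters <= NUCLEOTIDE_LETTERS else "protein"


def choose_blast_program(query_type, db_type):
    """Look up the BLAST program for a query type and a database type."""
    key = (query_type.lower(), db_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError("unknown combination: %s vs %s" % key)
    return BLAST_PROGRAMS[key]


query_type = guess_sequence_type("ATGAATCGATACGATAGC")
print(query_type, "query against a protein database:",
      choose_blast_program(query_type, "protein"))
```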
3.4. Understanding BLAST results:
The amount of information on the BLAST website is a bit overwhelming — even for the
scientists who use it on a frequent basis! You are not expected to know every detail of
the BLAST program. BLAST results have the following fields:
E value: The E value (Expect value) is the number of matches with a score at least this
good that you would expect to find by chance in a database of that size. The lower the
E value is, the more significant the match.
• E-value < 10e-100: Identical sequences. You will get long alignments across the
entire query and hit sequence.
• 10e-100 < E-value < 10e-50: Almost identical sequences. A long stretch of the
query is matched to the database.
• 10e-50 < E-value < 10e-10: Closely related sequences.
• 10e-6 < E-value < 1: Could be a true homologue, but it is a gray area.
• E-value > 1: Sequences are most likely not related.
• E-value > 10: Hits are most likely junk unless the query sequence is very short.
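To see why the E value depends on the size of the database, it helps to look at the standard relationship between the bit score S' of an alignment and its E value: E = m × n × 2^(-S'), where m is the query length and n is the total database length. The sketch below applies this formula directly. It is a simplification: BLAST applies length corrections that are ignored here, and the function name and the example numbers are made up for illustration.

```python
def expected_hits(bit_score, query_length, database_length):
    """E value from a bit score: E = m * n * 2^(-S'), ignoring length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: a 500-residue query and two database sizes.
small_db = expected_hits(bit_score=60, query_length=500, database_length=1_000_000)
large_db = expected_hits(bit_score=60, query_length=500, database_length=2_000_000)
print(small_db, large_db)

# A higher bit score gives a smaller (more significant) E value.
print(expected_hits(bit_score=80, query_length=500, database_length=1_000_000))
```

The same alignment score is less significant in a larger database, because there are more opportunities for a match of that quality to occur by chance.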
Percent Identity: The percent identity is a number that describes how similar the query
sequence is to the target sequence (the percentage of aligned positions at which the two
sequences have identical characters). The higher the percent identity is, the closer the match.
Query Cover: The query cover is a number that describes how much of the query
sequence is covered by the alignment to the target sequence. If the alignment to the
target sequence in the database spans the whole query sequence, then the query cover is
100%. This tells us what fraction of the query length is accounted for by the match.
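Percent identity and query cover can both be computed by hand from an alignment, and the short Python sketch below shows one way to do it. It assumes the two aligned rows are given as equal-length strings with “-” for gaps, and that each HSP is described by its 1-based start and end position on the query. The function names and the example values are made up for illustration; the BLAST website may round or report these numbers slightly differently.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both rows hold the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned rows must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_query, aligned_subject) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(query_length, hsp_ranges):
    """Percentage of query positions covered by at least one HSP.

    hsp_ranges holds 1-based, inclusive (start, end) positions on the query.
    Overlapping HSPs are counted only once.
    """
    covered = set()
    for start, end in hsp_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_length


# One mismatch in eight aligned columns.
print(percent_identity("ACGTACGT", "ACGTTCGT"))

# Two overlapping HSPs on a 100-residue query.
print(query_cover(100, [(1, 50), (41, 80)]))
```

Note that the two numbers answer different questions: a short HSP can have 100% identity while covering only a small part of the query.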
3.5. Examples of BLAST usage
BLAST can be used for a lot of different purposes. A few of them are mentioned below.
• Looking for species. If you are sequencing DNA from an unknown species, BLAST may
help identify the correct species or a closely related species.
• Looking for domains. If you BLAST a protein sequence (or a translated nucleotide
sequence) BLAST will look for known domains in the query sequence.
• Looking at phylogeny. You can use the BLAST web pages to generate a
phylogenetic tree of the BLAST result.
• Mapping DNA to a known chromosome. If you are sequencing a gene from a known
species but have no idea of the chromosome location, BLAST can help you. BLAST will
show you the position of the query sequence in relation to the hit sequences.
• Annotations. BLAST can also be used to map annotations from one organism to
another or look for common genes in two related species.
Exercise 3.1: Identify unknown sequences
The Sequences:
>Unknown sequence 1
AAATGAGTTAATAGAATCTTTACAAATAAGAATATACACTTCTGCTTAGGATGATAATTG
GAGGCAAGTGAATCCTGAGCGTGATTTGATAATGACCTAATAATGATGGGTTTTATTT
CCAGACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAAT
TAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCAC
CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCA
TCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGCAT
ATGAACCCTTCACACTACCCAAATTATATATTTGGCTCCATATTCAATCGGTTAGTCTA
CATATATTTATGTTTCCTCTATGGGTAAGCTACTGTGAATGGATCAATTAATAAAACAC
ATGACCTATGCTTTAAGAAGCTTGCAAACACATGAA
>Unknown sequence 2
AAATGAGTTAATAGAATCTTTACAAATAAGAATATACACTTCTGCTTAGGATGATAATTG
GAGGCAAGTGAATCCTGAGCGTGATTTGATAATGACCTAATAATGATGGGTTTTATTT
CCAGACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAAT
TAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCAC
CATTAAAGAAAATATCATTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCATCA
AAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGCATATG
AACCCTTCACACTACCCAAATTATATATTTGGCTCCATATTCAATCGGTTAGTCTACAT
ATATTTATGTTTCCTCTATGGGTAAGCTACTGTGAATGGATCAATTAATAAAACACATG
ACCTATGCTTTAAGAAGCTTGCAAACACATGAA
Procedure:
1. Navigate to the main BLAST page (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
2. Select the appropriate type of BLAST for your sequence
3. Paste the “unknown sequence 1” into the box
4. Click on the “BLAST” button and wait for the results. BLAST is usually fairly quick
for short sequences, but may still take a few seconds.
5. Once the results are displayed, notice there are three main headings: Graphic
Summary, Descriptions, and Alignments (these may be expanded so you’ll
have to scroll down).
6. Use these results to answer the questions below.
7. Repeat steps 1-6 for “unknown sequence 2”.
Questions:
Q1. In the Descriptions section, look at the top result, which should be the result with
the highest score. Write down information about the best match:
● Description (no need to write the whole thing)
● E value
● Identity
● Query cover
Q2. Now scroll down to the Alignments heading. Look at the top result, which should be
the same one. Look at the alignment between your query and the reference. Do you see
any mismatches?
Q3. How can you judge whether this is a good match?
Q4. What is this gene? Google the name of the gene and write down something
significant you learned about it.
Q5. How are “unknown sequence 1” and “unknown sequence 2” related?
Exercise 3.2: Finding Homologs
1. Visit the GenBank page of your known sequence NM_001336190.1
(https://www.ncbi.nlm.nih.gov/nuccore/NM_001336190.1)
2. Click on “Run BLAST” from the right panel (in the “Analyze This Sequence” part of
the webpage.)
3. On the BLAST page, note that under the Enter Query Sequence section, the NCBI
system has automatically entered the accession number (but you can also enter a GI
number, or FASTA sequence)
4. We want to query the full NCBI database; the NCBI linking system has automatically
changed the default Database (which is Human) to Other and Nucleotide collection
(nr/nt) because our sequence is non-human. The nr database is the non-redundant
collection of sequences in GenBank.
5. In the “Program Selection” section, change the program selected under “Optimize for” to
“Somewhat similar sequences (blastn).”
6. Open the Algorithm Parameters near the bottom.
7. Check the box next to “Show results in a new window” near the BLAST button.
Now (finally) click the BLAST button.
Questions:
Q6. When would you want to use megaBLAST? What about discontiguous
megaBLAST? (try each to see how your results differ)
Q7. What is the Expect threshold?
Q8. What would happen if you decreased it? Increased it?
Q9. What would be the effect of increasing the Word size?
Q10. Why is there a Low complexity regions filter? Should we keep it on?
8. Analyze the Results Page:
The results page is broken up into sections. At the very top is the job summary, which
simply shows details about your query and the database searched. You can find more
details about your search by clicking Search Summary.
Questions:
Q11. How many sequences are in the nr database?
Q12. What sequences are not included in the nr database? (Trick question: this
information is actually available by clicking on the question mark beside the
Database option on the input page!)
9. Explore the Graphic Summary tab. Scroll your mouse over the coloured bars.
Questions:
Q13. What do the coloured bars mean?
Q14. How does the color code work?
Q15. What information is displayed when you hover on an entry?
Q16. What do you notice about the significance values as you move down the graphical
summary?
Q17. What is the genus and species of the top (best) hit?
Q18. What happens if you click on one of the entries?
10. The Descriptions tab lists the following:
● Description [hyperlinked to corresponding Alignment(s) in Alignments section]
● Max Score – the alignment bit score
● Total Score – another alignment bit score which may differ from the Max Score if
your query matched a single database entry in multiple regions.
● Query Coverage – what percent of the query had similarity to the database hit.
● E-value – probably the best measure of hit quality. Smaller numbers mean better
hits, with 0.0 being the best value possible.
● Identity – the highest identity found between query and HSP.
● Accession – linked to the indicated sequence at NCBI
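The difference between Max Score and Total Score can be illustrated with a few lines of Python. The sketch assumes each HSP is given as a (database entry, bit score) pair and takes the Total Score to be the sum of the bit scores of all HSPs found for the same entry, which matches the description in the list above. The function name summarize_hits and the example entries and scores are made up for illustration.

```python
from collections import defaultdict


def summarize_hits(hsps):
    """Group HSP bit scores by database entry.

    hsps: iterable of (subject_id, bit_score) pairs.
    Returns {subject_id: (max_score, total_score)}.
    """
    scores = defaultdict(list)
    for subject_id, bit_score in hsps:
        scores[subject_id].append(bit_score)
    return {sid: (max(values), sum(values)) for sid, values in scores.items()}


# Hypothetical hits: the query matches entry_A in two separate regions.
hsps = [
    ("entry_A", 200.0),
    ("entry_A", 150.0),
    ("entry_B", 180.0),
]

for subject_id, (max_score, total_score) in summarize_hits(hsps).items():
    print(subject_id, "Max Score:", max_score, "Total Score:", total_score)
```

For entry_B, which has a single HSP, the two scores are the same; they differ only when one database entry is matched in more than one region.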
Questions:
Q19. How many sequence matches are listed for this query sequence? How are they
ordered? (you can sort these segments in other ways, like by identity, score, and
query start position.)
Q20. What happens if you click the Accession hotlink?
11. Finally we can explore the actual HSP Alignments in the Alignments tab.
Compare the information presented for the first HSP alignment to the first entry in the
graphical summary and HSP summary. As you scroll down the alignments, you will see
the alignment quality drop – that is, the E-value increases.
Questions:
Q22. What do the vertical bars ( | ) represent between the Query and the Sbjct
(database sequence)?
Q23. What does Strand=Plus/Plus, Strand=Plus/Minus mean? Hint: are genes
always in the same direction on a piece of chromosomal DNA?
12. Go back to the top of the page and click Formatting options. Change the
Alignment View to Query-anchored with dots for identities. Click Reformat and scroll
down to the HSP alignment section.
Questions:
Q24. Describe the difference between this format and the previous format. Can you
imagine cases where the different formats might be most useful?
Q25. Play with these format options to get a feel for what they mean.
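The settings explored on the web form in Exercise 3.2 (program, Expect threshold, Word size, Low complexity filter) have counterparts in the command-line blastn program from the NCBI BLAST+ package. The Python sketch below only builds the command as a list of arguments and prints it; it does not run BLAST. It assumes BLAST+ is installed if you want to run the printed command yourself. The file name query.fasta, the function name build_blastn_command, and the default values chosen here are assumptions made for illustration.

```python
import shlex

# Task names accepted by this sketch (a subset of what blastn supports).
ALLOWED_TASKS = {"blastn", "megablast", "dc-megablast"}


def build_blastn_command(query_fasta, task="blastn", evalue=10.0,
                         word_size=11, dust=True, database="nt"):
    """Build a blastn command line that searches an NCBI database remotely."""
    if task not in ALLOWED_TASKS:
        raise ValueError("unsupported task: %s" % task)
    return [
        "blastn",
        "-task", task,                     # blastn, megablast, or dc-megablast
        "-query", query_fasta,             # FASTA file holding the query
        "-db", database,                   # database name, here the nucleotide collection
        "-remote",                         # run the search on NCBI servers
        "-evalue", str(evalue),            # Expect threshold
        "-word_size", str(word_size),      # Word size
        "-dust", "yes" if dust else "no",  # low complexity filter on or off
        "-outfmt", "6",                    # tab-separated output table
    ]


command = build_blastn_command("query.fasta", task="megablast",
                               evalue=0.001, word_size=28)
print(shlex.join(command))
```

Building the command as a list keeps each option next to its value, which makes it easy to change one setting at a time and compare the results, as the questions in Exercise 3.2 ask you to do.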

More Related Content

Similar to Basic BLAST (BLASTn)

Blast and fasta
Blast and fastaBlast and fasta
Blast and fastaALLIENU
 
Database similarity searching blast and fasta
Database similarity searching blast and fastaDatabase similarity searching blast and fasta
Database similarity searching blast and fastaSwathi764350
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
 
Sequence homology search and multiple sequence alignment(1)
Sequence homology search and multiple sequence alignment(1)Sequence homology search and multiple sequence alignment(1)
Sequence homology search and multiple sequence alignment(1)AnkitTiwari354
 
Blast gp assignment
Blast  gp assignmentBlast  gp assignment
Blast gp assignmentbarathvaj
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRONPrabin Shakya
 
Sequencedatabases
SequencedatabasesSequencedatabases
SequencedatabasesAbhik Seal
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)Sobia
 
blast presentation beevragh muneer.pptx
blast presentation  beevragh muneer.pptxblast presentation  beevragh muneer.pptx
blast presentation beevragh muneer.pptxhome
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformaticsatmapandey
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxRanjan Jyoti Sarma
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisPrasanthperceptron
 

Similar to Basic BLAST (BLASTn) (20)

Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Blast
BlastBlast
Blast
 
Database similarity searching blast and fasta
Database similarity searching blast and fastaDatabase similarity searching blast and fasta
Database similarity searching blast and fasta
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
BLAST
BLASTBLAST
BLAST
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
BLAST
BLASTBLAST
BLAST
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Sequence homology search and multiple sequence alignment(1)
Sequence homology search and multiple sequence alignment(1)Sequence homology search and multiple sequence alignment(1)
Sequence homology search and multiple sequence alignment(1)
 
Blast gp assignment
Blast  gp assignmentBlast  gp assignment
Blast gp assignment
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
Sequencedatabases
SequencedatabasesSequencedatabases
Sequencedatabases
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)
 
blast presentation beevragh muneer.pptx
blast presentation  beevragh muneer.pptxblast presentation  beevragh muneer.pptx
blast presentation beevragh muneer.pptx
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
 
Sequence database
Sequence databaseSequence database
Sequence database
 
Softwares For Phylogentic Analysis
Softwares For Phylogentic AnalysisSoftwares For Phylogentic Analysis
Softwares For Phylogentic Analysis
 

More from Syed Lokman

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2Syed Lokman
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1Syed Lokman
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShareSyed Lokman
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to EcologySyed Lokman
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis WorksheetSyed Lokman
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1Syed Lokman
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey InteractionSyed Lokman
 
Plant Competition
Plant CompetitionPlant Competition
Plant CompetitionSyed Lokman
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: WorksheetSyed Lokman
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Syed Lokman
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein AnalysisSyed Lokman
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree ConstructionSyed Lokman
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Syed Lokman
 
Global Alignment
Global AlignmentGlobal Alignment
Global AlignmentSyed Lokman
 
Review Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsReview Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsSyed Lokman
 

More from Syed Lokman (20)

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1
 
Soil Analysis
Soil AnalysisSoil Analysis
Soil Analysis
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShare
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to Ecology
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis Worksheet
 
Lab Safety
Lab SafetyLab Safety
Lab Safety
 
Sampling Method
Sampling MethodSampling Method
Sampling Method
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey Interaction
 
Plant Competition
Plant CompetitionPlant Competition
Plant Competition
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: Worksheet
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1
 
Local Alignment
Local AlignmentLocal Alignment
Local Alignment
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein Analysis
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree Construction
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2
 
Global Alignment
Global AlignmentGlobal Alignment
Global Alignment
 
Review Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsReview Class on Introduction to Bioinformatics
Review Class on Introduction to Bioinformatics
 

Recently uploaded

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasiemaillard
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsCol Mukteshwar Prasad
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxricssacare
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxJisc
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...Denish Jangid
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptxJosvitaDsouza2
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePedroFerreira53928
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...Nguyen Thanh Tu Collection
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfjoachimlavalley1
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resourcesdimpy50
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxbennyroshan06
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxRaedMohamed3
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfTamralipta Mahavidyalaya
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleCeline George
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxakshayaramakrishnan21
 
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxSolid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxDenish Jangid
 
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfDanh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfQucHHunhnh
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismDeeptiGupta154
 

Recently uploaded (20)

Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resources
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptxSolid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
Solid waste management & Types of Basic civil Engineering notes by DJ Sir.pptx
 
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdfDanh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
Danh sách HSG Bộ môn cấp trường - Cấp THPT.pdf
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 

Basic BLAST (BLASTn)

  • 1. Asian University For Women BINF 2000: Introduction to Bioinformatics (Lab#03) Basic BLAST (BLASTn) Fall 2022 This content is prepared by A.K.M Moniruzzaman Mollah (Ph.D., Head, Science & Math Program, Associate Professor of Life Sciences), Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences), Nabila Ishaque (Adjunct Faculty, Life Sciences), and Fatema-Tuz-Zohora (Adjunct Faculty, Bioinformatics & Environmental Sciences), Asian University For Women, Chittagong, Bangladesh. References are cited in between the contents.
  • 2. Lab#3: Basic BLAST (blastn)
One of the most important bioinformatic strategies for the functional annotation of genes and genomes is to predict the function of uncharacterized genes or proteins based on their similarity to sequences with better functional annotations. BLAST is perhaps the single most important tool for finding database sequences that are similar to a query sequence of interest.
The Basic Local Alignment Search Tool (BLAST; Altschul et al., 1997) is a very powerful approach to identifying database sequences that share local similarity with a query sequence (see below for definitions). A chain of assumptions is generally followed in biological research when using BLAST:
● Homologous genes share sequence similarity
○ Orthologous genes have the highest similarity among multiple species
■ Orthologous genes most likely have similar functions
● Consequently, sequences that are most similar between multiple species share similar functions.
Note that these are only assumptions, and there are many reasons and instances where they prove to be false. Nevertheless, they are a reasonable starting place.
3.1. Definitions:
● Similar sequences – sequences that share a significant number of residues (nucleotides or amino acids). Sequences can be similar due to homology or simply by chance. The higher the similarity between sequences, the more likely they are to be homologous.
● Homologous sequences – sequences that are related through common ancestry. Homology is qualitative – two sequences either are, or are not, related through common ancestry. Homologous sequences can vary greatly in their level of similarity – from 100% to 0%.
  • 3. ● Orthologous sequences – sequences that are related through a past speciation event. Orthologous sequences are assumed to share common functions.
● Paralogous sequences – sequences that are related through a past gene duplication event. Genes often diverge in function after duplicating; therefore, paralogous sequences are not assumed to share a common function.
● Query sequence – your sequence; the sequence you are interested in finding out more about.
● High Scoring Segment Pair (HSP) – a ‘hit’ to the database: a subsequence match between your query sequence and a database sequence returned by BLAST.
● Local alignment – a sequence alignment that extends only across part of the sequence.
● Global alignment – a sequence alignment that extends across the entire sequence (from end to end).
● FASTA format – a plain-text format used to represent either nucleotide or peptide sequences. The first line is a header line, beginning with “>” and describing the sequence. All the following lines are the sequence itself.
○ Example DNA sequence in FASTA format:
■ >gi|23423|ref|NM_23542.0| Homo sapiens protein
ATGAATCGATACGATAGCTAGCTATCGATGCAGATCAGAGAGGG
GCTTTAGCTAGCTAAGCTAG
  • 4. ○ Example protein sequence in FASTA format:
■ >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTE
AELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFR
VFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQ
VNYEEFVQMMTAK*
3.2. How does BLAST work?
BLAST identifies similar (and potentially homologous) sequences using a heuristic method that first finds short matches between two sequences; thus, the method does not take the entire sequence space into account. BLAST then attempts to build local alignments starting from these initial matches. The four major steps of BLAST are:
1. Seeding:
a. Break the sequence(s) into words and search for similar words in the database
b. Calculate E-values to select HSPs
c. Pass the selected number of HSPs to the extension step
2. Extension: Extend the initial matches and generate the alignments
3. Evaluation: Work out which alignments are best (additional evaluations: % identity, % positives, % gaps)
4. Output: Write the results out to disk
3.3. Types of BLAST
Method            Option    Query Type   DB Type      Comparison
Natural BLAST     BLASTn    Nucleotide   Nucleotide   Nucleotide-Nucleotide
                  BLASTp    Protein      Protein      Protein-Protein
Translated BLAST  BLASTx    Nucleotide   Protein      Protein-Protein
                  tBLASTn   Protein      Nucleotide   Protein-Protein
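The seeding and extension steps in 3.2 can be illustrated with a small sketch. This is a deliberately simplified illustration, not BLAST's actual implementation: it uses exact word matches only, extends to the right only, and stops at the first mismatch, whereas real BLAST scores the extension and extends in both directions. The function names and the default word size of 11 are assumptions made for this example.

```python
from collections import defaultdict


def split_into_words(sequence, word_size):
    """Return (start, word) for every overlapping word of length word_size."""
    return [
        (start, sequence[start:start + word_size])
        for start in range(len(sequence) - word_size + 1)
    ]


def find_seeds(query, subject, word_size=11):
    """Return (query_start, subject_start) pairs where a query word occurs exactly."""
    # Index the subject once: word -> list of start positions.
    subject_index = defaultdict(list)
    for start, word in split_into_words(subject, word_size):
        subject_index[word].append(start)

    seeds = []
    for query_start, word in split_into_words(query, word_size):
        for subject_start in subject_index.get(word, []):
            seeds.append((query_start, subject_start))
    return seeds


def extend_seed(query, subject, query_start, subject_start, word_size):
    """Extend a seed to the right, without gaps, while residues keep matching.

    Returns the total length of the matching segment (seed included).
    """
    length = word_size
    while (query_start + length < len(query)
           and subject_start + length < len(subject)
           and query[query_start + length] == subject[subject_start + length]):
        length += 1
    return length


if __name__ == "__main__":
    toy_query = "ACGTAC"
    toy_subject = "TTACGTAC"
    for q_start, s_start in find_seeds(toy_query, toy_subject, word_size=4):
        print(q_start, s_start, extend_seed(toy_query, toy_subject, q_start, s_start, 4))
```

A larger word size produces fewer seeds, which makes the search faster but less sensitive to diverged sequences.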
  • 5. 3.4. Understanding BLAST results:
The amount of information on the BLAST website is a bit overwhelming, even for the scientists who use it on a frequent basis! You are not expected to know every detail of the BLAST program. BLAST results have the following fields:
E value: The E value (expect value) is the number of hits with a score at least this good that you would expect to see by chance when searching a database of that size. The lower the E value, the more significant the match.
● E-value < 10e-100: Identical sequences. You will get long alignments across the entire query and hit sequence.
● 10e-100 < E-value < 10e-50: Almost identical sequences. A long stretch of the query protein is matched to the database.
● 10e-50 < E-value < 10e-10: Closely related sequences.
● 10e-6 < E-value < 1: Could be a true homologue, but it is a gray area.
● E-value > 1: Proteins are most likely not related.
● E-value > 10: Hits are most likely junk unless the query sequence is very short.
Percent Identity: The percent identity describes how similar the query sequence is to the target sequence (what percentage of aligned characters are identical). The higher the percent identity, the more significant the match.
Query Cover: The query cover describes how much of the query sequence is covered by the target sequence. If the target sequence in the database spans the whole query sequence, then the query cover is 100%. This tells us how long the aligned region is relative to the query.
3.5. Examples of BLAST usage
BLAST can be used for many different purposes. A few of them are mentioned below.
● Looking for species. If you are sequencing DNA from an unknown species, BLAST may help identify the correct species or a homologous species.
● Looking for domains. If you BLAST a protein sequence (or a translated nucleotide sequence), BLAST will look for known domains in the query sequence.
  • 6. ● Looking at phylogeny. You can use the BLAST web pages to generate a phylogenetic tree of the BLAST result.
● Mapping DNA to a known chromosome. If you are sequencing a gene from a known species but have no idea of the chromosome location, BLAST can help you. BLAST will show you the position of the query sequence in relation to the hit sequences.
● Annotations. BLAST can also be used to map annotations from one organism to another or look for common genes in two related species.
Exercise 3.1: Identify unknown sequences
The Sequences:
>Unknown sequence 1
AAATGAGTTAATAGAATCTTTACAAATAAGAATATACACTTCTGCTTAGGATGATAATTG
GAGGCAAGTGAATCCTGAGCGTGATTTGATAATGACCTAATAATGATGGGTTTTATTT
CCAGACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAAT
TAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCAC
CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCA
TCAAAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGCAT
ATGAACCCTTCACACTACCCAAATTATATATTTGGCTCCATATTCAATCGGTTAGTCTA
CATATATTTATGTTTCCTCTATGGGTAAGCTACTGTGAATGGATCAATTAATAAAACAC
ATGACCTATGCTTTAAGAAGCTTGCAAACACATGAA
>Unknown sequence 2
AAATGAGTTAATAGAATCTTTACAAATAAGAATATACACTTCTGCTTAGGATGATAATTG
GAGGCAAGTGAATCCTGAGCGTGATTTGATAATGACCTAATAATGATGGGTTTTATTT
CCAGACTTCACTTCTAATGGTGATTATGGGAGAACTGGAGCCTTCAGAGGGTAAAAT
TAAGCACAGTGGAAGAATTTCATTCTGTTCTCAGTTTTCCTGGATTATGCCTGGCAC
CATTAAAGAAAATATCATTGGTGTTTCCTATGATGAATATAGATACAGAAGCGTCATCA
AAGCATGCCAACTAGAAGAGGTAAGAAACTATGTGAAAACTTTTTGATTATGCATATG
AACCCTTCACACTACCCAAATTATATATTTGGCTCCATATTCAATCGGTTAGTCTACAT
ATATTTATGTTTCCTCTATGGGTAAGCTACTGTGAATGGATCAATTAATAAAACACATG
ACCTATGCTTTAAGAAGCTTGCAAACACATGAA
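Before pasting sequences like these into BLAST, it can be useful to check that they are well-formed FASTA. The following is a minimal sketch of a FASTA reader, assuming the text is already in memory as a string; the function names `parse_fasta` and `invalid_characters` are hypothetical helpers written for this example and are not part of any BLAST tool. The toy records in the demonstration are made up.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Header lines start with '>'; all following lines up to the next
    header are joined into one upper-case sequence string.
    """
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def invalid_characters(sequence, alphabet="ACGTN"):
    """Return the sorted list of characters that are not in the given alphabet."""
    return sorted(set(sequence) - set(alphabet))


if __name__ == "__main__":
    toy_text = ">toy record 1\nACGT\nacgt\n\n>toy record 2\nTTXA\n"
    for name, sequence in parse_fasta(toy_text).items():
        print(name, len(sequence), invalid_characters(sequence))
```

The same function can be applied to the two unknown sequences above to confirm their lengths before submitting them.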
  • 7. Procedure:
1. Navigate to the main BLAST page (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
2. Select the appropriate type of BLAST for your sequence
3. Paste “unknown sequence 1” into the box
4. Click on the “BLAST” button and wait for the results. BLAST is usually fairly quick for short sequences, but may still take a few seconds.
5. Once the results are displayed, notice there are three main headings: Graphic Summary, Descriptions, and Alignments (these may be expanded, so you may have to scroll down).
6. Use these results to answer the questions below.
7. Repeat steps 1-6 for “unknown sequence 2”.
Questions:
Q1. In the Descriptions section, look at the top result, which should be the result with the highest score. Write down information about the best match:
● Description (no need to write the whole thing)
● E value
● Identity
● Query cover
Q2. Now scroll down to the Alignments heading. Look at the top result, which should be the same one. Look at the alignment between your query and the reference. Do you see any mismatches?
Q3. How can you judge whether this is a good match?
Q4. What is this gene? Google the name of the gene and write down something significant you learned about it.
Q5. How are “unknown sequence 1” and “unknown sequence 2” related?
  • 8. Exercise 3.2: Finding Homologs
1. Visit the GenBank page of your known sequence NM_001336190.1 (https://www.ncbi.nlm.nih.gov/nuccore/NM_001336190.1)
2. Click on “Run BLAST” in the right panel (in the “Analyze This Sequence” part of the webpage).
3. On the BLAST page, note that under the Enter Query Sequence section, the NCBI system has automatically entered the accession number (you can also enter a GI number or a FASTA sequence).
4. We want to query the full NCBI database; the NCBI linking system has automatically changed the default Database (which is Human) to Other and Nucleotide collection (nr/nt) because our sequence is non-human. The nr database is the non-redundant collection of sequences in GenBank.
5. Under Program Selection, change the “Optimize for” option to “Somewhat similar sequences (blastn)”.
6. Open the Algorithm Parameters near the bottom.
7. Check the box next to “Show results in a new window” near the BLAST button. Now (finally) click the BLAST button.
Questions:
Q6. When would you want to use megaBLAST? What about discontiguous megaBLAST? (Try each to see how your results differ.)
Q7. What is the Expect threshold?
Q8. What would happen if you decreased it? Increased it?
Q9. What would be the effect of increasing the Word size?
Q10. Why is there a Low complexity regions filter? Should we keep it on?
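The Expect threshold in the questions above is a cutoff on the E-value. As background, the textbook relation between a bit score S' and the E-value is E = m × n × 2^(−S'), where m is the query length and n is the total database length. The sketch below applies that relation directly. It is only an approximation of what the NCBI server reports, because BLAST uses corrected "effective" lengths; the function names and all numbers are illustrative assumptions.

```python
def expect_value(bit_score, query_length, database_length):
    """Textbook E-value: expected number of chance hits scoring >= bit_score.

    E = m * n * 2 ** (-S'), with m = query length, n = database length.
    NCBI BLAST applies length corrections that are ignored here.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_length, database_length, expect_threshold):
    """Return True if a hit would be kept under the given Expect threshold."""
    return expect_value(bit_score, query_length, database_length) <= expect_threshold


if __name__ == "__main__":
    # Made-up numbers: a 500-residue query against two database sizes.
    for database_length in (1_000_000, 2_000_000):
        print(database_length, expect_value(50, 500, database_length))
```

Under this relation the same alignment gets a larger E-value when the database grows, which is why E-values are only comparable between searches of the same database.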
  • 9. 8. Analyze the Results Page: The results page is broken up into sections. At the very top is the job summary, which shows details about your query and the database searched. You can find more details about your search by clicking Search Summary.
Questions:
Q11. How many sequences are in the nr database?
Q12. What sequences are not included in the nr database? (Trick question: this information is actually available by clicking on the question mark beside the Database option on the input page!)
9. Explore the Graphic Summary tab. Move your mouse over the coloured bars.
Questions:
Q13. What do the coloured bars mean?
Q14. How does the colour code work?
Q15. What information is displayed when you hover over an entry?
Q16. What do you notice about the significance values as you move down the graphical summary?
Q17. What is the genus and species of the top (best) hit?
Q18. What happens if you click on one of the entries?
10. The Descriptions tab lists the following:
● Description [hyperlinked to the corresponding Alignment(s) in the Alignments section]
● Max Score – the alignment bit score
● Total Score – another alignment bit score, which may differ from the Max Score if your query matched a single database entry in multiple regions.
● Query Coverage – what percent of the query had similarity to the database hit.
● E-value – probably the best measure of hit quality. Smaller numbers mean better hits, with 0.0 being the best value possible.
● Identity – the highest identity found between query and HSP.
● Accession – linked to the indicated sequence at NCBI
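The Identity and Query Coverage columns listed above can be reproduced by hand for a single HSP. The sketch below is a simplified illustration, assuming the two aligned strings are the same length and use "-" for gaps; the function name and the toy alignment are made up for this example. NCBI's Query Coverage combines all HSPs for a database entry, so the value computed here from one alignment may be lower than the one shown on the results page.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent_identity, query_coverage) for one gapped alignment.

    aligned_query and aligned_subject are equal-length strings with '-'
    marking gaps; query_length is the length of the full, unaligned query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")

    # Identical columns: same residue in both rows, and not a gap.
    identities = sum(
        1 for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    # Query residues that take part in the alignment (gaps excluded).
    aligned_query_residues = sum(1 for q in aligned_query if q != "-")

    percent_identity = 100 * identities / len(aligned_query)
    query_coverage = 100 * aligned_query_residues / query_length
    return percent_identity, query_coverage


if __name__ == "__main__":
    identity, coverage = alignment_stats("ACGT-ACGT", "ACGTTACCT", query_length=10)
    print(f"identity {identity:.2f}%, query coverage {coverage:.1f}%")
```

In the toy alignment, 7 of the 9 alignment columns are identical and 8 of the 10 query residues are aligned.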
  • 10. Questions:
Q19. How many sequence matches are listed for this query sequence? How are they ordered? (You can sort these segments in other ways, such as by identity, score, and query start position.)
Q20. What happens if you click the Accession hotlink?
11. Finally, we can explore the actual HSP alignments in the Alignments tab. Compare the information presented for the first HSP alignment to the first entry in the graphical summary and HSP summary. As you scroll down the alignments, you will see the alignment quality drop, that is, the E-value increases.
Questions:
Q22. What do the vertical bars ( | ) between the Query and the Sbjct (database sequence) represent?
Q23. What does Strand=Plus/Plus, Strand=Plus/Minus mean? Hint: are genes always in the same direction on a piece of chromosomal DNA?
12. Go back to the top of the page and click Formatting options. Change the Alignment View to Query-anchored with dots for identities. Click Reformat and scroll down to the HSP alignment section.
Questions:
Q24. Describe the difference between this format and the previous format. Can you imagine cases where the different formats might be most useful?
Q25. Play with these format options to get a feel for what they mean.
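As background for the Strand field in Q23, a nucleotide search considers both strands of the database sequence, so a query can match either the sequence as stored or its reverse complement. The sketch below illustrates the idea with exact substring matching only; it is not how BLAST searches, and the function names and toy sequences are assumptions made for this example.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def find_strand(query, subject):
    """Report which strand of subject contains an exact copy of query.

    Returns "Plus/Plus" if query occurs in subject as written,
    "Plus/Minus" if its reverse complement occurs, otherwise None.
    """
    if query in subject:
        return "Plus/Plus"
    if reverse_complement(query) in subject:
        return "Plus/Minus"
    return None


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(find_strand("ATGC", "TTATGCTT"))
    print(find_strand("ATGC", "TTGCATTT"))
```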