SlideShare a Scribd company logo
1 of 10
Asian University For Women
BINF 3016: Protein Modeling (Lab#02)
Advanced BLAST & Comparative
Genomics
Fall 2022
This content is prepared by Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences),
Asian University For Women, Chittagong, Bangladesh.
References are cited in between the contents.
ADVANCED BLAST
[Software needed: web access]
There are 3 sections to this lab: BlastP, PSI-Blast, and Comparative Genomics. Last time we
used BLAST to query a nucleotide sequence against the NCBI nr database. Now let’s search
using a protein sequence.
Lab 2.1: BLASTP
1. Go to NCBI (www.ncbi.nlm.nih.gov) and select BLAST from the Popular Resource section
on the right.
2. Choose protein blast from the Basic BLAST section.
3. Choose blastp from in the program selection tabs along the top.
4. Enter the accession number (NP_001318308).
5. Open the “Algorithm parameters” section. Set the “Expect threshold” as 1e-15.
6. In the Scoring Parameters section, the default substitution matrix is BLOSUM62. Change
the substitution matrix to BLOSUM80.
7. Look at the Gap Costs. Existence refers to creating a gap in the alignment, while Extension
refers to extending a gap in an alignment.
8. Check the ‘Low complexity regions’ filter.
9. Check ‘Show results in a new window’.
10. Click BLAST!
11. The format of the BLASTP output page is broken up into:
● Job summary
● Descriptions
● Graphical summary containing
○ -Conserved domains
○ -Graphic summary
● Alignments
12. Let’s try a different format for our results. Click on Alignment View options in the
Alignments part of the page, and pick Pairwise with dots for identities in the Alignment
View box to change the alignments representation.
13. Move to the graphic summary section and mouse over the summaries. The information
for each hit will be displayed in the tool tip. Scroll to the Descriptions section. Note the identities
and E-value scores of the different hits. The top ~100 hits will be summarized in this section.
Scroll to the last hit in the display.
a. What is its E-value?
b. Do you consider this a good hit?
14. Clicking on the last hit in the graphical summary will take you to the appropriate HSP
(High Scoring Pairs) alignment. Note that there’s a substantial amount of variation between your
query and this database sequence. The Pairwise with dots for identities format makes this
particularly clear as all the variable sites are noted in red letters (you might need to scroll to
the bottom of the alignments section if you’re not automatically taken to the correct entry).
15. At the bottom of the Job Summary section (top of the page) you will see a Taxonomy tab.
Click on this. Look through the list of hits, for each subsection.
a. How is this list for the first subsection (Lineage) organized?
b. What do the numbers between the name of the species and the number of hits mean?
16. Go back to the main results page, and again select Formatting options. In the Alignments
tab, select Query-anchored with dots for identities. Scroll down to the alignments section.
a. How are the alignments organized now?
Ref:
PAM VS BLOSUM:
https://chagall.med.cornell.edu/BioinfoCourse/PDFs/Lecture2/Dayhoff1978.pdf
https://www.youtube.com/watch?v=3h58QOy1QfA
https://www.youtube.com/watch?v=eZsGfmZ3gzU
https://www.youtube.com/watch?v=68lF71zEUF8
https://www.youtube.com/watch?v=ByS7F1bFU5w
https://www.youtube.com/watch?v=MVFVGKebN9U
https://www.youtube.com/watch?v=2Yc2fVNTiH0
https://www.youtube.com/watch?v=3jEeZc3r_hg
https://www.youtube.com/watch?v=8G7vIDA2npk
Convert 4e-15 to number:
https://calculator.name/scientific-notation-to-decimal/4e-15
Lab 2.2: PSI-BLAST
We’re now going to use a new protein search algorithm: Position-Specific Iterated (PSI)-BLAST.
PSI-BLAST is a highly sensitive BLAST program that is extremely useful for finding distantly
related proteins or new members of a protein family. Beyond tracking down protein family
members, you can use PSI-BLAST when your standard protein-protein BLAST search either
fails to find significant hits, or returns hits with descriptions such as "hypothetical protein"
or "similar to...".
In a very general sense, PSI-BLAST starts with a standard protein-protein BLAST, and then
uses these results to build up a more refined search that is tailored to your query over
successive iterations of the search. It does this by building a position-specific scoring
matrix (PSSM), which identifies the specific amino acid changes that are most likely to be
present between your query sequence and similar database sequences. Position-specific
scoring matrices are essentially substitution matrices tailored to your query of interest.
1. Go back to the blastp page from the BlastP part of the lab, load the same accession or
sequence (NP_001318308) and parameters as in the first part of the lab (Blosum 80
Matrix and 1e-15 as the Expect Threshold, low complexity filter). Under ‘Database,’
select ‘UniProtKB/Swiss-Prot’. The Swiss-Prot database is well-curated database that
includes only the most best annotated (characterized) protein sequences. The trade-off of
using this database is that it’s less comprehensive than the default nr option.
2. Under program selection, select ‘PSI-BLAST’. Near the bottom of the page under PSI/PHI
BLAST options, note the PSI-BLAST Threshold of 0.005. Let’s lower it to 1e-40.
a. How will this affect our search results?
3. BLAST!
4. Examine the PSI- BLAST output.
You will notice one very significant difference. There are now two sections in the Description
summary section – one that has sequences with E-values BETTER than [PSI-BLAST]
threshold, and another that has sequences with E-values WORSE than [PSI-BLAST]
threshold.
a. Where is this cutoff (what E-value)?
b. How was it determined?
c. How many sequences are better than the threshold? (Tip: select All and
look at the number selected).
d. Record the accession number, bit score and E-value for one of the top hits in the
group of sequences that did not reach the threshold (here it’s Q9C7G1.1).
5. This first round of PSI-BLAST was just a standard BLASTP. Now we will run another
iteration to refine our search. The successive iterations use all sequences better than the
cutoff to make a new position-specific-substitution matrix, which replaces the
BLOSUM80 matrix used in the original search. This matrix scores based on the non-random
patterns of residue conservation occurring at each site in each pairwise alignment BLAST
performed.
6. Click ‘Run PSI-BLAST iteration 2’.
Scroll down the sequence list. Note how beside some of the sequences there’s either a green
check mark , or they’re highlighted in yellow – these are new sequences that weren’t significant
in the previous iteration, but scored significantly with the refined PSSM.
a. How many new sequences better than the cutoff do you get now?
b. Search for the accession number you saved in step 4. Is it better than the threshold
now? Do the bit score and E-value change for this accession? If so, why?
7. Iterate the search at least another three to five times.
a. What do you notice about the number of new sequences in each iteration?
b. What qualities should high-scoring PSI-BLAST hits theoretically share that would be
of interest to an inquiring geneticist?
c. Can you guess how you would include sequences that were of interest to you, but
which originally did not score better than the threshold in future search?
d. Likewise, how would you remove sequences that scored above the threshold, but
were not of interest?
Ref:
http://homepages.cs.ncl.ac.uk/phillip.lord/teaching/modules-08/theory/blast_variants.html
https://bip.weizmann.ac.il/education/materials/gcg/psiblast.html
Lab 2.3: Comparative Genomics:
High-throughput sequencing initiatives, proteomic efforts, transcriptomics, and other high
throughput genomic technologies, in conjunction with molecular characterization and literature
curation, have resulted in large data collections. In addition to one for the human genome
itself, repositories for the model organisms of worms, fruit flies, mice, Arabidopsis thaliana and
others exist.
However, the representation of genomic data is challenging due to the sheer scope,
complexity, and volume of these data. Tools and the means for effectively displaying such
complex data are continually being developed and improved. We will look at one application
that attempts to deal with this problem, and that permits comparisons of genomic regions
across related species.
Importantly, orthologous genes studied in one species can be assumed to have similar functions
in another species, and residues within orthologs that are highly conserved can be assumed to
be critical for that protein’s function. Therein lies the power of comparative genomics.
Exploring Genomes with Genome Browsers
Each “model” organism (these organisms are so designated because of a long history of
being studied in a medical or agricultural context due to their ease of manipulation, space
requirements, good genetics, among other reasons) has its own genome database that
permits the exploration of genomic regions. Such regions often have other molecules –
homologous genes, ESTs etc. – associated with them. These genomic regions can be explored
with Genome Browsers.
We will only look at the Mouse genome browser in this part of the lab, but here are some
others portals that may be of use in your future studies:
FlyBase: FlyBase is the genomic repository for information for the model organism Drosophila
melanogaster and many other related drosophilid species. Connect to FlyBase at
http://flybase.org and click on the “GBrowse” or “JBrowse” icon to access the Genome Browser.
WormBase: WormBase is the repository for Caenorhabditis elegans and other worm model
species genomic data. Connect to wormbase at http://www.wormbase.org. Click on the
“GBrowse” or “JBrowse” link in the Tool section at the top of the page to access the Genome
Browser for C. elegans.
The Arabidopsis Information Resource: As we’ve seen, TAIR is the repository for Arabidopsis
genomic information. Connect to TAIR at http://www.arabidopsis.org and under the Tools tab,
click on “GBrowse” or “JBrowse”. The Arabidopsis Information Portal is a different initiative
aimed at integrating all data from different sources for Arabidopsis: http://araport.org.
NCBI: One can also examine the human genome in a comparative manner using the NCBI map
viewer application. Connect to the genomes section of the NCBI website at:
http://www.ncbi.nlm.nih.gov/Genomes/ and select the “Human Genome” link under the Custom
Resources, then on the icons of the chromosomes for the most recent Map Viewer release.
Under Maps and Options, it is possible to select sequences from other species to display on the
human Map Viewer.
Mouse Genome Informatics – Comparative Genomics Example
1. Connect to the Mouse Genome Informatics site at: http://www.informatics.jax.org/.
2. Enter Pax6 into the Quick Search box at the top left. Pax6 is a gene important for
proper eye development.
3. Click on the first link in the results list, labeled Pax6. You’ll be taken to the Gene
Detail page for this gene.
4. In the second row, called Location & Maps, click on “More” to expand the section and
then click on the Ensembl Genome Browser link. Ensembl is run by the European
Bioinformatics Institute, the European counterpart of the NCBI. This browser is in some
regards more powerful than the NCBI map viewer, but because it contains so much
information and so many options, it is a bit more confusing. We can use it to examine
similar human and mouse genomic regions, and to view orthologs.
5. In the Ensembl Genome Browser showing Pax6 in its genomic context, use the menu
on the left to select “Region Comparison”.
a. Use the “Select Species or Regions” tab on the bottom left to add the
corresponding human region.
b. Use the zoom slider to zoom out to view ~800,000 base pairs at once.
c. Use the “Configure icon” at the top left of this section to configure the
comparison image.
d. Choose “Comparative features”, and activate the “Join genes” option. Save
this configuration with a new name and click the checkbox at the top right of
the modal popup to return to the main page, as shown above. The thin diagonal
blue lines represent orthologous genes between mouse and human.
i. Is there a human ortholog for this gene? What chromosome is it on?
6. Look at the other human orthologs surrounding PAX6 and their chromosomal locations.
a. List human orthologs surrounding PAX6.

More Related Content

What's hot

What's hot (20)

Structural databases
Structural databases Structural databases
Structural databases
 
Gene bank by kk sahu
Gene bank by kk sahuGene bank by kk sahu
Gene bank by kk sahu
 
Nucleic Acid Sequence databases
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databases
 
Pymol
PymolPymol
Pymol
 
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
 
Phylogenetic data analysis
Phylogenetic data analysisPhylogenetic data analysis
Phylogenetic data analysis
 
Primary and secondary database
Primary and secondary databasePrimary and secondary database
Primary and secondary database
 
Maximum parsimony
Maximum parsimonyMaximum parsimony
Maximum parsimony
 
Primary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyanaPrimary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyana
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Rasmol
RasmolRasmol
Rasmol
 
European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)
 
Protein database
Protein databaseProtein database
Protein database
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
EMBL
EMBLEMBL
EMBL
 
Scoring matrices
Scoring matricesScoring matrices
Scoring matrices
 
Phylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny ofPhylogenetic tree and its construction and phylogeny of
Phylogenetic tree and its construction and phylogeny of
 
DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
 
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBI
 
Introduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASEIntroduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASE
 

Similar to AUW Protein Modeling Lab

Bioinformatics complete manual
Bioinformatics complete manualBioinformatics complete manual
Bioinformatics complete manualFrazAhmadMazari
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.docbutest
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.docbutest
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Frameworkmaniopas
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Syed Lokman
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfstudy help
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfstudy help
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENTMULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENTSyed Lokman
 
assignment2.doc
assignment2.docassignment2.doc
assignment2.docbutest
 
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docxfelicidaddinwoodie
 
BMI214_Assignment2_S..
BMI214_Assignment2_S..BMI214_Assignment2_S..
BMI214_Assignment2_S..butest
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
OL 325 Milestone Three Guidelines and Rubric Section
 OL 325 Milestone Three Guidelines and Rubric  Section OL 325 Milestone Three Guidelines and Rubric  Section
OL 325 Milestone Three Guidelines and Rubric SectionMoseStaton39
 
ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfPaul Gardner
 
Survey: Biological Inspired Computing in the Network Security
Survey: Biological Inspired Computing in the Network SecuritySurvey: Biological Inspired Computing in the Network Security
Survey: Biological Inspired Computing in the Network SecurityEswar Publications
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEIPutuAdiPratama
 

Similar to AUW Protein Modeling Lab (20)

Bioinformatics complete manual
Bioinformatics complete manualBioinformatics complete manual
Bioinformatics complete manual
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.doc
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.doc
 
FairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment FrameworkFairBench: A Fairness Assessment Framework
FairBench: A Fairness Assessment Framework
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdf
 
one complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdfone complete report from all the 4 labs.pdf
one complete report from all the 4 labs.pdf
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENTMULTIPLE SEQUENCE ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
assignment2.doc
assignment2.docassignment2.doc
assignment2.doc
 
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
 
BMI214_Assignment2_S..
BMI214_Assignment2_S..BMI214_Assignment2_S..
BMI214_Assignment2_S..
 
bioinformatic.pptx
bioinformatic.pptxbioinformatic.pptx
bioinformatic.pptx
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
OL 325 Milestone Three Guidelines and Rubric Section
 OL 325 Milestone Three Guidelines and Rubric  Section OL 325 Milestone Three Guidelines and Rubric  Section
OL 325 Milestone Three Guidelines and Rubric Section
 
ppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdfppgardner-lecture06-homologysearch.pdf
ppgardner-lecture06-homologysearch.pdf
 
Survey: Biological Inspired Computing in the Network Security
Survey: Biological Inspired Computing in the Network SecuritySurvey: Biological Inspired Computing in the Network Security
Survey: Biological Inspired Computing in the Network Security
 
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCEINTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
 

More from Syed Lokman

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2Syed Lokman
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1Syed Lokman
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShareSyed Lokman
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to EcologySyed Lokman
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis WorksheetSyed Lokman
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1Syed Lokman
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey InteractionSyed Lokman
 
Plant Competition
Plant CompetitionPlant Competition
Plant CompetitionSyed Lokman
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: WorksheetSyed Lokman
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Syed Lokman
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein AnalysisSyed Lokman
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree ConstructionSyed Lokman
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Syed Lokman
 
Global Alignment
Global AlignmentGlobal Alignment
Global AlignmentSyed Lokman
 
Review Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsReview Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsSyed Lokman
 

More from Syed Lokman (20)

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1
 
Soil Analysis
Soil AnalysisSoil Analysis
Soil Analysis
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShare
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to Ecology
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis Worksheet
 
Lab Safety
Lab SafetyLab Safety
Lab Safety
 
Sampling Method
Sampling MethodSampling Method
Sampling Method
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey Interaction
 
Plant Competition
Plant CompetitionPlant Competition
Plant Competition
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: Worksheet
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1
 
Local Alignment
Local AlignmentLocal Alignment
Local Alignment
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein Analysis
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree Construction
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2
 
Global Alignment
Global AlignmentGlobal Alignment
Global Alignment
 
Review Class on Introduction to Bioinformatics
Review Class on Introduction to BioinformaticsReview Class on Introduction to Bioinformatics
Review Class on Introduction to Bioinformatics
 

Recently uploaded

Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxryandux83rd
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...HetalPathak10
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
The role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipThe role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipKarl Donert
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 

Recently uploaded (20)

Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdf
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Chi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical VariableChi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical Variable
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
The role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenshipThe role of Geography in climate education: science and active citizenship
The role of Geography in climate education: science and active citizenship
 
Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 

AUW Protein Modeling Lab

  • 1. Asian University For Women BINF 3016: Protein Modeling (Lab#02) Advanced BLAST & Comparative Genomics Fall 2022 This content is prepared by Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences), Asian University For Women, Chittagong, Bangladesh. References are cited in between the contents.
  • 2. ADVANCED BLAST [Software needed: web access] There are 3 sections to this lab: BlastP, PSI-Blast, and Comparative Genomics. Last time we used BLAST to query a nucleotide sequence against the NCBI nr database. Now let’s search using a protein sequence. Lab 2.1: BLASTP 1. Go to NCBI (www.ncbi.nlm.nih.gov) and select BLAST from the Popular Resource section on the right. 2. Choose protein blast from the Basic BLAST section. 3. Choose blastp from in the program selection tabs along the top. 4. Enter the accession number (NP_001318308). 5. Open the “Algorithm parameters” section. Set the “Expect threshold” as 1e-15. 6. In the Scoring Parameters section, the default substitution matrix is BLOSUM62. Change the substitution matrix to BLOSUM80. 7. Look at the Gap Costs. Existence refers to creating a gap in the alignment, while Extension refers to extending a gap in an alignment. 8. Check the ‘Low complexity regions’ filter. 9. Check ‘Show results in a new window’. 10. Click BLAST! 11. The format of the BLASTP output page is broken up into: ● Job summary ● Descriptions ● Graphical summary containing ○ -Conserved domains ○ -Graphic summary ● Alignments
  • 3. 12. Let’s try a different format for our results. Click on Alignment View options in the Alignments part of the page, and pick Pairwise with dots for identities in the Alignment View box to change the alignments representation. 13. Move to the graphic summary section and mouse over the summaries. The information for each hit will be displayed in the tool tip. Scroll to the Descriptions section. Note the identities and E-value scores of the different hits. The top ~100 hits will be summarized in this section. Scroll to the last hit in the display. a. What is its E-value? b. Do you consider this a good hit? 14. Clicking on the last hit in the graphical summary will take you to the appropriate HSP (High Scoring Pairs) alignment. Note that there’s a substantial amount of variation between your query and this database sequence. The Pairwise with dots for identities format makes this particularly clear as all the variable sites are noted in red letters (you might need to scroll to the bottom of the alignments section if you’re not automatically taken to the correct entry). 15. At the bottom of the Job Summary section (top of the page) you will see a Taxonomy tab. Click on this. Look through the list of hits, for each subsection. a. How is this list for the first subsection (Lineage) organized? b. What do the numbers between the name of the species and the number of hits mean? 16. Go back to the main results page, and again select Formatting options. In the Alignments tab, select Query-anchored with dots for identities. Scroll down to the alignments section. a. How are the alignments organized now?
  • 5. Lab 2.2: PSI-BLAST We’re now going to use a new protein search algorithm: Position-Specific Iterated (PSI)-BLAST. PSI-BLAST is a highly sensitive BLAST program that is extremely useful for finding distantly related proteins or new members of a protein family. Beyond tracking down protein family members, you can use PSI-BLAST when your standard protein-protein BLAST search either fails to find significant hits, or returns hits with descriptions such as "hypothetical protein" or "similar to...". In a very general sense, PSI-BLAST starts with a standard protein-protein BLAST, and then uses these results to build up a more refined search that is tailored to your query over successive iterations of the search. It does this by building a position-specific scoring matrix (PSSM), which identifies the specific amino acid changes that are most likely to be present between your query sequence and similar database sequences. Position-specific scoring matrices are essentially substitution matrices tailored to your query of interest. 1. Go back to the blastp page from the BlastP part of the lab, load the same accession or sequence (NP_001318308) and parameters as in the first part of the lab (Blosum 80 Matrix and 1e-15 as the Expect Threshold, low complexity filter). Under ‘Database,’ select ‘UniProtKB/Swiss-Prot’. The Swiss-Prot database is well-curated database that includes only the most best annotated (characterized) protein sequences. The trade-off of using this database is that it’s less comprehensive than the default nr option. 2. Under program selection, select ‘PSI-BLAST’. Near the bottom of the page under PSI/PHI BLAST options, note the PSI-BLAST Threshold of 0.005. Let’s lower it to 1e-40. a. How will this affect our search results? 3. BLAST!
  • 6. 4. Examine the PSI- BLAST output. You will notice one very significant difference. There are now two sections in the Description summary section – one that has sequences with E-values BETTER than [PSI-BLAST] threshold, and another that has sequences with E-values WORSE than [PSI-BLAST] threshold. a. Where is this cutoff (what E-value)? b. How was it determined? c. How many sequences are better than the threshold? (Tip: select All and look at the number selected). d. Record the accession number, bit score and E-value for one of the top hits in the group of sequences that did not reach the threshold (here it’s Q9C7G1.1). 5. This first round of PSI-BLAST was just a standard BLASTP. Now we will run another iteration to refine our search. The successive iterations use all sequences better than the cutoff to make a new position-specific-substitution matrix, which replaces the BLOSUM80 matrix used in the original search. This matrix scores based on the non-random patterns of residue conservation occurring at each site in each pairwise alignment BLAST performed. 6. Click ‘Run PSI-BLAST iteration 2’. Scroll down the sequence list. Note how beside some of the sequences there’s either a green check mark , or they’re highlighted in yellow – these are new sequences that weren’t significant in the previous iteration, but scored significantly with the refined PSSM. a. How many new sequences better than the cutoff do you get now? b. Search for the accession number you saved in step 4. Is it better than the threshold now? Do the bit score and E-value change for this accession? If so, why? 7. Iterate the search at least another three to five times. a. What do you notice about the number of new sequences in each iteration? b. What qualities should high-scoring PSI-BLAST hits theoretically share that would be of interest to an inquiring geneticist? c. Can you guess how you would include sequences that were of interest to you, but which originally did not score better than the threshold in future search?
  • 7. d. Likewise, how would you remove sequences that scored above the threshold, but were not of interest? Ref: http://homepages.cs.ncl.ac.uk/phillip.lord/teaching/modules-08/theory/blast_variants.html https://bip.weizmann.ac.il/education/materials/gcg/psiblast.html
  • 8. Lab 2.3: Comparative Genomics: High-throughput sequencing initiatives, proteomic efforts, transcriptomics, and other high throughput genomic technologies, in conjunction with molecular characterization and literature curation, have resulted in large data collections. In addition to one for the human genome itself, repositories for the model organisms of worms, fruit flies, mice, Arabidopsis thaliana and others exist. However, the representation of genomic data is challenging due to the sheer scope, complexity, and volume of these data. Tools and the means for effectively displaying such complex data are continually being developed and improved. We will look at one application that attempts to deal with this problem, and that permits comparisons of genomic regions across related species. Importantly, orthologous genes studied in one species can be assumed to have similar functions in another species, and residues within orthologs that are highly conserved can be assumed to be critical for that protein’s function. Therein lies the power of comparative genomics. Exploring Genomes with Genome Browsers Each “model” organism (these organisms are so designated because of a long history of being studied in a medical or agricultural context due to their ease of manipulation, space requirements, good genetics, among other reasons) has its own genome database that permits the exploration of genomic regions. Such regions often have other molecules – homologous genes, ESTs etc. – associated with them. These genomic regions can be explored with Genome Browsers. We will only look at the Mouse genome browser in this part of the lab, but here are some others portals that may be of use in your future studies:
  • 9. FlyBase: FlyBase is the genomic repository for information for the model organism Drosophila melanogaster and many other related drosophilid species. Connect to FlyBase at http://flybase.org and click on the “GBrowse” or “JBrowse” icon to access the Genome Browser. WormBase: WormBase is the repository for Caenorhabditis elegans and other worm model species genomic data. Connect to wormbase at http://www.wormbase.org. Click on the “GBrowse” or “JBrowse” link in the Tool section at the top of the page to access the Genome Browser for C. elegans. The Arabidopsis Information Resource: As we’ve seen, TAIR is the repository for Arabidopsis genomic information. Connect to TAIR at http://www.arabidopsis.org and under the Tools tab, click on “GBrowse” or “JBrowse”. The Arabidopsis Information Portal is a different initiative aimed at integrating all data from different sources for Arabidopsis: http://araport.org. NCBI: One can also examine the human genome in a comparative manner using the NCBI map viewer application. Connect to the genomes section of the NCBI website at: http://www.ncbi.nlm.nih.gov/Genomes/ and select the “Human Genome” link under the Custom Resources, then on the icons of the chromosomes for the most recent Map Viewer release. Under Maps and Options, it is possible to select sequences from other species to display on the human Map Viewer. Mouse Genome Informatics – Comparative Genomics Example 1. Connect to the Mouse Genome Informatics site at: http://www.informatics.jax.org/. 2. Enter Pax6 into the Quick Search box at the top left. Pax6 is a gene important for proper eye development. 3. Click on the first link in the results list, labeled Pax6. You’ll be taken to the Gene Detail page for this gene. 4. In the second row, called Location & Maps, click on “More” to expand the section and then click on the Ensembl Genome Browser link. Ensembl is run by the European Bioinformatics Institute, the European counterpart of the NCBI. This browser is in some regards more powerful than the NCBI map viewer, but because it contains so much information and so many options, it is a bit more confusing. We can use it to examine similar human and mouse genomic regions, and to view orthologs.
  • 10. 5. In the Ensembl Genome Browser showing Pax6 in its genomic context, use the menu on the left to select “Region Comparison”. a. Use the “Select Species or Regions” tab on the bottom left to add the corresponding human region. b. Use the zoom slider to zoom out to view ~800,000 base pairs at once. c. Use the “Configure icon” at the top left of this section to configure the comparison image. d. Choose “Comparative features”, and activate the “Join genes” option. Save this configuration with a new name and click the checkbox at the top right of the modal popup to return to the main page, as shown above. The thin diagonal blue lines represent orthologous genes between mouse and human. i. Is there a human ortholog for this gene? What chromosome is it on? 6. Look at the other human orthologs surrounding PAX6 and their chromosomal locations. a. List human orthologs surrounding PAX6.