Asian University For Women
BINF 3016: Protein Modeling (Lab#06)
Protein Modeling: MODELLER
Fall 2022
This content is prepared by Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences),
Asian University For Women, Chittagong, Bangladesh.
References are cited in between the contents.
6.1. Setting up your Google Colab:
1. Visit https://colab.research.google.com/ to access Google Colab, and switch to your
academic account if it is not already selected.
2. Create a New notebook: File > New notebook
3. Rename the notebook to “Protein_Modeling_Modeller_yourIDnumber”.
4. Connect to a hosted runtime by clicking on the “Connect” button.
5. Click on the “Files” menu from the left panel.
6. Take a moment to familiarize yourself with the Google Colab UI.
Fig: (1) To insert Code or Text; (2) To access files in the current runtime; (3) Code Snippet
7. Write the following code in your code snippet and click the “Play/Run” button to check
whether Google Colab is working properly:
import time
print(time.ctime())
8. If the current time is displayed, Google Colab is working properly.
6.2. Installing Necessary Software: MODELLER, BioPython, and py3Dmol:
1. Remember some Google Colab shortcuts:
Ctrl + M, then M => convert the current cell to a Text Cell
Ctrl + M, then B => insert a Code Snippet below the current cell
Ctrl + M, then D => delete the current cell
Ctrl + Enter => run the current cell
2. Installing MODELLER:
a. To use MODELLER, you need an academic license, which you obtain by
registering at https://salilab.org/modeller/registration.html. The
license key will be sent to your email address immediately.
b. Before running this script, replace YOUR_LICENSE_KEY with the license key
you received after registering on the MODELLER website.
#1. Installing MODELLER
!wget https://salilab.org/modeller/10.3/modeller-10.3.tar.gz
!tar -zxf modeller-10.3.tar.gz
!echo "MODELLER extraction completed"
%cd modeller-10.3
#2. For installing, including a license key
#   (each answer to the installer must end with a newline, "\n")
with open('modeller_config', 'w') as f:
    f.write("2\n") #2 for selecting x86_64 (Opteron/EM64T) box (Linux)
    f.write("/content/compiled/MODELLER\n") #installation directory
    f.write("YOUR_LICENSE_KEY\n") #ADD YOUR LICENSE KEY HERE!
!./Install < modeller_config
!echo "MODELLER set up completed"
!ln -sf /content/compiled/MODELLER/bin/mod10.3 /usr/bin/
#Checking if MODELLER works (keep the awk command on a single line)
!mod10.3 | awk 'NR==1{if($1=="usage:") print "MODELLER successfully installed"; else if($1!="usage:") print "Something went wrong. Please install again"}'
%pwd
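The same installation check can be made from a Python cell. This is a minimal sketch that only assumes the launcher was linked onto the PATH under the name mod10.3, as in the script above; the helper name find_modeller is hypothetical and is not part of MODELLER.

```python
import shutil


def find_modeller(executable="mod10.3"):
    """Return the full path of the MODELLER launcher, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_modeller()
if path is None:
    print("mod10.3 was not found on PATH. Please install again")
else:
    print("MODELLER launcher found at", path)
```

If the launcher is not found, re-run the installation cell and check that the license key line was written correctly.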
3. Installing BioPython and py3Dmol:
a. Execute the following code in Google Colab:
#1. Installing biopython using pip
!pip install biopython
#2. Installing py3Dmol using pip
!pip install py3Dmol
#3. And importing the py3Dmol module
import py3Dmol
6.3. Building Profile for Target Protein Sequence
1. Create a directory (folder) in the Colab Files section to keep all the necessary files
together:
a. In the Files section, right-click > New folder, then rename the folder to “lab6”.
2. Prepare your target sequence file:
a. Click on the three-dot menu button beside the “lab6” directory, then click
“New file” to create a new file for the sequence. Rename the file to “target.ali”.
b. Open “target.ali” by double-clicking the file. Paste the following text into the
editor:
>P1;target
sequence:target:::::::0.00: 0.00
MTSSLPCGQTSLLLQMTERLALSDAHFRRISQLIYQRAGIVLADHKRDMVYNRLVRRLRS
LGLTDFGHYLNLLESNQHSGEWQAFINSLTTNLTAFFREAHHFPLLADHARRRSGEYRVW
SAAASTGEEPYSIAMTLADTLGTAPGRWKVFASDIDTEVLEKARSGIYRHEELKNLTPQQ
LQRYFMRGTGPHEGLVRVRQELANYVDFAPLNLLAKQYTVPGPFDAIFCRNVMIYFDQTT
QQEILRRFVPLLKPDGLLFAGHSENFSHLERRFTLRGQTVYALSKD*
Fig: How target sequence file works
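The layout of target.ali can also be sketched in code. In MODELLER's PIR format, the first line is “>P1;” followed by the sequence code, the second line holds ten colon-separated fields (mostly empty for a bare sequence), and the sequence ends with “*”. The helper name format_pir and the 60-residue line width are assumptions made for illustration, mirroring the file shown above.

```python
def format_pir(code, sequence, width=60):
    """Format a bare target sequence as a MODELLER PIR entry."""
    # Header line: 10 fields separated by 9 colons, as in the target.ali example.
    header = "sequence:%s:::::::0.00: 0.00" % code
    # The '*' character marks the end of the sequence.
    residues = sequence.upper() + "*"
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([">P1;%s" % code, header] + body) + "\n"


# First 73 residues of the target sequence used in this lab.
entry = format_pir(
    "target",
    "MTSSLPCGQTSLLLQMTERLALSDAHFRRISQLIYQRAGIVLADHKRDMVYNRLVRRLRSLGLTDFGHYLNLL",
)
print(entry)
```

Writing the returned text to a file named target.ali would give a file with the same structure as the one pasted by hand.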
3. Download and extract the sequences of the available PDB structures.
a. Use the following code to download and unzip the pdb_95.pir sequence database:
#1. Move to the “lab6” folder
%cd /content/lab6
#2. Downloading pdb_95.pir
!wget https://salilab.org/modeller/downloads/pdb_95.pir.gz
!gunzip pdb_95.pir.gz
4. Search for templates for your target sequence
a. Create a new file named “build_profile.py” in the “lab6” folder and open the file.
b. Modify “build_profile.py” script as follows:
from modeller import *
log.verbose()
env = Environ()
#-- Read in the sequence database (PIR format)
sdb = SequenceDB(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000),
         clean_sequences=True)
#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')
#-- Read the binary database back in
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')
#-- Read in the target sequence (change the file name here to match your own)
aln = Alignment(env)
aln.append(file='target.ali', alignment_format='PIR', align_codes='ALL')
#-- Convert the input sequence into profile format
prf = aln.to_profile()
#-- Scan the sequence database for homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)
#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')
#-- Convert the profile back to alignment format
aln = prf.to_alignment()
#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')
c. Execute build_profile.py using the following code to find templates for the target:
#1. Running the profile-build script
!mod10.3 build_profile.py
#2. Printing only the list of potential templates (keep the sed command on a single line)
!sed -n '/HITS FOUND IN ITERATION: 1/,/Weight Matrix/p;/Weight Matrix/q' build_profile.log
d. The most important columns in the Profile.build() output are the second, tenth,
eleventh and twelfth columns.
i. The second column reports the code of the PDB sequence that was
compared with the target sequence. The PDB code in each line is the
representative of a group of PDB sequences that share 95% or more
sequence identity to each other and have less than 30 residues or 30%
sequence length difference.
ii. The eleventh column reports the percentage sequence identities
between target and a PDB sequence normalized by the lengths of the
alignment (indicated in the tenth column). In general, a sequence identity
value above approximately 25% indicates a potential template unless the
alignment is short (i.e., less than 100 residues).
iii. A better measure of the significance of the alignment is given in the
twelfth column by the e-value of the alignment.
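The rule of thumb above can be sketched as a small filter: keep hits with more than about 25% identity over at least 100 aligned residues, then order them by e-value. The Hit tuple, the function name rank_templates, and every value in the example list are hypothetical; they are not real build_profile.log output, and the columns would have to be copied from your own log.

```python
from collections import namedtuple

# One row per hit: PDB code (column 2), alignment length (column 10),
# percent sequence identity (column 11), and e-value (column 12).
Hit = namedtuple("Hit", "code aln_length identity evalue")


def rank_templates(hits, min_identity=25.0, min_length=100):
    """Keep hits that pass the rule-of-thumb cutoffs, most significant first."""
    kept = [h for h in hits
            if h.identity > min_identity and h.aln_length >= min_length]
    return sorted(kept, key=lambda h: h.evalue)


# Made-up values for illustration only.
hits = [
    Hit("hitA", 270, 33.0, 0.0),
    Hit("hitB", 265, 31.0, 1e-30),
    Hit("hitC", 60, 45.0, 1e-4),   # high identity, but the alignment is short
    Hit("hitD", 250, 18.0, 5e-3),  # long alignment, but low identity
]
for h in rank_templates(hits):
    print(h.code, h.aln_length, h.identity, h.evalue)
```

The cutoffs are defaults that can be changed; they are guidelines from the text, not hard limits.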
e. To select the most appropriate template for the query sequence, a comparison
could be performed among the selected templates.
i. Download the PDB files using the following BioPython code:
#Downloading the PDB files using biopython
import os
from Bio.PDB import PDBList
templates = ['5ftw', '5xlx', '5xly', '1af7']
pdbl = PDBList()
for s in templates:
    pdbl.retrieve_pdb_file(s, pdir='.', file_format="pdb", overwrite=True)
    #Rename pdbXXXX.ent to XXXX.pdb so that MODELLER can find the file
    os.rename("pdb"+s+".ent", s+".pdb")
ii. Create a new “compare.py” file and modify it as follows:
from modeller import *
env = Environ()
aln = Alignment(env)
for (pdb, chain) in (('5ftw', 'A'), ('5xlx', 'A'), ('5xly', 'A'),
                     ('1af7', 'A')):
    m = Model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
iii. Execute compare.py:
#1. Running the compare script
!mod10.3 compare.py
#2. Check the log file
!sed -ne '/Sequence identity comparison (ID_TABLE):/,$ p' compare.log
f. From the comparison, select the best template for modeling
5. Aligning Target-Template:
a. Create a new file named “align2d.py” and modify it as follows:
from modeller import *
env = Environ()
aln = Alignment(env)
#Provide the PDB code of your template in the next two lines.
mdl = Model(env, file='1af7', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1af7A', atom_files='1af7.pdb')
aln.append(file='target.ali', align_codes='target')
aln.align2d(max_gap_length=50)
aln.write(file='aligned.fasta', alignment_format='FASTA')
aln.write(file='aligned.ali', alignment_format='PIR')
aln.write(file='aligned.pap', alignment_format='PAP')
b. Execute align2d.py as follows (file names are case-sensitive on Colab):
#1. Running the align2d script
!mod10.3 align2d.py
c. You will end up with three new files (aligned.fasta, aligned.ali, and aligned.pap) that contain the
pairwise alignment of the target and template sequences.
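To get a quick feel for a pairwise alignment, the percent identity can be computed from the two gapped sequences. This is a minimal sketch that assumes gaps are written as “-” and normalizes by the number of gap-free columns, which is only one of several conventions; the function name percent_identity and the toy sequences are hypothetical and are not taken from aligned.fasta.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy gapped sequences for illustration only.
template = "MKT-AYIAKQR"
target = "MKTLAYLA-QR"
print("%.1f%% identity" % percent_identity(template, target))
```

The two gapped sequences in aligned.fasta could be pasted in place of the toy strings.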
6. Model Building:
a. You now have the three inputs needed to build models: (1) the target sequence,
(2) the template structure, and (3) the target-template alignment.
b. Create a new file named “model-single.py” and modify it as follows:
from modeller import *
from modeller.automodel import *
import sys
env = Environ()
a = AutoModel(env, alnfile='aligned.ali',
              knowns='1af7A', sequence='target',
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 50
a.make()
# Get a list of all successfully built models from a.outputs
ok_models = [x for x in a.outputs if x['failure'] is None]
# Rank the models by DOPE score (lowest first)
key = 'DOPE score'
if sys.version_info[:2] == (2, 3):
    # Python 2.3's sort has no 'key' argument
    ok_models.sort(lambda a, b: cmp(a[key], b[key]))
else:
    ok_models.sort(key=lambda x: x[key])
# Get top model
m = ok_models[0]
print("Top model: %s (DOPE score %.3f)" % (m['name'], m[key]))
c. Execute model-single.py as follows:
#1.Running the model-single script
!mod10.3 model-single.py
d. The model-single.log output reports the total potential energy of each
structure according to MODELLER’s DOPE (Discrete Optimized Protein Energy)
score. The log file gives a summary of all the models built, and its last line
names the best model according to the DOPE score.
7. Model Visualization using py3Dmol:
a. Execute the following code (change the model numbers if necessary):
#1. Copying our two best models with new chain IDs (to superimpose)
!sed "s/ A / E /g" target.B99990046.pdb > bestmodel.pdb
!sed "s/ A / D /g" target.B99990040.pdb > best2model.pdb
#2. Setting up py3Dmol for visualization
view=py3Dmol.view()
#3. Loading template
view.addModel(open('1af7.pdb', 'r').read(),'pdb')
#4. Loading best DOPE score model
view.addModel(open('bestmodel.pdb', 'r').read(),'pdb')
view.addModel(open('best2model.pdb', 'r').read(),'pdb')
#5. Zooming into all visualized structures
view.zoomTo()
#6. Here we set the background color as white
view.setBackgroundColor('white')
#7. Here we set the visualization style for chains
view.setStyle({'chain':'A'},{'cartoon': {'color':'purple'}})
view.setStyle({'chain':'E'},{'cartoon': {'color':'yellow'}})
view.setStyle({'chain':'D'},{'cartoon': {'color':'green'}})
#8. And we finally visualize the structures using the command below
view.show()
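The sed commands above replace every “ A ” they find on a line. A more targeted alternative is to rewrite only the chain ID field, which is column 22 of ATOM, HETATM, and TER records in the PDB format. This is a minimal sketch; the helper name set_chain_id is hypothetical, and the commented usage assumes the model file names used in the step above.

```python
def set_chain_id(pdb_text, new_chain):
    """Return PDB text with the chain ID (column 22) of coordinate records replaced."""
    if len(new_chain) != 1:
        raise ValueError("chain ID must be a single character")
    out = []
    for line in pdb_text.splitlines():
        # Column 22 is index 21 when counting from zero.
        if line.startswith(("ATOM", "HETATM", "TER")) and len(line) > 21:
            line = line[:21] + new_chain + line[22:]
        out.append(line)
    return "\n".join(out) + "\n"


# Usage inside the lab6 folder (uncomment after the models have been built):
# with open("target.B99990046.pdb") as src, open("bestmodel.pdb", "w") as dst:
#     dst.write(set_chain_id(src.read(), "E"))
```

Because only one column is touched, atom names, residue names, and coordinates stay exactly as MODELLER wrote them.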
8. Download all the data of your “lab6” directory:
a. Execute the following code:
#1. Archive your files
!zip -r /content/lab6.zip /content/lab6
#2. Download from Google Colab
from google.colab import files
files.download("/content/lab6.zip")
9. Model Evaluation:
a. Visit the SAVES server (https://saves.mbi.ucla.edu/) and upload your best model after
renaming it “best-model.pdb”.
b. Consider the following factors:
i. VERIFY3D (i.e., the compatibility of an atomic 3D model with its own 1D
sequence, compared to the energetics of good structures from
the PDB).
Check the VERIFY3D results: more than 80% of the residues should have an
average score ≥ 0.2. The score profile lets you identify
conflicting regions.
ii. PROCHECK (stereochemical and geometrical quality of the model,
via Ramachandran plots, side-chain rotamers, etc.).
Check the Ramachandran plot: Are there any residues outside the
allowed regions? What types of residues are found within those
regions? (Check by clicking on each dot in the plot.)
Check the errors in PROCHECK: Are the errors located within the
loop regions?
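The VERIFY3D criterion stated above can be checked by hand once the per-residue averaged scores are copied from the report. This is a minimal sketch of that arithmetic; the function name verify3d_pass and the example scores are hypothetical and do not come from a real SAVES report.

```python
def verify3d_pass(scores, cutoff=0.2, required_fraction=0.80):
    """Return (fraction of residues scoring >= cutoff, whether it exceeds the threshold)."""
    if not scores:
        raise ValueError("no scores given")
    fraction = sum(1 for s in scores if s >= cutoff) / len(scores)
    return fraction, fraction > required_fraction


# Hypothetical per-residue averaged scores, for illustration only.
scores = [0.45, 0.38, 0.21, 0.05, 0.30, 0.52, 0.29, 0.41, 0.33, 0.27]
fraction, passed = verify3d_pass(scores)
print("%.0f%% of residues score >= 0.2: %s"
      % (100 * fraction, "PASS" if passed else "FAIL"))
```

Residues that fall below the cutoff are the ones to inspect in the score profile.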
Lab#6: Exercise:
● Build a model of Prolyl 4-hydroxylase 13 from Arabidopsis thaliana (UniProt: F4ILF8)
using MODELLER. Follow the instructions given below:
○ Target file name: “target_YOURidNUMBER.ali”
○ Submit the Colab notebook (download it in .ipynb format).
○ Attach the validation report and, based on it, interpret your
model.
○ Download the lab6 folder and rename it “lab6_YOURidNumber”.
○ Upload all the files to a Drive folder (notebook, validation report,
interpretation of the validation report, lab folder) and share the Drive folder
as your assignment file.
● Submit the assignment at least two days before the next class (23rd October,
2022). Submissions after 23rd October will not be accepted.
Reference:
● https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007449
● https://saves.mbi.ucla.edu/
● http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2242414/
● http://salilab.org/modeller/9.13/manual/node255.html
About DOPE Score:
DOPE (Discrete Optimized Protein Energy) is a pairwise atomistic statistical potential
used to distinguish "good" models from "bad" ones: the lower the DOPE score, the better
the model.
DOPE compares the energies of different models generated from the same sequence, so it is
useful for selecting the best model in terms of energy. It is only meaningful for ranking
the models generated for a single amino acid sequence, not for comparing models of
different sequences.
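Ranking by DOPE score can be sketched as a sort in ascending order, since lower is better. The entries below are hypothetical dictionaries shaped like the items of a.outputs in model-single.py (keys 'name', 'failure', and 'DOPE score'); the scores and the function name rank_by_dope are made up for illustration.

```python
def rank_by_dope(outputs):
    """Return successfully built models sorted by DOPE score, lowest (best) first."""
    ok = [m for m in outputs if m.get("failure") is None]
    return sorted(ok, key=lambda m: m["DOPE score"])


# Hypothetical entries; the scores are NOT real MODELLER results.
outputs = [
    {"name": "target.B99990001.pdb", "failure": None, "DOPE score": -31250.4},
    {"name": "target.B99990002.pdb", "failure": None, "DOPE score": -31877.9},
    {"name": "target.B99990003.pdb", "failure": "optimizer error", "DOPE score": 0.0},
]
ranked = rank_by_dope(outputs)
print("Top model: %s (DOPE score %.3f)" % (ranked[0]["name"], ranked[0]["DOPE score"]))
```

Models that failed to build are dropped before ranking, so a placeholder score cannot be mistaken for a result.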

More Related Content

Similar to Protein Modeling using MODELLER

Build and deploy Python Django project
Build and deploy Python Django projectBuild and deploy Python Django project
Build and deploy Python Django projectXiaoqi Zhao
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning serviceRuth Yakubu
 
Introduction to django
Introduction to djangoIntroduction to django
Introduction to djangoIlian Iliev
 
1 Goals. 1. To use a text file for output and later for in.docx
1 Goals. 1. To use a text file for output and later for in.docx1 Goals. 1. To use a text file for output and later for in.docx
1 Goals. 1. To use a text file for output and later for in.docxhoney690131
 
Django tutorial
Django tutorialDjango tutorial
Django tutorialKsd Che
 
You've done the Django Tutorial, what next?
You've done the Django Tutorial, what next?You've done the Django Tutorial, what next?
You've done the Django Tutorial, what next?Andy McKay
 
Rifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotRifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotTsai Tsung-Yi
 
Useful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmUseful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmAnton Shapin
 
Autoconf&Automake
Autoconf&AutomakeAutoconf&Automake
Autoconf&Automakeniurui
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Edwin Jung
 
BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023Timothy Spann
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
Viktor Tsykunov "Microsoft AI platform for every Developer"
Viktor Tsykunov "Microsoft AI platform for every Developer"Viktor Tsykunov "Microsoft AI platform for every Developer"
Viktor Tsykunov "Microsoft AI platform for every Developer"Lviv Startup Club
 
CodeIgniter PHP MVC Framework
CodeIgniter PHP MVC FrameworkCodeIgniter PHP MVC Framework
CodeIgniter PHP MVC FrameworkBo-Yi Wu
 
Expanding XPages with Bootstrap Plugins for Ultimate Usability
Expanding XPages with Bootstrap Plugins for Ultimate UsabilityExpanding XPages with Bootstrap Plugins for Ultimate Usability
Expanding XPages with Bootstrap Plugins for Ultimate UsabilityTeamstudio
 
ZopeSkel & Buildout packages
ZopeSkel & Buildout packagesZopeSkel & Buildout packages
ZopeSkel & Buildout packagesQuintagroup
 

Similar to Protein Modeling using MODELLER (20)

Build and deploy Python Django project
Build and deploy Python Django projectBuild and deploy Python Django project
Build and deploy Python Django project
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
Introduction to django
Introduction to djangoIntroduction to django
Introduction to django
 
1 Goals. 1. To use a text file for output and later for in.docx
1 Goals. 1. To use a text file for output and later for in.docx1 Goals. 1. To use a text file for output and later for in.docx
1 Goals. 1. To use a text file for output and later for in.docx
 
Django tutorial
Django tutorialDjango tutorial
Django tutorial
 
You've done the Django Tutorial, what next?
You've done the Django Tutorial, what next?You've done the Django Tutorial, what next?
You've done the Django Tutorial, what next?
 
Rifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobotRifartek Robot Training Course - How to use ClientRobot
Rifartek Robot Training Course - How to use ClientRobot
 
PhpBB meets Symfony2
PhpBB meets Symfony2PhpBB meets Symfony2
PhpBB meets Symfony2
 
Mini Curso de Django
Mini Curso de DjangoMini Curso de Django
Mini Curso de Django
 
Useful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmUseful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvm
 
Autoconf&Automake
Autoconf&AutomakeAutoconf&Automake
Autoconf&Automake
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019
 
BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
 
Django - basics
Django - basicsDjango - basics
Django - basics
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Viktor Tsykunov "Microsoft AI platform for every Developer"
Viktor Tsykunov "Microsoft AI platform for every Developer"Viktor Tsykunov "Microsoft AI platform for every Developer"
Viktor Tsykunov "Microsoft AI platform for every Developer"
 
CodeIgniter PHP MVC Framework
CodeIgniter PHP MVC FrameworkCodeIgniter PHP MVC Framework
CodeIgniter PHP MVC Framework
 
Expanding XPages with Bootstrap Plugins for Ultimate Usability
Expanding XPages with Bootstrap Plugins for Ultimate UsabilityExpanding XPages with Bootstrap Plugins for Ultimate Usability
Expanding XPages with Bootstrap Plugins for Ultimate Usability
 
ZopeSkel & Buildout packages
ZopeSkel & Buildout packagesZopeSkel & Buildout packages
ZopeSkel & Buildout packages
 

More from Syed Lokman

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2Syed Lokman
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1Syed Lokman
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShareSyed Lokman
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to EcologySyed Lokman
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis WorksheetSyed Lokman
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1Syed Lokman
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey InteractionSyed Lokman
 
Plant Competition
Plant CompetitionPlant Competition
Plant CompetitionSyed Lokman
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: WorksheetSyed Lokman
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Syed Lokman
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Syed Lokman
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein AnalysisSyed Lokman
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree ConstructionSyed Lokman
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Syed Lokman
 
Global Alignment
Global AlignmentGlobal Alignment
Global AlignmentSyed Lokman
 

More from Syed Lokman (20)

Water Analysis: Part 2
Water Analysis: Part 2Water Analysis: Part 2
Water Analysis: Part 2
 
Water Analysis: Part 1
Water Analysis: Part 1Water Analysis: Part 1
Water Analysis: Part 1
 
Soil Analysis
Soil AnalysisSoil Analysis
Soil Analysis
 
Ecological Succession SlideShare
Ecological Succession SlideShareEcological Succession SlideShare
Ecological Succession SlideShare
 
Intro to Ecology
Intro to EcologyIntro to Ecology
Intro to Ecology
 
Soil Analysis Worksheet
Soil Analysis WorksheetSoil Analysis Worksheet
Soil Analysis Worksheet
 
Lab Safety
Lab SafetyLab Safety
Lab Safety
 
Sampling Method
Sampling MethodSampling Method
Sampling Method
 
Ecological Succession: Part 1
Ecological Succession: Part 1Ecological Succession: Part 1
Ecological Succession: Part 1
 
Predator Prey Interaction
Predator Prey InteractionPredator Prey Interaction
Predator Prey Interaction
 
Plant Competition
Plant CompetitionPlant Competition
Plant Competition
 
predator prey: Worksheet
predator prey: Worksheetpredator prey: Worksheet
predator prey: Worksheet
 
Natural Selection Analysis: Part1
Natural Selection Analysis: Part1Natural Selection Analysis: Part1
Natural Selection Analysis: Part1
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
Local Alignment
Local AlignmentLocal Alignment
Local Alignment
 
Protein Analysis
Protein AnalysisProtein Analysis
Protein Analysis
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Phylogenetic Tree Construction
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
Phylogenetic Tree Construction
 
Natural Selection Analysis: Part2
Natural Selection Analysis: Part2Natural Selection Analysis: Part2
Natural Selection Analysis: Part2
 
Global Alignment
Global AlignmentGlobal Alignment
Global Alignment
 

Recently uploaded

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 

Recently uploaded (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 

Protein Modeling using MODELLER

  • 1. Asian University For Women BINF 3016: Protein Modeling (Lab#06) Protein Modeling: MODELLER Fall 2022 This content is prepared by Syed Mohammad Lokman (Adjunct Faculty, Bioinformatics & Environmental Sciences), Asian University For Women, Chittagong, Bangladesh. References are cited in between the contents.
  • 2. 6.1. Setting up your Google Colab: 1. Visit https://colab.research.google.com/ to access Google Colab, and Change your account to Academic account if not already selected 2. Create a New notebook: File > New notebook 3. Rename the filename as “Protein_Modeling_Modeller_yourIDnumber” 4. Connect to a hosted runtime by clicking on the “Connect” button.
  • 3. 5. Click on the “Files” menu from the left panel. 6. Now, look at the Google Colab UI for a while. Fig: (1) To insert Code or Text; (2) To access files in the current runtime; (3) Code Snippet 7. Write the following code in your code snippet and click on “Play/Run” button to check whether Google Colab is Working properly or not: import time print(time.ctime()) 8. If the current time is shown up, Google Colab is working Properly.
  • 4. 6.2. Installing Necessary Softwares: MODELLER, BioPython, and Py3dMol: 1. Remember some Google Colab Shortcuts: Ctrl + M + M => to create Text Cell Ctrl + M + B => to create Code Snippet Ctrl + M + D => to Delete Current Cell Ctrl + Enter => Execute the Code 2. Installing MODELLER: a. In order to use MODELLER, you will need to obtain an Academic License by registering on this website https://salilab.org/modeller/registration.html. The license key will be immediately sent to your email address. b. Before running this script, make sure to replace the MODELLER #License Key with the one sent after registration in the MODELLER website #1. Installing MODELLER !wget https://salilab.org/modeller/10.3/modeller-10.3.tar.gz !tar -zxf modeller-10.3.tar.gz !echo "MODELLER extraction completed" %cd modeller-10.3 #2. For installing, including a license key with open('modeller_config', 'a') as f: f.write("2n") #2 for selecting x86_64 (Opteron/EM64T) box (Linux) f.write("/content/compiled/MODELLERn") f.write("YOUR_LICENSE_KEYn") #ADD YOUR LICENSE KEY HERE! !./Install < modeller_config !echo "MODELLER set up completed" !ln -sf /content/compiled/MODELLER/bin/mod10.3 /usr/bin/ #Checking if MODELLER works !mod10.3 | awk 'NR==1{if($1=="usage:") print "MODELLER succesfully installed"; else if($1!="usage:") print "Something went wrong. Please install again"}' %pwd 3. Installing BioPython and Py3dMol: a. Execute following codes in Google Colab: #1. Installing biopython using pip !pip install biopython #2. Installing py3Dmol using pip !pip install py3Dmol #3. And importing the py3Dmol module import py3Dmol
  • 5. 6.3. Building Profile for Target Protein Sequence
1. Create a directory (folder) in your Colab Files section to keep all the necessary files together:
a. In the Files section, Right Click > New Folder > Rename the folder as “lab6”
2. Prepare your target sequence file:
a. Click on the three-dot menu button beside the “lab6” directory. Then click on “New File” to create a new file for the sequence. Rename the file as “target.ali”.
b. Open “target.ali” by double-clicking on the file. Paste the following text in the editor section:
  • 6.
>P1;target
sequence:target:::::::0.00: 0.00
MTSSLPCGQTSLLLQMTERLALSDAHFRRISQLIYQRAGIVLADHKRDMVYNRLVRRLRS
LGLTDFGHYLNLLESNQHSGEWQAFINSLTTNLTAFFREAHHFPLLADHARRRSGEYRVW
SAAASTGEEPYSIAMTLADTLGTAPGRWKVFASDIDTEVLEKARSGIYRHEELKNLTPQQ
LQRYFMRGTGPHEGLVRVRQELANYVDFAPLNLLAKQYTVPGPFDAIFCRNVMIYFDQTT
QQEILRRFVPLLKPDGLLFAGHSENFSHLERRFTLRGQTVYALSKD*
Fig: How target sequence file works
3. Download and extract the database of available PDB sequences.
a. Use the following code to download and unzip the PDB sequence database:
#1. Move to “lab6” folder
%cd /content/lab6
#2. Downloading pdb_95.pir
!wget https://salilab.org/modeller/downloads/pdb_95.pir.gz
!gunzip pdb_95.pir.gz
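The target file shown above has three parts: a ">P1;code" header line, a colon-separated description line whose first field is "sequence" (an entry with no known structure), and the residues ending in "*". As a rough sketch, the same layout can be generated from a plain sequence string. format_pir_target is a hypothetical helper, and the 60-residue line width simply mirrors the example file.

```python
def format_pir_target(code, sequence, width=60):
    """Return a PIR entry laid out like the target.ali example (hypothetical helper)."""
    header = ">P1;" + code
    # Description line copied from the example: 'sequence', the code, empty fields, two 0.00 values
    description = "sequence:" + code + ":::::::0.00: 0.00"
    residues = sequence + "*"  # '*' marks the end of the sequence
    # Break the residues into fixed-width lines
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header, description] + body) + "\n"


entry = format_pir_target("target", "MTSSLPCGQTSLLLQMTERLALSDAHFRRISQLIYQRAGIVLADHKRDMVYNRLVRRLRS")
print(entry)
```

Writing the returned text to “target.ali” gives a file with the same structure as the one pasted by hand.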
  • 7. 4. Search for templates for your target sequence
a. Create a new file named “build_profile.py” in the “lab6” folder and open the file.
b. Modify the “build_profile.py” script as follows:
from modeller import *
log.verbose()
env = Environ()
# Read the PDB sequence database, convert it to binary format, and read it back
sdb = SequenceDB(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')
#Change according to your File Name here
aln = Alignment(env)
aln.append(file='target.ali', alignment_format='PIR', align_codes='ALL')
# Convert the target alignment to a profile and search the database
prf = aln.to_profile()
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)
prf.write(file='build_profile.prf', profile_format='TEXT')
aln = prf.to_alignment()
#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')
c. Execute build_profile.py using the following code to find templates for the target:
#1. Running the profile-build script
!mod10.3 build_profile.py
#2. Printing only the list of potential templates
!sed -n '/HITS FOUND IN ITERATION: 1/,/Weight Matrix/p;/Weight Matrix/q' build_profile.log
  • 8. d. The most important columns in the Profile.build() output are the second, tenth, eleventh and twelfth columns.
i. The second column reports the code of the PDB sequence that was compared with the target sequence. The PDB code in each line is the representative of a group of PDB sequences that share 95% or more sequence identity with each other and differ in length by less than 30 residues or 30%.
ii. The eleventh column reports the percentage sequence identity between the target and a PDB sequence, normalized by the length of the alignment (given in the tenth column). In general, a sequence identity above approximately 25% indicates a potential template unless the alignment is short (i.e., less than 100 residues).
iii. A better measure of the significance of the alignment is the e-value of the alignment, given in the twelfth column.
e. To select the most appropriate template for the query sequence, the candidate templates can be compared with each other.
i. Download the PDB files using the following BioPython code:
#Downloading the PDB files using biopython
import os
from pathlib import Path
from Bio.PDB import *
templates = ['5ftw', '5xlx', '5xly', '1af7']
pdbl = PDBList()
for s in templates:
    pdbl.retrieve_pdb_file(s, pdir='.', file_format="pdb", overwrite=True)
    # Rename the downloaded file (e.g. pdb1af7.ent) to 1af7.pdb
    os.rename("pdb"+s+".ent", s+".pdb")
ii. Create a new “compare.py” file and modify it as follows:
from modeller import *
env = Environ()
aln = Alignment(env)
# Add chain A of each candidate template to the alignment
for (pdb, chain) in (('5ftw', 'A'), ('5xlx', 'A'), ('5xly', 'A'), ('1af7', 'A')):
    m = Model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
aln.malign()
aln.malign3d()
aln.compare_structures()
aln.id_table(matrix_file='family.mat')
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
  • 9. iii. Execute compare.py:
#1. Running the compare script
!mod10.3 compare.py
#2. Check the log file
!sed -ne '/Sequence identity comparison (ID_TABLE):/,$ p' compare.log
f. From the comparison, select the best template for modeling.
5. Aligning Target-Template:
a. Create a new file named “align2d.py” and modify it as follows:
from modeller import *
env = Environ()
aln = Alignment(env)
# Replace 1af7 with the PDB code of your chosen template in the next two statements
mdl = Model(env, file='1af7', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1af7A', atom_files='1af7.pdb')
aln.append(file='target.ali', align_codes='target')
aln.align2d(max_gap_length=50)
aln.write(file='aligned.fasta', alignment_format='FASTA')
aln.write(file='aligned.ali', alignment_format='PIR')
aln.write(file='aligned.pap', alignment_format='PAP')
b. Execute align2d.py as follows (file names are case-sensitive, so the name must match the file you created):
#1. Running the align2d script
!mod10.3 align2d.py
c. You will end up with three new files (aligned.ali, aligned.fasta and aligned.pap) that contain the pairwise alignment of the target and template sequences in PIR, FASTA and PAP format.
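The screening rules from step 4d (identity above roughly 25%, alignment of at least 100 residues, low e-value) can be sketched as a small filter. The sketch assumes whitespace-separated hit lines with the code, alignment length, percentage identity and e-value in columns 2, 10, 11 and 12, as described in step 4d. parse_hit and is_candidate are hypothetical helpers, and the sample lines are made up for illustration; they are not real MODELLER output.

```python
def parse_hit(line):
    """Extract the four columns of interest from one whitespace-separated hit line."""
    fields = line.split()
    return {
        "code": fields[1],               # column 2: PDB code of the hit
        "aln_length": int(fields[9]),    # column 10: alignment length
        "identity": float(fields[10]),   # column 11: % sequence identity
        "evalue": float(fields[11]),     # column 12: e-value of the alignment
    }


def is_candidate(hit, min_identity=25.0, min_length=100, max_evalue=0.01):
    """Apply the rule-of-thumb thresholds from step 4d (assumed defaults)."""
    return (hit["identity"] > min_identity
            and hit["aln_length"] >= min_length
            and hit["evalue"] <= max_evalue)


# Made-up lines for illustration only; NOT real MODELLER output
sample_lines = [
    "2 hitA X 1 300 10 250 5 245 240 35. 0.1E-20",
    "3 hitB X 1 120 40 110 3 70 68 40. 0.5E-03",
    "4 hitC X 1 280 1 260 1 255 250 18. 0.2E-05",
]

candidates = [hit["code"] for hit in map(parse_hit, sample_lines) if is_candidate(hit)]
print(candidates)
```

Here hitB is rejected because its alignment is shorter than 100 residues and hitC because its identity is below 25%. The thresholds are rules of thumb, so the structural comparison in step 4e is still needed to pick the final template.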
  • 10. 6. Model Building:
a. Now, you have the three files needed to build models:
i. the target sequence (target.ali), the template structure (1af7.pdb) and the target-template alignment (aligned.ali)
b. Create a new file named “model-single.py” and modify it as follows:
from modeller import *
from modeller.automodel import *
import sys

env = Environ()
a = AutoModel(env, alnfile='aligned.ali',
              knowns='1af7A', sequence='target',
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 50
a.make()

# Get a list of all successfully built models from a.outputs
ok_models = [x for x in a.outputs if x['failure'] is None]

# Rank the models by DOPE score (lowest, i.e. best, first)
key = 'DOPE score'
if sys.version_info[:2] == (2, 3):
    # Python 2.3's sort does not have a 'key' argument
    ok_models.sort(lambda a, b: cmp(a[key], b[key]))
else:
    ok_models.sort(key=lambda m: m[key])

# Get top model
m = ok_models[0]
print("Top model: %s (DOPE score %.3f)" % (m['name'], m[key]))
c. Execute model-single.py as follows:
#1. Running the model-single script
!mod10.3 model-single.py
d. The model-single.log output reports the total potential energy of each structure according to MODELLER’s DOPE (discrete optimized protein energy) score. The log file gives a summary of all the models built. The last line of the log file names the best model according to the DOPE score.
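Each model is written to its own PDB file, and the file name of the best model is needed in the next step. Judging from the file names used in this lab (target.B99990046.pdb for model 46), the name combines the target code with a zero-padded model number. A small sketch of that pattern follows; model_filename is a hypothetical helper, and the pattern is inferred from those file names rather than taken from the MODELLER documentation.

```python
def model_filename(sequence_code, model_index):
    """Build the PDB file name for a given model number (pattern inferred from this lab)."""
    return "%s.B9999%04d.pdb" % (sequence_code, model_index)


print(model_filename("target", 46))  # target.B99990046.pdb
print(model_filename("target", 40))  # target.B99990040.pdb
```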
  • 11. 7. Model Visualization using Py3dMol:
a. Execute the following code (change the model numbers if necessary):
#1. Copying our best models with new chain ids (to superimpose)
!sed "s/ A / E /g" target.B99990046.pdb > bestmodel.pdb
!sed "s/ A / D /g" target.B99990040.pdb > best2model.pdb
#2. Setting up py3Dmol for visualization
view=py3Dmol.view()
#3. Loading template
view.addModel(open('1af7.pdb', 'r').read(),'pdb')
#4. Loading the best DOPE score models
view.addModel(open('bestmodel.pdb', 'r').read(),'pdb')
view.addModel(open('best2model.pdb', 'r').read(),'pdb')
#5. Zooming into all visualized structures
view.zoomTo()
#6. Here we set the background color as white
view.setBackgroundColor('white')
#7. Here we set the visualization style for chains
view.setStyle({'chain':'A'},{'cartoon': {'color':'purple'}})
view.setStyle({'chain':'E'},{'cartoon': {'color':'yellow'}})
view.setStyle({'chain':'D'},{'cartoon': {'color':'green'}})
#8. And we finally visualize the structures using the command below
view.show()
8. Download all the data of your “lab6” directory:
a. Execute the following code:
#1. Archive your files
!zip -r /content/lab6.zip /content/lab6
#2. Download from Google Colab
from google.colab import files
files.download("/content/lab6.zip")
9. Model Evaluation:
a. Visit the SAVES server (https://saves.mbi.ucla.edu/) and upload your best model after renaming it as “best-model.pdb”.
  • 12. b. Consider the following factors:
i. VERIFY3D (i.e. compatibility of an atomic 3D model with its own 1D sequence, compared to the energetics of good structures from the PDB). Check the VERIFY3D results: more than 80% of the residues should have an average score ≥ 0.2, and the score profile allows you to identify conflicting regions.
ii. PROCHECK (stereochemical and geometrical quality of the model, via Ramachandran plots, sidechain rotamers, etc.). Check the Ramachandran plot: Are there any residues outside the allowed regions? What types of residues are found within those regions? (Check by clicking on each dot in the plot.) Check the errors in PROCHECK: are the errors located within the loop regions?
Lab#6: Exercise:
● Build a model of Prolyl 4-hydroxylase 13 · Arabidopsis thaliana (UniProt: F4ILF8) using MODELLER. You should follow the instructions given below:
○ Target File Name: “target_YOURidNUMBER.ali”
○ Submit the Colab Notebook (download it in ipynb format)
○ Attach the Validation Report and, based on it, interpret your model
○ Download the lab6 folder and rename it as “lab6_YOURidNumber”
○ Upload all the files to a Drive Folder (Notebook, Validation Report, Interpretation of Validation Report, Lab Folder) and share the Drive Folder as the assignment file.
● Submit the assignment at least two days before the next class (23rd October, 2022). Submissions after 23rd October will not be accepted.
  • 13. References:
● https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007449
● https://saves.mbi.ucla.edu/
● http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2242414/
● http://salilab.org/modeller/9.13/manual/node255.html
About DOPE Score: The DOPE (Discrete Optimized Protein Energy) score is a pairwise atomistic statistical potential used to distinguish "good" models from "bad" ones. The lower the DOPE score, the better the model. DOPE energies are only comparable between models generated from the same amino acid sequence, so the score is useful for ranking those models and selecting the best one in terms of energy. It should not be used to compare models of different sequences.
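The two rules above (lower is better, and only models of the same sequence are comparable) can be sketched as a small ranking function. rank_by_dope is a hypothetical helper, and the model names, sequences and scores below are invented for illustration only.

```python
def rank_by_dope(models):
    """Sort models of ONE sequence from lowest (best) to highest DOPE score."""
    sequences = {model["sequence"] for model in models}
    if len(sequences) > 1:
        # Scores from different sequences are not on a comparable scale
        raise ValueError("DOPE scores are only comparable between models of the same sequence")
    return sorted(models, key=lambda model: model["dope"])


# Hypothetical models and scores for illustration only
models = [
    {"name": "model_a", "sequence": "MTSSLPCG", "dope": -30100.0},
    {"name": "model_b", "sequence": "MTSSLPCG", "dope": -31250.5},
    {"name": "model_c", "sequence": "MTSSLPCG", "dope": -29870.2},
]

ranked = rank_by_dope(models)
print("Best model:", ranked[0]["name"])
```

Raising an error for mixed sequences is a design choice of this sketch. It makes the restriction explicit instead of silently returning a ranking that has no meaning.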