Subject: Pharmacoinformatics
Government Post Graduate College Mandian Abbottabad
Assignment no 2: PROJECT ASSIGNMENT MODELLER
Submitted by:
Name: Zarlish Attique
Registration no: 187104
Subject: Pharmacoinformatics
Department: Bioinformatics
Semester: 5th
Submitted to:
Teacher Name: Sir Muhammad Imran Sharif
Department of Bioinformatics
Date of Submission: January 7,2020
PROJECT QUESTIONS:
1. Take any protein sequence (make sure that its 3D structure is not present in the PDB
database) and predict the structure using MODELLER.
2. Write down the functions of the protein and its structural organization (number of beta
sheets, helices, etc.).
3. Write the methodology and results of the modelling procedure.
Homology Modeling
Homology modeling, also known as comparative modeling of protein, refers to constructing an
atomic-resolution model of the "target" protein from its amino acid sequence and an
experimental three-dimensional structure of a related homologous protein.
ABOUT PROTEIN: dACE2
Truncated angiotensin-converting enzyme 2
Primate-specific isoform of ACE2
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19,
uses angiotensin-converting enzyme 2 (ACE2) for entry into target cells. ACE2 has been
proposed as an interferon-stimulated gene (ISG), so interferon-induced variability in ACE2
expression levels could be important for susceptibility to COVID-19 or its outcomes. A
novel, primate-specific isoform of ACE2 has been reported and designated
deltaACE2 (dACE2). The authors demonstrate that dACE2, but not ACE2, is an ISG. In vitro, dACE2, which
lacks 356 N-terminal amino acids, was non-functional in binding the SARS-CoV-2 spike protein
and as a carboxypeptidase. Their results reconcile current knowledge on ACE2 expression and
suggest that the ISG-type induction of dACE2 in IFN-high conditions created by treatments,
an inflammatory tumor microenvironment, or viral co-infections is unlikely to affect the cellular
entry of SARS-CoV-2 and promote infection.
An interferon-stimulated gene (ISG) is a gene whose expression is stimulated by interferon.
Interferons (IFNs) are a group of signaling proteins made and released by host cells in response
to the presence of several viruses. In a typical scenario, a virus-infected cell will
release interferons causing nearby cells to heighten their anti-viral defenses.
METHODOLOGY AND RESULTS OF PROTEIN STRUCTURE
PREDICTION USING MODELLER
1. Protein selection from UniProt with no known structure.
UniProt is a freely accessible database of protein sequence and functional information, with many
entries derived from genome sequencing projects. It contains a large amount of information
about the biological function of proteins, drawn from the research literature. In the first step, the
protein with UniProtKB entry A0A7D6JAD5_HUMAN was selected for the study.
Figure: As of 2-Jan-2020 the structure is unknown and not present in the PDB.
NCBI Nucleotide: https://www.ncbi.nlm.nih.gov/nucleotide/MT505392
NCBI Protein: https://www.ncbi.nlm.nih.gov/protein/1878857681
NCBI Taxonomy: https://www.ncbi.nlm.nih.gov/taxonomy/?term=9606
24-JUL-2020
Table: Entry information from UniProt.
Entry name:     A0A7D6JAD5_HUMAN
Accession:      A0A7D6JAD5 (primary, citable accession number)
Entry history:  Integrated into UniProtKB/TrEMBL: December 2, 2020
                Last sequence update: December 2, 2020
                Last modified: December 2, 2020
Entry status:   Unreviewed (UniProtKB/TrEMBL)
FASTA Sequence
A0A7D6JAD5_HUMAN taken from UniProt.
Length: 459
Mass (Da): 52,737
>tr|A0A7D6JAD5|A0A7D6JAD5_HUMAN Truncated angiotensin converting enzyme 2 OS=Homo sapiens OX=9606 GN=ACE2 PE=2 SV=1
MREAGWDKGGRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGE
IMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKG
EIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQ
EALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLN
YFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSS
VAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRM
SRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIR
DRKKKNKARSGENPYASIDISKGENNPGFQNTDDVQTSF
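As a quick sanity check on the record above, the following minimal Python sketch parses a single FASTA entry and counts its residues, so the result can be compared with the length reported by UniProt (459). The function name parse_fasta is hypothetical and introduced only for illustration; the sequence is the A0A7D6JAD5_HUMAN entry from this section.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]                   # drop the leading '>'
    sequence = "".join(lines[1:]).upper()   # join the wrapped sequence lines
    return header, sequence


DACE2_FASTA = """\
>tr|A0A7D6JAD5|A0A7D6JAD5_HUMAN Truncated angiotensin converting enzyme 2 OS=Homo sapiens OX=9606 GN=ACE2 PE=2 SV=1
MREAGWDKGGRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGE
IMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKG
EIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQ
EALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLN
YFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSS
VAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRM
SRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIR
DRKKKNKARSGENPYASIDISKGENNPGFQNTDDVQTSF
"""

header, sequence = parse_fasta(DACE2_FASTA)
accession = header.split("|")[1]   # second '|'-separated field of a UniProt header
print(accession, len(sequence))
```

If the printed length differs from 459, the sequence was probably damaged while copying (for example by stray line breaks or missing residues).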
2. Template recognition and initial alignment using BLASTp and PDB.
Template recognition and selection involve searching the PDB for homologous proteins with
experimentally determined structures. The search can be performed with simple sequence alignment programs such
as BLAST and FASTA, because the percentage identity between the target sequence and a possible
template is high enough (in the "safe zone") to be detected by these programs. As a rule of thumb, about 40%
sequence identity is required to generate a useful model. In this second step, the
dACE2 sequence in FASTA format was submitted to BLASTp and searched against the PDB database.
Figure: FASTA format of the query sequence with unknown structure.
Figure: Performing BLASTp from BLAST website.
Figure: The results of the BLASTp showing multiple outputs.
Table: The four selected template structures, chosen for the lowest E-value, highest query
coverage and highest percent identity. The description, scientific name, maximum score,
total score, query coverage, E-value, percent identity, accession length and PDB
accession are given for each hit.

Description | Scientific name | Max score | Total score | Query cover | E-value | Per. ident | Acc. len | Accession
The 2019-nCoV RBD/ACE2-B0AT1 complex | Homo sapiens | 942 | 942 | 99% | 0.0 | 98.47% | 814 | 6M17_B
S protein of SARS-CoV-2 in complex bound with T-ACE2 | Homo sapiens | 942 | 942 | 99% | 0.0 | 98.47% | 817 | 7CT5_D
Cryo-EM structure of cat ACE2 and SARS-CoV-2 RBD | Felis catus | 723 | 723 | 85% | 0.0 | 86.55% | 732 | 7C8D_A
SARS Spike Glycoprotein - human ACE2 complex, Stabilized variant, all ACE2-bound particles | Homo sapiens | 560 | 560 | 59% | 0.0 | 96.35% | 605 | 6CS2_D
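The selection criteria in the table caption (lowest E-value, then highest query coverage, then highest percent identity) can be sketched in a few lines of Python. This is only an illustration of the ranking logic, not part of the BLAST website; the dictionary keys and the 40% identity cutoff variable are assumptions named here for convenience, and the numbers are copied from the table above.

```python
# BLASTp hits copied from the table above (values as reported by BLAST)
hits = [
    {"accession": "6M17_B", "organism": "Homo sapiens", "query_cover": 99, "e_value": 0.0, "identity": 98.47},
    {"accession": "7CT5_D", "organism": "Homo sapiens", "query_cover": 99, "e_value": 0.0, "identity": 98.47},
    {"accession": "7C8D_A", "organism": "Felis catus", "query_cover": 85, "e_value": 0.0, "identity": 86.55},
    {"accession": "6CS2_D", "organism": "Homo sapiens", "query_cover": 59, "e_value": 0.0, "identity": 96.35},
]

MIN_IDENTITY = 40.0  # rule-of-thumb threshold quoted in the text (percent)

# Keep hits above the identity threshold, then sort:
# smaller E-value first, then larger coverage, then larger identity.
candidates = [h for h in hits if h["identity"] >= MIN_IDENTITY]
ranked = sorted(
    candidates,
    key=lambda h: (h["e_value"], -h["query_cover"], -h["identity"]),
)

for rank, hit in enumerate(ranked, start=1):
    print(rank, hit["accession"], hit["query_cover"], hit["identity"])
```

Because Python's sort is stable, hits with identical scores (here 6M17_B and 7CT5_D) keep their original BLAST order.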
3. Refinement of the structures taken from PDB using Chimera 1.15rc
Each template entry was opened in Chimera and reduced to the single chain matched by BLASTp.
Refinement of structure 1 using Chimera 1.15rc: 6M17
Figure: 6M17, chains A to V
Figure: 6M17, chain B
Chain B was saved as seq1 for convenience; the renaming is optional.
Refinement of structure 2 using Chimera 1.15rc: 7CT5
Figure: 7CT5, chains A to Z
Figure: 7CT5, chain D
Chain D was saved as seq2 for convenience; the renaming is optional.
Refinement of structure 3 using Chimera 1.15rc: 7C8D
Figure: 7C8D, chains A and B
Figure: 7C8D, chain A
Chain A was saved as seq3 for convenience; the renaming is optional.
Refinement of structure 4 using Chimera 1.15rc: 6CS2
Figure: 6CS2, chains A to Z
Figure: 6CS2, chain D
Chain D was saved as seq4 for convenience; the renaming is optional.
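The chain isolation above was done interactively in Chimera. For readers who prefer a script, the same idea can be sketched in plain Python, relying only on the fixed-column PDB format in which the chain identifier sits in column 22 of ATOM and HETATM records. The function name extract_chain and the four toy records are hypothetical; this is not how Chimera does it internally.

```python
def extract_chain(pdb_text, chain_id):
    """Return PDB text containing only ATOM/HETATM records of one chain."""
    kept = []
    for line in pdb_text.splitlines():
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        # Column 22 (index 21) of a coordinate record holds the chain ID
        if is_coordinate and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


# Toy records, truncated after the residue number for brevity
sample = "\n".join([
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ATOM      3  N   SER B  19",
    "HETATM    4  O   HOH B 901",
])

chain_b = extract_chain(sample, "B")
print(chain_b)
```

In practice the result would be written to a file such as seq1.pdb and passed to MODELLER in place of the full entry.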
PREPARATION OF THE FIVE SCRIPTS FROM THE MODELLER TUTORIAL WEBSITE:
MODELLER STEPS
Now that we have our query sequence and the 3D templates have been identified, the next step is to
prepare the five MODELLER scripts from the MODELLER tutorial website:
https://salilab.org/modeller/tutorial/basic.html
MODELLER
MODELLER is used for homology or comparative modeling of protein three-dimensional
structures. The user provides an alignment of a sequence to be modeled with known related
structures and MODELLER automatically calculates a model containing all non-hydrogen
atoms. MODELLER implements comparative protein structure modeling by satisfaction of
spatial restraints, and can perform many additional tasks, including de novo modeling of loops
in protein structures, optimization of various models of protein structure with respect to a
flexibly defined objective function, multiple alignment of protein sequences and/or structures,
clustering, searching of sequence databases, comparison of protein structures, etc. Several of
these tasks appear in the modelling steps that follow.
Figure: Modeller program interface for scripts execution.
4. MODELLER Step 1: Script_1 preparation to analyze the query sequence and
build its profile.
The first line contains the sequence code, in the format ">P1;code". The second line with ten fields
separated by colons generally contains information about the structure file. Only two of these fields
are used for sequences, "sequence" (indicating that the file contains a sequence without known
structure) and "TvLDH" (the model file name). The rest of the file contains the sequence of
TvLDH, with "*" marking its end.
>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
(our query sequence is placed here)*
Figure: Query sequence saved as a .ALI file with proper PIR formatting.
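The PIR layout described above (a ">P1;code" line, a ten-field colon-separated line, then the sequence terminated by "*") can also be generated programmatically. The following is a minimal sketch, assuming the same field values as the tutorial example; the function name to_pir and the 75-character line width are choices made here for illustration, not requirements stated in the source.

```python
def to_pir(code, sequence, width=75):
    """Format a target sequence as a MODELLER-style PIR record."""
    residues = "".join(sequence.split()).upper() + "*"   # '*' marks the end
    wrapped = [residues[i:i + width] for i in range(0, len(residues), width)]
    header = [
        ">P1;" + code,
        # ten colon-separated fields; only the first two are used for sequences
        "sequence:" + code + ":::::::0.00: 0.00",
    ]
    return "\n".join(header + wrapped) + "\n"


record = to_pir("TvLDH", "mreagwdkgg rilmctkvtm")
print(record)
```

The returned string can be written to a file such as TvLDH.ali, which is the file name that the tutorial scripts read.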
Script 1 from the MODELLER website needs no changes; just save it as a .PY
file.
from modeller import *
log.verbose()
env = environ()
#-- Prepare the input files
#-- Read in the sequence database
sdb = sequence_db(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)
#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
chains_list='ALL')
#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
chains_list='ALL')
#-- Read in the target sequence/alignment
aln = alignment(env)
aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')
#-- Convert the input sequence/alignment into
# profile format
prf = aln.to_profile()
#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
gap_penalties_1d=(-500, -50), n_prof_iterations=1,
check_profile=False, max_aln_evalue=0.01)
#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')
#-- Convert the profile back to alignment format
aln = prf.to_alignment()
#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')
Figure: Script 1; save it as PY file.
5. MODELLER Step 2: Script_2 preparation to carry out MULTIPLE
SEQUENCE ALIGNMENT and PHYLOGENETIC TREE construction and
to check the crystallographic resolution.
The listing shows the tutorial version of script 2. In our copy, the tutorial PDB codes were replaced by the names of
our four structures (seq1, seq2, seq3 and seq4) together with their chain IDs, i.e. B, D, A and D.
Note: Structures can be added to or removed from the list as needed.
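from modeller import *

env = environ()
aln = alignment(env)
# (PDB code, chain) pairs from the tutorial; substitute your own templates here
for (pdb, chain) in (('1b8p', 'A'), ('1bdm', 'A'), ('1civ', 'A'),
                     ('5mdh', 'A'), ('7mdh', 'A'), ('1smk', 'A')):
    # Read only the requested chain of each template structure
    m = model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
aln.malign()              # multiple sequence alignment
aln.malign3d()            # multiple structural alignment
aln.compare_structures()  # pairwise structural comparison
aln.id_table(matrix_file='family.mat')   # pairwise sequence identity table
env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)   # clustering tree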
Figure: Script 2; save it as PY file.
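Script 2 clusters the templates using a table of pairwise sequence identities. To make that quantity concrete, here is a minimal sketch of computing percent identity from two already-aligned sequences. It is not MODELLER's code: the function name percent_identity is hypothetical, and dividing by the length of the shorter ungapped sequence is an assumption about the convention (tools differ on the denominator). The three aligned strings are toy data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # Assumed convention: normalise by the shorter ungapped sequence
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * identical / shorter


# Toy multiple alignment (hypothetical sequences)
alignment_rows = {"tmplA": "ACDEFGHI", "tmplB": "ACDEYGHI", "tmplC": "AC-EFGKI"}
names = list(alignment_rows)
identity_table = {
    (a, b): percent_identity(alignment_rows[a], alignment_rows[b])
    for a in names for b in names
}
print(identity_table[("tmplA", "tmplB")])
```

A clustering tree such as the one drawn by script 2 is then built from a table like identity_table, grouping the most similar entries first.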
--------Run script1 and script2--------
Now place script1 and script2, along with the query file, the PIR file and the four PDB structures, in the MODELLER
folder; here I placed them in bin, as shown below.
Now run MODELLER.
Figure: Running script 1 generates additional files.
Figure: Additional files: pdb_95.bin, build_profile (.prf and .ali), script1.log
Now run script2.
Open the script2 output file and check the scores of the MSA and the phylogenetic tree.
Figure: The MSA is performed and, on the basis of the MSA, a phylogenetic tree is constructed.
We now pick one template on the basis of crystallographic resolution: seq1B @2.9 was
chosen because of its low (i.e. better) resolution value.
6. MODELLER Step 3: Script_3 preparation for pairwise alignment using dynamic
programming, aligning the single best template with the query.
The align2d() command used in this step takes into account structural information from the template when
constructing an alignment. This task is achieved through a variable gap penalty function that tends
to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and
between two positions that are close in space. As a result, the alignment errors are reduced by
approximately one third relative to those that occur with standard sequence alignment techniques.
This improvement becomes more important as the similarity between the sequences decreases and
the number of gaps increases.
Here, the tutorial template is replaced by the one chosen in MODELLER step 2, i.e. seq1.
from modeller import *
env = environ()
aln = alignment(env)
mdl = model(env, file='1bdm', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
aln.append(file='TvLDH.ali', align_codes='TvLDH')
aln.align2d()
aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')
Now run script3 to get the pairwise alignment of the query with the best template.
This pairwise alignment determines how the conserved regions of our model are built from the template. Because
align2d() uses dynamic programming, an exhaustive algorithm, the run takes some time.
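To illustrate what "dynamic programming" means for pairwise alignment, here is a minimal Needleman-Wunsch global alignment in Python. It is deliberately simpler than MODELLER's align2d(): it uses a fixed linear gap penalty and a plain match/mismatch score, whereas align2d() uses a variable, structure-dependent gap penalty. The function name and scoring values are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best_score)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells, so the running time grows with the product of the two sequence lengths, which is why longer proteins take noticeably longer to align.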
Script 3 output file
Figure: The pairwise alignment, a necessary step in model
building, has been completed. Time: 172.75
7. MODELLER Step 4: Script_4 preparation for Model Building and Backbone R-chain.
Once a target-template alignment is constructed, MODELLER calculates a 3D model of the target
completely automatically, using its automodel class. The following script generates ten similar
models of our protein based on the seq1 template:
from modeller import *
from modeller.automodel import *
#from modeller import soap_protein_od

env = environ()
a = automodel(env, alnfile='TvLDH-1bdmA.ali',    # alignment written by script 3
              knowns='1bdmA', sequence='TvLDH',  # template code and target code
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 10    # ten models, numbered 1 to 10
a.make()
Figure: Here I need 10 models, so I set the ending model to 10 and replace the PDB name.
Figure: Now run MODELLER for script_4.
Figure: Running script4, which generates the models.
Figure: Our ten models have now been successfully generated.
Open the script4 output file
Here, according to the DOPE score and molpdf value, I chose the best model for our query
protein.
When several models are calculated for the same target, the "best" model can be selected in several ways.
For example, you could pick the model with the lowest value of the MODELLER objective
function (molpdf) or of the DOPE or SOAP assessment scores, or with the highest GA341 assessment score.
These values are reported at the end of the log file, shown below.
TvLDH.B99990010.pdb 3124.62549 -35491.57422 0.77961
>> Summary of successfully produced models:
Filename molpdf DOPE score GA341 score
----------------------------------------------------------------------
TvLDH.B99990001.pdb 3195.83813 -33944.30859 0.75628
TvLDH.B99990002.pdb 3074.42725 -34495.23438 0.66094
TvLDH.B99990003.pdb 2914.32275 -34914.96875 0.62078
TvLDH.B99990004.pdb 3244.13867 -35109.01563 0.79621
TvLDH.B99990005.pdb 3072.41846 -34744.25781 0.94353
TvLDH.B99990006.pdb 2985.87280 -34632.87891 0.80809
TvLDH.B99990007.pdb 3338.26465 -35036.42578 0.78566
TvLDH.B99990008.pdb 3178.71118 -34787.64063 0.54706
TvLDH.B99990009.pdb 3354.25049 -34837.94922 0.72689
TvLDH.B99990010.pdb 3124.62549 -35491.57422 0.77961
Total CPU time [seconds] : 325.75
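Model TvLDH.B99990010.pdb was chosen because it has the lowest DOPE score (-35491.57422) of the ten
models. Its molpdf value (3124.62549) is not the lowest of the set (model 3 has 2914.32275), so the choice rests on DOPE.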
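The selection can be reproduced from the summary table with a short script. The sketch below parses the ten rows exactly as they appear in the log and reports the best model under each criterion; the helper name parse_summary and the dictionary keys are hypothetical.

```python
LOG_SUMMARY = """\
TvLDH.B99990001.pdb 3195.83813 -33944.30859 0.75628
TvLDH.B99990002.pdb 3074.42725 -34495.23438 0.66094
TvLDH.B99990003.pdb 2914.32275 -34914.96875 0.62078
TvLDH.B99990004.pdb 3244.13867 -35109.01563 0.79621
TvLDH.B99990005.pdb 3072.41846 -34744.25781 0.94353
TvLDH.B99990006.pdb 2985.87280 -34632.87891 0.80809
TvLDH.B99990007.pdb 3338.26465 -35036.42578 0.78566
TvLDH.B99990008.pdb 3178.71118 -34787.64063 0.54706
TvLDH.B99990009.pdb 3354.25049 -34837.94922 0.72689
TvLDH.B99990010.pdb 3124.62549 -35491.57422 0.77961
"""


def parse_summary(text):
    """Turn 'filename molpdf DOPE GA341' rows into a list of dictionaries."""
    models = []
    for line in text.strip().splitlines():
        name, molpdf, dope, ga341 = line.split()
        models.append({"name": name, "molpdf": float(molpdf),
                       "dope": float(dope), "ga341": float(ga341)})
    return models


models = parse_summary(LOG_SUMMARY)
lowest_dope = min(models, key=lambda m: m["dope"])       # lower (more negative) is better
lowest_molpdf = min(models, key=lambda m: m["molpdf"])   # lower is better
highest_ga341 = max(models, key=lambda m: m["ga341"])    # higher is better

print("lowest DOPE:   ", lowest_dope["name"])
print("lowest molpdf: ", lowest_molpdf["name"])
print("highest GA341: ", highest_ga341["name"])
```

The three criteria do not agree on a single model here, which is why the criterion used for the final choice should be stated explicitly.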
8. MODELLER Step 5: Script_5 preparation for Model evaluation
Before any external evaluation of the model, one should check the log file from the modelling run for
runtime errors and restraint violations. The file
"evaluate_model.py", here named script-5, evaluates an input model with the DOPE potential.
Note that here we picked the tenth generated model, TvLDH.B99990010.pdb.
from modeller import *
from modeller.scripts import complete_pdb

log.verbose()    # request verbose output
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')   # read topology
env.libs.parameters.read(file='$(LIB)/par.lib')      # read parameters

# read model file (the tenth model, selected in step 4)
mdl = complete_pdb(env, 'TvLDH.B99990010.pdb')

# Assess with DOPE:
s = selection(mdl)   # all atom selection
s.assess_dope(output='ENERGY_PROFILE NO_REPORT',
              file='TvLDH.profile',
              normalize_profile=True, smoothing_window=15)
Figure: Preparing and Running the script5.
Figure: Running script5, which evaluates our model TvLDH.B99990010.pdb with the DOPE potential.
Note: A second loop (Loop_2) was also run to check whether better models could be generated, but the
Loop_1 results were more acceptable than the Loop_2 model when both were
later analyzed with the Ramachandran plot.
Figure: Representation of Loop_2. The Loop_2 models (10 disallowed regions)
were not validated as well by the Ramachandran plot as those of Loop_1.
9. Validation and structural organization of the 3D model using the Ramachandran plot
and Chimera.
The best generated model was opened in Chimera 1.15rc and visualized in PyMOL.
Figure: TvLDH.B99990010.pdb opened in Chimera
Figure: 3D structural organization of the protein, showing the number of turns, coils and beta
strands.
Figure: Electrostatic potential surface of the protein. Red represents the acidic,
blue the basic and grey the neutral parts of the protein.
Figure: Surface model coloured by element: C green, H grey, N blue, O red, S orange.
Figure: Labelled 3D model with the residues and its main-chain atomic structure.
VALIDATION USING RAMACHANDRAN PLOT
In biochemistry, a Ramachandran plot (also known as a Rama plot, a Ramachandran
diagram or a [φ,ψ] plot), originally developed in 1963 by G. N. Ramachandran, C.
Ramakrishnan, and V. Sasisekharan, is a way to visualize energetically allowed regions for
backbone dihedral angles ψ against φ of amino acid residues in protein structure.
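Each point on a Ramachandran plot is a pair of dihedral angles: φ is defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). The following minimal Python sketch computes a signed dihedral angle from four atom positions using the standard atan2 formula. The function name dihedral_degrees and the coordinates are toy values chosen for illustration; PROCHECK's own implementation is not shown in the source.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees, -180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)   # normal of the plane through p2, p3, p4
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from the model): a right-angle twist about the z axis
angle = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Applying this function to the backbone atoms of every residue in a model file yields the (φ, ψ) pairs that a Ramachandran plot displays.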
PROCHECK JOB TITLE: https://saves.mbi.ucla.edu/?job=602392
Figure: Ramachandran plot of the predicted 3D model for the dACE2
sequence, a protein that contains β-sheet, α-helix
and random coil regions. The red, brown, and yellow regions represent the favored, allowed, and
"generously allowed" regions as defined by PROCHECK.
Figure: The Plot statistics generated by PROCHECK shows its several characteristics.
On the basis of amino acid stereochemistry
Figure: Residues shown on the basis of amino acid stereochemistry.
On the basis of statistics
Figure: Statistics for each residue involved.
On the basis of residue properties
Figure: Shows absolute deviation from mean Chi-1 value, omega torsion, C-alpha chirality,
secondary structure and G-factor of the protein along the sequence.
From these plots we can also estimate the structural organization: 22 helices, 27 random coils and 4 beta strands.
Functions of the Protein in the literature
The sequence of the dACE2 protein was published on July 20, 2020, but its functions are not described
in UniProt or any other source except the published paper. The authors identified a novel, primate-specific
isoform of ACE2, which they designate deltaACE2 (dACE2). They showed that dACE2, but
not ACE2, is induced in various human cell types by IFNs and viruses; this information is
important to consider for future therapeutic strategies and for understanding susceptibility to and
outcomes of COVID-19.
1. dACE2 is a novel inducible and primate-specific isoform of ACE2 - The
novel ACE2 isoform at the Xp22.2 locus of human chromosome X is predicted to encode a
protein of 459 aa, in which Ex1c encodes the first 10 aa, which are unique. Compared to
the full-length ACE2 protein of 805 aa, the truncation eliminates 17 aa of the signal peptide
and 339 aa of the N-terminal peptidase domain (356 aa in total), as shown in figure 1.
Figure 1: Designation of the novel inducible isoform.
2. dACE2 is induced by IFNs in vitro - In most cell lines tested, dACE2 but not ACE2 was
strongly upregulated by SeV infection (Figure 2B, C). Treatments with IFN-β or a cocktail
of IFN-λ1–3 significantly induced expression of dACE2 only, and not ACE2 (Figure 2E).
Figure 2: Panels B, C and E.
3. dACE2 is induced in virally infected human respiratory epithelial cells - dACE2 but
not ACE2 was induced in an RSV-infected human pulmonary carcinoma cell line (H292).
4. dACE2 is enriched in squamous epithelial tumors - The authors hypothesized that, as an
ISG, dACE2 might be absent or expressed at low levels in normal tissues but could be
induced by the inflammatory tissue microenvironment. They explored data from The
Cancer Genome Atlas (TCGA), which represents the largest collection of tumors and
tumor-adjacent normal tissues. Expression of both ACE2 and dACE2 was detectable in
many tumor-adjacent normal tissues.
5. dACE2 is induced by SARS-CoV-2 in vitro - The results confirm that dACE2 is
inducible by SARS-CoV-2 infection. Expression of ACE2 and dACE2 was much higher
in the lung adenocarcinoma cell line Calu3 than in the colon adenocarcinoma cell lines
Caco2 and T84.
6. dACE2 is non-functional as a SARS-CoV-2 receptor and carboxypeptidase - The main
activities that involve the peptidase domain of ACE2 appear to be abrogated in dACE2.
In conclusion, the authors present the first report of the discovery and functional annotation of dACE2,
an IFN-inducible isoform of ACE2. The existence of two functionally distinct ACE2 isoforms
reconciles several biological properties previously attributed to ACE2, with dACE2 being an ISG,
and ACE2 acting as the SARS-CoV-2 entry receptor and carboxypeptidase, without being
regulated by IFNs.
Major contribution to the discovery of dACE2: Laboratory of Translational Genomics, Division of
Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health,
Bethesda, MD, USA.
Conclusion
In this project assignment, I predicted the three-dimensional structure of the dACE2 protein by
homology modelling with MODELLER. The dACE2 sequence was reported by the Laboratory of
Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer
Institute, National Institutes of Health, Bethesda, MD, USA in July 2020. My predicted protein
structure has 99.3% of its residues in permitted regions (87.7% in the most favoured regions, 9.3% in the
allowed regions and 2.2% in the generously allowed regions) according to the Ramachandran plot analysis
performed with PROCHECK. The structural organization also shows the presence of helices, beta strands and
random coils.
References
1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7386494/
2) https://www.uniprot.org/uniprot/A0A7D6JAD5
3) https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
4) https://www.rcsb.org/structure/6M17#entity-2
5) https://www.rcsb.org/structure/7CT5
6) https://www.rcsb.org/structure/7C8D
7) https://www.rcsb.org/structure/6CS2
8) https://salilab.org/modeller/
9) http://services.mbi.ucla.edu/SAVES/Ramachandran/
10) Tools: chimera 1.15rc, MODELLER 9.25, PyMOL.

Genome sequencing and the development of our current information library
 
DBMS Helping material
DBMS Helping materialDBMS Helping material
DBMS Helping material
 
QSAR quantitative structure activity relationship
QSAR quantitative structure activity relationship QSAR quantitative structure activity relationship
QSAR quantitative structure activity relationship
 
Receptor Effector coupling by G-Proteins Zarlish attique 187104
Receptor Effector coupling by G-Proteins Zarlish attique 187104 Receptor Effector coupling by G-Proteins Zarlish attique 187104
Receptor Effector coupling by G-Proteins Zarlish attique 187104
 
Computational phylogenetics theoretical concepts, methods with practical on C...
Computational phylogenetics theoretical concepts, methods with practical on C...Computational phylogenetics theoretical concepts, methods with practical on C...
Computational phylogenetics theoretical concepts, methods with practical on C...
 

Recently uploaded

The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 

Recently uploaded (20)

The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 

Zarlish attique 187104 project assignment modeller

  • 1. Subject: Pharmacoinformatics
    Government Post Graduate College Mandian Abbottabad
    Assignment no 2: PROJECT ASSIGNMENT MODELLER
    Submitted by: Name: Zarlish Attique; Registration no: 187104; Subject: Pharmacoinformatics; Department: Bioinformatics; Semester: 5th
    Submitted to: Teacher Name: Sir Muhammad Imran Sharif, Department of Bioinformatics
    Date of Submission: January 7, 2020
    PROJECT QUESTIONS:
    1. Take any protein sequence (make sure that the 3D structure is not present in the PDB database) and predict the structure by using MODELLER.
    2. Write down the functions of the protein and its structural organization (no. of beta sheets, helices etc.).
    3. Write the methodology and results of the modeling procedure.
  • 2. Homology Modeling: Homology modeling, also known as comparative modeling of proteins, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein.
  • 3. ABOUT PROTEIN: dACE2 (truncated angiotensin converting enzyme 2), a primate-specific isoform of ACE2. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19, uses angiotensin-converting enzyme 2 (ACE2) for entry into target cells. ACE2 has been proposed as an interferon-stimulated gene (ISG), so interferon-induced variability in ACE2 expression levels could be important for susceptibility to COVID-19 or its outcomes. A novel, primate-specific isoform of ACE2 has been reported and designated deltaACE2 (dACE2). The authors demonstrate that dACE2, but not ACE2, is an ISG. In vitro, dACE2, which lacks 356 N-terminal amino acids, was non-functional both in binding the SARS-CoV-2 spike protein and as a carboxypeptidase. Their results reconcile current knowledge on ACE2 expression and suggest that the ISG-type induction of dACE2 in IFN-high conditions created by treatments, an inflammatory tumor microenvironment, or viral co-infections is unlikely to affect the cellular entry of SARS-CoV-2 and promote infection. An interferon-stimulated gene (ISG) is a gene whose expression is stimulated by interferon. Interferons (IFNs) are a group of signaling proteins made and released by host cells in response to the presence of several viruses. In a typical scenario, a virus-infected cell releases interferons, causing nearby cells to heighten their anti-viral defenses.
  • 4. METHODOLOGY AND RESULTS OF PROTEIN STRUCTURE PREDICTION WITH MODELLER
    1. Protein selection from UniProt with no known structure. UniProt is a freely accessible database of protein sequence and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins, derived from the research literature. In the first step, the protein with UniProtKB entry A0A7D6JAD5_HUMAN was selected for the study.
    Figure: As of 2-Jan-2020 the structure is unknown and is not present in the PDB either.
    NCBI Nucleotide: https://www.ncbi.nlm.nih.gov/nucleotide/MT505392
    NCBI Protein: https://www.ncbi.nlm.nih.gov/protein/1878857681
    NCBI Taxonomy: https://www.ncbi.nlm.nih.gov/taxonomy/?term=9606
    24-JUL-2020
  • 5. Table: Entry information from UniProt.
    Entry name: A0A7D6JAD5_HUMAN
    Accession: Primary (citable) accession number: A0A7D6JAD5
    Entry history: Integrated into UniProtKB/TrEMBL: December 2, 2020; Last sequence update: December 2, 2020; Last modified: December 2, 2020
    Entry status: Unreviewed (UniProtKB/TrEMBL)
    FASTA sequence of A0A7D6JAD5_HUMAN taken from UniProt. Length: 459; Mass (Da): 52,737
    >tr|A0A7D6JAD5|A0A7D6JAD5_HUMAN Truncated angiotensin converting enzyme 2 OS=Homo sapiens OX=9606 GN=ACE2 PE=2 SV=1
    MREAGWDKGGRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGE
    IMSLSAATPKHLKSIGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKG
    EIPKDQWMKKWWEMKREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQ
    EALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLN
    YFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSS
    VAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRM
    SRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIR
    DRKKKNKARSGENPYASIDISKGENNPGFQNTDDVQTSF
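A FASTA record copied from a web page or slide often picks up stray spaces and line breaks, so it can help to clean it and check the residue count against the length reported by UniProt before using it. The following is a minimal sketch of such a check; the function name `read_fasta` and the short toy record are assumptions made for illustration and are not part of the original assignment.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text.

    All whitespace inside the sequence lines is removed, so stray spaces
    introduced by copy/paste do not end up in the sequence.
    """
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record; only the first is returned
            header = line[1:]
        else:
            chunks.append("".join(line.split()))
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


# Toy record (hypothetical) with a stray space, as can happen after copy/paste.
record = """>tr|EXAMPLE|EXAMPLE_HUMAN toy record
MREAGWDKGG RILM
CTKVT
"""
header, sequence = read_fasta(record)
print(header)
print(sequence, len(sequence))
```

Applied to the full dACE2 record, the printed length can be compared with the 459 residues listed in the UniProt entry.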
  • 6. 2. Template recognition and initial alignment using BLASTp and PDB. Template recognition and selection involves searching the PDB for homologous proteins with determined structures. The search can be performed with simple sequence alignment programs such as BLAST and FASTA, because the percentage identity between the target sequence and a possible template is high enough (in the "safe zone") to be detected by these programs. In general, about 40% sequence identity is required to generate a useful model. Here, in the second step, the dACE2 sequence in FASTA format was submitted to BLAST and searched against the PDB. Figure: FASTA format of the query sequence with unknown structure.
  • 7. Figure: Performing BLASTp from the BLAST website. Figure: The results of BLASTp, showing multiple hits.
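To make the notion of percentage identity concrete, here is a small sketch that computes it for two already aligned sequences. It assumes one common definition (identical residues divided by the number of aligned, non-gap columns); the value BLAST reports as "Per. Ident" may be computed over a different denominator, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): 5 aligned columns, 4 of them identical.
identity = percent_identity("ACDE-G", "ACDQFG")
print(f"{identity:.1f}% identity")
```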
  • 8. Table: The four template structures selected for the lowest E value, the greatest query coverage and the greatest percent identity. The description, scientific name, maximum score, total score, query coverage, E value, percentage identity, accession length and PDB accession are given.
    Description | Scientific Name | Max Score | Total Score | Query Cover | E value | Per. Ident | Acc. Len | Accession
    The 2019-nCoV RBD/ACE2-B0AT1 complex | Homo sapiens | 942 | 942 | 99% | 0.0 | 98.47% | 814 | 6M17_B
    S protein of SARS-CoV-2 in complex bound with T-ACE2 | Homo sapiens | 942 | 942 | 99% | 0.0 | 98.47% | 817 | 7CT5_D
    Cryo-EM structure of cat ACE2 and SARS-CoV-2 RBD | Felis catus | 723 | 723 | 85% | 0.0 | 86.55% | 732 | 7C8D_A
    SARS Spike Glycoprotein - human ACE2 complex, Stabilized variant, all ACE2-bound particles | Homo sapiens | 560 | 560 | 59% | 0.0 | 96.35% | 605 | 6CS2_D
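The selection criteria stated for the table (lowest E value, then greater query coverage, then greater percent identity) can be expressed as a sort key. The sketch below uses the four hits from the table; the tuple layout and the function name `rank_templates` are assumptions for illustration, not part of BLAST or MODELLER.

```python
# (PDB accession, query cover %, E value, percent identity), from the table.
templates = [
    ("6M17_B", 99, 0.0, 98.47),
    ("7CT5_D", 99, 0.0, 98.47),
    ("7C8D_A", 85, 0.0, 86.55),
    ("6CS2_D", 59, 0.0, 96.35),
]


def rank_templates(rows):
    """Sort by ascending E value, then descending coverage and identity."""
    return sorted(rows, key=lambda r: (r[2], -r[1], -r[3]))


ranked = rank_templates(templates)
for accession, cover, evalue, identity in ranked:
    print(f"{accession}  cover={cover}%  E={evalue}  identity={identity}%")
```

Because 6M17_B and 7CT5_D tie on all three criteria, a further criterion is needed to choose between them; in this assignment that role is played by the resolution reported in MODELLER step 2.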
  • 9. 3. Refinement of the structures taken from the PDB using Chimera 1.15rc. Refinement of structure 1 using Chimera 1.15rc: 6M17. Figure: 6M17, chains A to V. Figure: 6M17, chain B. It was renamed seq1 for convenience; renaming is not necessary.
  • 10. Refinement of structure 2 using Chimera 1.15rc: 7CT5. Figure: 7CT5, chains A to Z. Figure: 7CT5, chain D. It was renamed seq2 for convenience; renaming is not necessary.
  • 11. Refinement of structure 3 using Chimera 1.15rc: 7C8D. Figure: 7C8D, chains A and B. Figure: 7C8D, chain A. It was renamed seq3 for convenience; renaming is not necessary.
  • 12. Refinement of structure 4 using Chimera 1.15rc: 6CS2. Figure: 6CS2, chains A to Z. Figure: 6CS2, chain D. It was renamed seq4 for convenience; renaming is not necessary.
  • 13. PREPARATION OF THE FIVE SCRIPTS FROM THE MODELLER TUTORIAL WEBSITE: MODELLER STEPS. Now that we have our query sequence and the 3D templates have been identified, the next step is to prepare the five scripts for MODELLER from the MODELLER tutorial website https://salilab.org/modeller/tutorial/basic.html
    MODELLER: MODELLER is used for homology or comparative modeling of protein three-dimensional structures. The user provides an alignment of a sequence to be modeled with known related structures, and MODELLER automatically calculates a model containing all non-hydrogen atoms. MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints, and can perform many additional tasks, including de novo modeling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, etc. Some of these are shown below when explaining the modeling steps. Figure: MODELLER program interface for script execution.
  • 14. 4. MODELLER Step 1: Script_1 preparation to analyze the query sequence and build a profile. The query sequence is written in PIR format. The first line contains the sequence code, in the format ">P1;code". The second line, with ten fields separated by colons, generally contains information about the structure file. Only two of these fields are used for sequences: "sequence" (indicating that the file contains a sequence without known structure) and "TvLDH" (the model file name). The rest of the file contains the sequence of TvLDH, with "*" marking its end.
    >P1;TvLDH
    sequence:TvLDH:::::::0.00: 0.00
    (our query sequence is placed here)*
    Figure: The query sequence, saved as an .ALI file with proper formatting.
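The PIR layout described above (a ">P1;code" line, a ten-field colon-separated line, then the sequence terminated by "*") can be generated programmatically. This is a minimal sketch using only the Python standard library; the function name `pir_record`, the code "dACE2", the output file name and the 75-character line width are assumptions for illustration (the assignment itself kept the tutorial code "TvLDH").

```python
import textwrap


def pir_record(code, sequence, width=75):
    """Build a PIR entry for a sequence without a known structure."""
    # Remove any whitespace from the sequence and append the '*' terminator.
    residues = "".join(sequence.split()).upper() + "*"
    header = f">P1;{code}"
    # Ten colon-separated fields; only 'sequence' and the code are filled in.
    fields = f"sequence:{code}:::::::0.00: 0.00"
    body = textwrap.wrap(residues, width)
    return "\n".join([header, fields, *body]) + "\n"


# Hypothetical short fragment standing in for the full query sequence.
entry = pir_record("dACE2", "MREAGWDKGG RILMCTKVTM")
print(entry)

# Writing the entry to an .ali file (hypothetical file name):
# with open("dACE2.ali", "w") as handle:
#     handle.write(entry)
```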
  • 15. Script 1 from the MODELLER website needs no changes; just save it as a .PY file.
        from modeller import *
        log.verbose()
        env = environ()
        #-- Prepare the input files
        #-- Read in the sequence database
        sdb = sequence_db(env)
        sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
                 chains_list='ALL', minmax_db_seq_len=(30, 4000),
                 clean_sequences=True)
        #-- Write the sequence database in binary form
        sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
                  chains_list='ALL')
        #-- Now, read in the binary database
        sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
                 chains_list='ALL')
        #-- Read in the target sequence/alignment
        aln = alignment(env)
        aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')
        #-- Convert the input sequence/alignment into profile format
        prf = aln.to_profile()
  • 16. Script 1 (continued):
        #-- Scan sequence database to pick up homologous sequences
        prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
                  gap_penalties_1d=(-500, -50), n_prof_iterations=1,
                  check_profile=False, max_aln_evalue=0.01)
        #-- Write out the profile in text format
        prf.write(file='build_profile.prf', profile_format='TEXT')
        #-- Convert the profile back to alignment format
        aln = prf.to_alignment()
        #-- Write out the alignment file
        aln.write(file='build_profile.ali', alignment_format='PIR')
    Figure: Script 1; save it as a PY file.
  • 17. 5. MODELLER Step 2: Script_2 preparation to carry out MULTIPLE SEQUENCE ALIGNMENT and PHYLOGENETIC TREE construction and to check the crystallographic resolution. In script 2, the PDB codes and chain names of the tutorial (shown here as given in the tutorial) are replaced with the names of our four PDB structures, seq1, seq2, seq3 and seq4, along with their chain names, i.e. B, D, A and D. Note: If you have more or fewer structures, you can add or delete entries accordingly.
        from modeller import *
        env = environ()
        aln = alignment(env)
        for (pdb, chain) in (('1b8p', 'A'), ('1bdm', 'A'), ('1civ', 'A'),
                             ('5mdh', 'A'), ('7mdh', 'A'), ('1smk', 'A')):
            m = model(env, file=pdb, model_segment=('FIRST:'+chain, 'LAST:'+chain))
            aln.append_model(m, atom_files=pdb, align_codes=pdb+chain)
        aln.malign()
        aln.malign3d()
        aln.compare_structures()
        aln.id_table(matrix_file='family.mat')
        env.dendrogram(matrix_file='family.mat', cluster_cut=-1.0)
  • 18. Figure: Script 2; save it as a PY file.
  • 19. Run script1 and script2: Place script1 and script2, along with the query file, the PIR file and the four PDB structures, in the MODELLER folder (here they were placed in bin, as shown below). Now run MODELLER.
  • 20. Figure: Running script 1 generates additional files. Figure: The additional files pdb_95.bin, build_profile and script1.log.
  • 21. Now run script2. Open the script2 output file and check the scores of the MSA and the phylogenetic tree. Figure: The MSA is performed, and the phylogenetic tree is constructed on the basis of the MSA. Here we pick one template on the basis of crystallographic resolution: seq1B @2.9 was chosen because of its low resolution value.
  • 22. 6. MODELLER Step 3: Script_3 preparation for pairwise alignment, using dynamic programming to align the best template with the query. The align2d() command used in this step takes into account structural information from the template when constructing an alignment. This is achieved through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. As a result, the alignment errors are reduced by approximately one third relative to those that occur with standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases. In the tutorial script below, the template name is replaced with the template chosen in MODELLER step 2, i.e. seq1.
        from modeller import *
        env = environ()
        aln = alignment(env)
        mdl = model(env, file='1bdm', model_segment=('FIRST:A','LAST:A'))
        aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
        aln.append(file='TvLDH.ali', align_codes='TvLDH')
        aln.align2d()
        aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
        aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')
  • 23. Now run script3 to get the pairwise alignment with the best template. This pairwise alignment will help to build the conserved regions of our model. It uses dynamic programming, an exhaustive algorithm, so it takes some time.
  • 24. Script 3 output file. Figure: The pairwise alignment, a necessary step in model building, is now complete. Time: 172.75
  • 25. 7. MODELLER Step 4: Script_4 preparation for Model Building and Backbone R-chain. Once a target-template alignment is constructed, MODELLER calculates a 3D model of the target completely automatically, using its automodel class. The tutorial script below generates five similar models; for this project the number of models was raised to ten and the template name was replaced with seq1.
        from modeller import *
        from modeller.automodel import *
        #from modeller import soap_protein_od
        env = environ()
        a = automodel(env, alnfile='TvLDH-1bdmA.ali',
                      knowns='1bdmA', sequence='TvLDH',
                      assess_methods=(assess.DOPE,
                                      #soap_protein_od.Scorer(),
                                      assess.GA341))
        a.starting_model = 1
        a.ending_model = 5
        a.make()
  • 26. Figure: Here I need 10 models, so I set the number of models to 10 and replace the PDB name. Figure: Now run MODELLER for script_4.
  • 27. Figure: Running script4, which generates the models for us. Figure: Our ten models have been successfully generated.
  • 28. Open the script4 output file. Here, according to the DOPE score and molpdf value, I chose one of the best models for our query protein. When several models are calculated for the same target, the "best" model can be selected in several ways. For example, you could pick the model with the lowest value of the MODELLER objective function or of the DOPE or SOAP assessment scores, or with the highest GA341 assessment score; these are reported at the end of the log file.
    Selected model: TvLDH.B99990010.pdb (molpdf 3124.62549, DOPE score -35491.57422, GA341 score 0.77961)
  • 29. >> Summary of successfully produced models:
    Filename               molpdf        DOPE score      GA341 score
    ----------------------------------------------------------------------
    TvLDH.B99990001.pdb    3195.83813    -33944.30859    0.75628
    TvLDH.B99990002.pdb    3074.42725    -34495.23438    0.66094
    TvLDH.B99990003.pdb    2914.32275    -34914.96875    0.62078
    TvLDH.B99990004.pdb    3244.13867    -35109.01563    0.79621
    TvLDH.B99990005.pdb    3072.41846    -34744.25781    0.94353
    TvLDH.B99990006.pdb    2985.87280    -34632.87891    0.80809
    TvLDH.B99990007.pdb    3338.26465    -35036.42578    0.78566
    TvLDH.B99990008.pdb    3178.71118    -34787.64063    0.54706
    TvLDH.B99990009.pdb    3354.25049    -34837.94922    0.72689
    TvLDH.B99990010.pdb    3124.62549    -35491.57422    0.77961
    Total CPU time [seconds]: 325.75
    The model TvLDH.B99990010.pdb was chosen because it has the lowest (most negative) DOPE score of the ten models; its molpdf value (3124.62549) is mid-range rather than the lowest.
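The selection rules mentioned above (lowest molpdf, lowest DOPE score, highest GA341 score) can be applied to the summary table with a few lines of Python. The sketch below pastes the ten rows from the log as a string; the variable names are illustrative assumptions, and parsing a real log file would additionally require locating the summary block.

```python
summary = """\
TvLDH.B99990001.pdb 3195.83813 -33944.30859 0.75628
TvLDH.B99990002.pdb 3074.42725 -34495.23438 0.66094
TvLDH.B99990003.pdb 2914.32275 -34914.96875 0.62078
TvLDH.B99990004.pdb 3244.13867 -35109.01563 0.79621
TvLDH.B99990005.pdb 3072.41846 -34744.25781 0.94353
TvLDH.B99990006.pdb 2985.87280 -34632.87891 0.80809
TvLDH.B99990007.pdb 3338.26465 -35036.42578 0.78566
TvLDH.B99990008.pdb 3178.71118 -34787.64063 0.54706
TvLDH.B99990009.pdb 3354.25049 -34837.94922 0.72689
TvLDH.B99990010.pdb 3124.62549 -35491.57422 0.77961
"""

# Parse each row into (filename, molpdf, DOPE score, GA341 score).
models = []
for line in summary.splitlines():
    name, molpdf, dope, ga341 = line.split()
    models.append((name, float(molpdf), float(dope), float(ga341)))

best_by_dope = min(models, key=lambda m: m[2])     # lower DOPE is better
best_by_molpdf = min(models, key=lambda m: m[1])   # lower molpdf is better
best_by_ga341 = max(models, key=lambda m: m[3])    # higher GA341 is better

print("lowest DOPE:   ", best_by_dope[0])
print("lowest molpdf: ", best_by_molpdf[0])
print("highest GA341: ", best_by_ga341[0])
```

The three criteria do not agree for this run, which is why the criterion used for the final choice (here, the DOPE score) should be stated explicitly.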
  • 30. 8. MODELLER Step 5: Script_5 preparation for model evaluation. Before any external evaluation of the model, one should check the modeling log for errors and restraint violations. The file "evaluate_model.py", here named script-5, evaluates an input model with the DOPE potential. The tutorial script below reads TvLDH.B99990002.pdb; here we replaced it with TvLDH.B99990010.pdb, the tenth generated model.
        from modeller import *
        from modeller.scripts import complete_pdb
        log.verbose()    # request verbose output
        env = environ()
        env.libs.topology.read(file='$(LIB)/top_heav.lib')  # read topology
        env.libs.parameters.read(file='$(LIB)/par.lib')     # read parameters
        # read model file
        mdl = complete_pdb(env, 'TvLDH.B99990002.pdb')
        # Assess with DOPE:
        s = selection(mdl)   # all atom selection
        s.assess_dope(output='ENERGY_PROFILE NO_REPORT', file='TvLDH.profile',
                      normalize_profile=True, smoothing_window=15)
    Figure: Preparing and running script5.
  • 31. Figure: Running script5, which evaluates our model TvLDH.B99990010.pdb. Note: Loop_2 was also run to check whether better models could be generated, but the results from Loop_1 were more acceptable than the Loop_2 model when both were later analyzed with the Ramachandran plot. Figure: Representation of Loop_2. The models obtained from Loop_2 (10 disallowed regions) were not supported by the Ramachandran plot as well as those from Loop_1.
  • 32. 9. Validation and structural organization of the 3D model using the Ramachandran plot and Chimera. The best generated model was opened in Chimera 1.15rc and visualized using PyMOL. Figure: TvLDH.B99990010.pdb opened in Chimera. Figure: The 3D structural organization of the protein, showing turns, coils and beta strands.
  • 33. Figure: The electrostatic potential of the protein contact surface. Red represents the acidic, blue the basic and grey the neutral parts of the protein. Figure: Surface model colored by element: C green, H grey, N blue, O red, S orange.
  • 34. Figure: Labelled 3D model with the residues and its main-chain atomic structure.
  • 35. VALIDATION USING RAMACHANDRAN PLOT. In biochemistry, a Ramachandran plot (also known as a Rama plot, a Ramachandran diagram or a [φ,ψ] plot), originally developed in 1963 by G. N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, is a way to visualize energetically allowed regions for the backbone dihedral angles ψ against φ of amino acid residues in a protein structure. PROCHECK JOB TITLE: https://saves.mbi.ucla.edu/?job=602392 Figure: Graphical representation of the 3D structure of the predicted model for the dACE2 sequence. Figure: The Ramachandran plot generated for the model, a protein that contains β-sheets, α-helices and random coils. The red, brown, and yellow regions represent the favored, allowed, and "generously allowed" regions as defined by PROCHECK. Figure: The plot statistics generated by PROCHECK.
  • 36. On the basis of amino acid stereochemistry. Figure: The different residues shown on the basis of amino acid stereochemistry. On the basis of statistics. Figure: The statistics of each residue involved.
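Each point in a Ramachandran plot is a pair of backbone dihedral angles, and a dihedral angle is fully determined by the coordinates of four consecutive atoms (for φ: C of the previous residue, N, Cα, C; for ψ: N, Cα, C, N of the next residue). The sketch below computes a signed dihedral angle in degrees using the standard atan2 formula and only the standard library; the function names and the toy coordinates are assumptions for illustration and do not reproduce how PROCHECK is implemented.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points in 3D."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): the fourth point is rotated by a quarter
# turn about the central bond relative to the first point.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```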
  • 37. On the basis of residue properties Figure: Absolute deviation from the mean Chi-1 value, omega torsion, C-alpha chirality, secondary structure and G-factor, plotted along the protein sequence.
  • 38. Figure: Absolute deviation from the mean Chi-1 value, omega torsion, C-alpha chirality, secondary structure and G-factor, plotted along the protein sequence.
  • 39. Figure: Absolute deviation from the mean Chi-1 value, omega torsion, C-alpha chirality, secondary structure and G-factor, plotted along the protein sequence.
  • 40. Figure: Absolute deviation from the mean Chi-1 value, omega torsion, C-alpha chirality, secondary structure and G-factor, plotted along the protein sequence.
  • 41. Figure: Absolute deviation from the mean Chi-1 value, omega torsion, C-alpha chirality, secondary structure and G-factor, plotted along the protein sequence. From these plots we can also estimate the secondary structure content: 22 helices, 27 random coils and 4 beta strands.
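Counting helices, strands and coils from a per-residue secondary structure track amounts to counting contiguous runs of the same code. The sketch below illustrates the idea on a short, made-up string, where H stands for helix, E for beta strand and C for coil. The string and the function name `count_segments` are assumptions for illustration only and do not reproduce the dACE2 assignment.

```python
from itertools import groupby

def count_segments(ss):
    """Count contiguous runs of each secondary-structure code in a string."""
    counts = {}
    for code, _run in groupby(ss):
        # Each group is one uninterrupted stretch of the same code.
        counts[code] = counts.get(code, 0) + 1
    return counts

# Hypothetical per-residue assignment: H = helix, E = beta strand, C = coil.
example = "CCHHHHHHCCEEEECCCHHHHCEEEC"
print(count_segments(example))
```

For this toy string the function reports two helices, two strands and five coil segments. The same counting applied to the secondary structure track of the real model would give the totals quoted above.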
  • 42. Functions of the Protein in the literature The sequence of the dACE2 protein was published on July 20, 2020, but its functions are not described in UniProt or in any source other than the original paper. The authors identified a novel, primate-specific isoform of ACE2, which they designate as deltaACE2 (dACE2). They showed that dACE2, but not ACE2, is induced in various human cell types by IFNs and viruses; this information is important to consider for future therapeutic strategies and for understanding susceptibility to and outcomes of COVID-19. 1. dACE2 is a novel inducible and primate-specific isoform of ACE2- The novel ACE2 isoform at the Xp22.2 locus of the human X chromosome is predicted to encode a protein of 459 aa, in which Ex1c encodes the first 10 aa, which are unique. Compared to the full-length ACE2 protein of 805 aa, the truncation eliminates 17 aa of the signal peptide and 339 aa of the N-terminal peptidase domain, as shown in Figure 1. Figure 1: Designation of the novel inducible isoform. 2. dACE2 is induced by IFNs in vitro- In most cell lines tested, dACE2 but not ACE2 was strongly upregulated by SeV infection (Figure 2B, C). Treatment with IFN-β or a cocktail of IFNλ1–3 significantly induced expression of dACE2 only and not of ACE2 (Figure 2E).
  • 43. Figure 2: Panels B, C and E. 3. dACE2 is induced in virally infected human respiratory epithelial cells- dACE2 but not ACE2 was induced in an RSV-infected human pulmonary carcinoma cell line (H292). 4. dACE2 is enriched in squamous epithelial tumors- The authors hypothesized that, as an ISG, dACE2 might be absent or expressed at low levels in normal tissues but could be induced by the inflammatory tissue microenvironment. They explored data from The Cancer Genome Atlas (TCGA), which represents the largest collection of tumors and tumor-adjacent normal tissues. Expression of both ACE2 and dACE2 was detectable in many tumor-adjacent normal tissues. 5. dACE2 is induced by SARS-CoV-2 in vitro- The results confirm that dACE2 is inducible by SARS-CoV-2 infection. Expression of ACE2 and dACE2 was much higher in the lung adenocarcinoma cell line Calu3 than in the colon adenocarcinoma cell lines Caco2 and T84. 6. dACE2 is non-functional as a SARS-CoV-2 receptor and carboxypeptidase- The main activities that involve the peptidase domain of ACE2 appear to be abrogated in dACE2.
  • 44. In conclusion, the authors present the first report of the discovery and functional annotation of dACE2, an IFN-inducible isoform of ACE2. The existence of two functionally distinct ACE2 isoforms reconciles several biological properties previously attributed to ACE2, with dACE2 being an ISG, and ACE2 acting as the SARS-CoV-2 entry receptor and carboxypeptidase without being regulated by IFNs. Major contribution to the discovery of dACE2: Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
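The length figures quoted for dACE2 in item 1 of the list above can be cross-checked with simple arithmetic. This is a minimal sketch using only the numbers reported there; the variable names are illustrative.

```python
# Figures quoted from the dACE2 paper as summarized above.
FULL_LENGTH_ACE2 = 805        # aa in full-length ACE2
SIGNAL_PEPTIDE_LOST = 17      # aa of the signal peptide removed
PEPTIDASE_DOMAIN_LOST = 339   # aa of the N-terminal peptidase domain removed
UNIQUE_EX1C_RESIDUES = 10     # unique aa encoded by Ex1c

# Total N-terminal residues missing relative to ACE2.
removed = SIGNAL_PEPTIDE_LOST + PEPTIDASE_DOMAIN_LOST
# Residues that dACE2 shares with ACE2.
shared = FULL_LENGTH_ACE2 - removed
# Predicted dACE2 length: shared residues plus the unique Ex1c residues.
dace2_length = shared + UNIQUE_EX1C_RESIDUES

print(removed, shared, dace2_length)
```

The result is consistent with the report: 356 N-terminal residues are lost, 449 residues are shared with ACE2, and adding the 10 unique residues gives the predicted 459 aa.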
  • 45. Conclusion In this project assignment, I predicted the three-dimensional structure of the dACE2 protein sequence by homology modelling with MODELLER. The sequence was reported by the Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA in July 2020. According to the Ramachandran plot analysis by PROCHECK, 99.3% of the residues of the predicted structure lie in allowed regions (87.7% in the most favored region, 9.3% in the allowed region and 2.2% in the generously allowed region). The structural organization also shows the presence of helices, beta strands and random coils.
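The three region percentages quoted in the conclusion can be summed as a quick consistency check. This is a minimal sketch that uses only the figures reported above; the variable names are illustrative and nothing here is PROCHECK output.

```python
# Percentages of residues per Ramachandran region, as quoted in the conclusion.
most_favored = 87.7
allowed = 9.3
generously_allowed = 2.2

# Sum of the three reported (already rounded) figures.
total_in_allowed = round(most_favored + allowed + generously_allowed, 1)
# Remainder that would fall outside these regions.
remainder = round(100.0 - total_in_allowed, 1)

print(total_in_allowed, remainder)
```

The rounded figures add up to 99.2%, which agrees with the reported total of 99.3% to within the one-decimal rounding of the individual values. Less than 1% of the residues fall outside the three regions.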
  • 46. References 1) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7386494/ 2) https://www.uniprot.org/uniprot/A0A7D6JAD5 3) https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome 4) https://www.rcsb.org/structure/6M17#entity-2 5) https://www.rcsb.org/structure/7CT5 6) https://www.rcsb.org/structure/7C8D 7) https://www.rcsb.org/structure/6CS2 8) https://salilab.org/modeller/ 9) http://services.mbi.ucla.edu/SAVES/Ramachandran/ 10) Tools: Chimera 1.15rc, MODELLER 9.25, PyMOL.