Modelling Assignment
Submitted to: Dr. Durg Vijay Singh
Submitted by: Shweta Kumari
Roll no. 21
M.Sc. Bioinformatics, 2nd semester
CONTENT

Objective

Structure prediction

Threading

Ab initio

PHYRE2 and its result

DALI structure-structure alignment and its result

Robetta and its result

Validation

Result and discussion

Conclusion

References
OBJECTIVE
To build a model of the given amino acid sequence and validate
the generated model.
>gi|407259499|gb|AFT91383.1| EcdL [Emericella rugulosa]
MDDSPWPQCDIRVQDTFGPQVSGCYEDFDFTLLFEESILYLPPLLIAASVALLRIWQL
RSTENLLKRSGLLSILKPTSTTRLSNAAIAIGFVASPIFAWLSFWEHARSLRPSTI
LNVYLLGTIPMDAARARTLFRMPGNSAIASIFATIVVCKVVLLVVEAMEKQRLLLD
RGWAPEETAGILNRSFLWWFNPLLLSGYKQALTVDKLLAVDEDIGVEKSKDEIRRR
WAQAVKQNASSLQDVLLAVYRTELWGGFLPRLCLIGVNYAQPFLVNRVVTFLGQPD
TSTSRGVASGLIAAYAIVYMGIAVATAAFHHRSYRMVMMVRGGLILLIYDHTLTLN
ALSPSKNDSYTLITADIERIVSGLRSLHETWASLIEIALSLWLLETKIRVSAVAAA
MVVLVCLLVSGALSGLLGVHQNLWLEAMQKRLNATLATIGSIKGIKATGRTNTLYE
TILQLRRTEIQKSLKFRELLVALVTLSYLSTTMAPTFAFGTYSILAKIRNMTPLLA
APAFSSLTIMTLLGQAVSGFVESLMGLRQAMASLERIRQYLVGKEAPEPSPNKPGV
ASTEGLVAWSASLDEPGLDPRVEMRRMSSLQHRFYNLGELQD
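The residue count of a FASTA record such as the one above can be checked with a few lines of Python. This is a minimal sketch: the function name `read_fasta` and the toy record are illustrative assumptions, and the toy record holds only the first ten residues of the query.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop when a second record begins
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


# Toy record (hypothetical header), split over two lines like a real FASTA file.
example = """>toy_record first ten residues of the query
MDDSP
WPQCD
"""

header, sequence = read_fasta(example)
print(header)
print(len(sequence), "residues")
```

Pasting the full record into `example` would report the length of the whole query in the same way.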
Structure Prediction
Protein structure prediction is the prediction of the
three-dimensional structure of a protein from its amino acid
sequence, i.e. the prediction of its folding and of its
secondary, tertiary, and quaternary structure from its
primary structure.
Knowledge of the 3D structure is useful for rational
drug design, protein engineering, detailed study of
protein-biomolecule interactions, and the study of
evolutionary relationships between proteins or protein
families.
METHODS OF STRUCTURE PREDICTION
Structure prediction
 Experimental methods: X-ray, NMR, EM
 Computational methods:
   Template based: homology modelling, threading
   Template free: ab initio

We have to build a model of the given sequence, the 604-residue
EcdL protein of Emericella rugulosa.

The given protein sequence did not show a significant
alignment with any solved structure, so homology modelling
cannot be used to build the model.

The alternatives are THREADING or the AB INITIO method.
Threading
“Remote Homology”

A protein modelling method used for proteins that have the same
fold as proteins of known structure but have no homologous
proteins of known structure.

Software used for fold recognition includes:
 PHYRE2
 I-TASSER
 MUSTER
 RaptorX
 GenThreader
 LOMETS
Ab initio method
Predicting the 3D structure without any “prior knowledge”
If structural homologues (occasionally analogues) do not exist, or
exist but cannot be identified, models have to be constructed from
scratch. This procedure is called ab initio modelling.

Software used for ab initio structure prediction:
 Robetta
PHYRE2 (Protein Homology/analogY
Recognition Engine V 2.0)

Developed by Dr. Lawrence Kelley.

Released on 14 February 2011.

A popular structure prediction server, cited over 1500
times.

Ranked as best for function prediction in CASP9.

The basic working principle of PHYRE2 is:
 Finding a sequence alignment to a known
structure.
 Copying the coordinates and relabelling the
residues according to the query sequence, based on
the alignment.
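The "copy coordinates and relabel residues" idea can be illustrated with a toy example. This is a simplified sketch, not PHYRE2's actual implementation: the function name `thread_onto_template`, the aligned strings, and the coordinates are all made up for illustration, and only C-alpha positions are handled.

```python
def thread_onto_template(query_aln, template_aln, template_ca):
    """Copy template C-alpha coordinates onto aligned query residues.

    query_aln, template_aln: aligned strings of equal length, '-' marks a gap.
    template_ca: one (x, y, z) tuple per template residue, in sequence order.
    Returns a list of (query_position, query_residue, coordinate) tuples,
    with 1-based query positions. Query residues facing a gap get no
    coordinate and would have to be modelled separately (e.g. as loops).
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have the same length")
    model = []
    q_pos = 0  # residues of the query consumed so far
    t_pos = 0  # residues of the template consumed so far
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != "-":
            q_pos += 1
        if t_res != "-":
            t_pos += 1
        if q_res != "-" and t_res != "-":
            # Aligned pair: keep the template coordinate, relabel as query.
            model.append((q_pos, q_res, template_ca[t_pos - 1]))
    return model


# Hypothetical alignment and template coordinates (residues M, E, K, S, P).
query_aln = "MDD-SP"
template_aln = "M-EKSP"
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
               (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]

model = thread_onto_template(query_aln, template_aln, template_ca)
for position, residue, coord in model:
    print(position, residue, coord)
```

Query residue 2 faces a gap in the template, so it receives no coordinate in this sketch.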
PHYRE2

Features of PHYRE2:
 Domain analysis
 Motif highlighting
 Colouring of transmembrane helices

The algorithms used to predict the 3D structure are LOCAL
ALIGNMENT and HMM.

The query sequence is locally aligned against the fold library, with
HMM matching between the query and sequences of known structure.

PHYRE2 returns confident predictions for subsequences of the query;
these confidently modelled segments are cut out and resubmitted so
that they can be joined during assembly.
PHYRE2 result
[Figure: PHYRE2 best model]
[Figure: PHYRE2 alignment of query to 4f4cA]
[Figure: PHYRE2 best model, transmembrane region analysed by PHYRE2]
DALI (Distance mAtrix aLIgnment)

A method for structure-structure alignment.

It uses the 3D Cartesian coordinates of the C-alpha atoms of each
protein to calculate residue-residue distance matrices.

Output generated:
 Rank and PDB identifier
 Z-score
 RMSD
 Lali (number of aligned positions)
 Nres (number of residues in the matched structure)
 %ID
 PDB description
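The residue-residue distance matrix that DALI starts from can be sketched as follows. This shows only the matrix construction, not the DALI alignment algorithm itself; the function name `distance_matrix` and the coordinates are illustrative assumptions.

```python
import math


def distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise C-alpha distances."""
    n = len(ca_coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])  # Euclidean distance
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Three made-up C-alpha positions (in angstroms).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = distance_matrix(coords)
for row in dm:
    print(["%.1f" % value for value in row])
```

Because the matrix depends only on distances within one structure, it is unchanged by rotating or translating that structure, which is why two proteins can be compared without superposing them first.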
[Figure: DALI result]
DALI result analysis

A low RMSD together with a high number of aligned residues
indicates a better alignment.

If both values are high, or both are low, the alignments cannot be
ranked against each other on these two values alone.

RMSD: the root-mean-square deviation, a measure of the average
distance between aligned C-alpha atoms (i.e. how far one structure
diverges from the other).

Z-score: a measure of the quality of the structural alignment.
Note: the DALI package is based on Fortran programs and Perl scripts.
“The model shows the best alignment with 4f4c_A, with a low RMSD of
0.6 and a high lali of 403.”
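The RMSD reported above can be written out for two sets of aligned C-alpha atoms. This is a minimal sketch that assumes the two structures are already superposed; real tools such as DALI or PyMOL find the optimal superposition first. The function name `ca_rmsd` and the coordinates are made up for illustration.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two made-up aligned fragments, offset by 1 angstrom along z.
fragment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

rmsd = ca_rmsd(fragment_a, fragment_b)
print("RMSD = %.2f" % rmsd)
```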
[Figure: structure-structure alignment by PyMOL]
Ab initio through ROBETTA

Modelling of the query that is not template based

Robetta secured top positions in CASP (Critical Assessment of
Techniques for Protein Structure Prediction) 4, 5, 6, 7 and 8.

Robetta prediction types:
 1. Ginzu: domain prediction
 2. Structure: 3D model (available per domain from the
result page after Ginzu completes)
Domain prediction by the GINZU protocol

Robetta produces several models.

It identifies more than one domain: Robetta breaks up the
query sequence into putative domains and models each of them
separately.

It then assembles the models into a contiguous chain.
[Figure: result of Robetta]
Robetta result analysis

Robetta shows alignment with these three proteins for domain
prediction:

Sl. no.  Protein ID  Description
1        4p79        Crystal structure of claudin, providing insight into
                     the architecture of tight junctions; ion channel
                     regulator; alpha-helical membrane protein
2        1ni0        Hydrolase; restriction endonuclease PvuII from
                     Proteus vulgaris; class alpha/beta protein;
                     EC 3.1.21.4
3        4m1m        Multidrug resistance protein; ATP-binding cassette
                     transporter; Pgp
VALIDATION of MODEL

ANOLEA

PROSA

PROCHECK(PDBsum)
ANOLEA

Atomic Non-Local Environment Assessment

Performs energy calculations on a protein chain, evaluating the
non-local environment of each heavy atom in the molecule.

Steps:

1. Open the ANOLEA server.

2. Browse for the model (PDB) file.

3. Fill in the job title and submit to the server.
[Figure: ANOLEA result]
PROSA

PROtein Structure Analysis

Developed by Sippl, 1993.

Calculates a quality score from the C-alpha atoms of the input
structure.

OUTPUT:
 Z-score
 Plot of residue scores
 3D structure of the input protein
PROSA
1. Z-score: indicates the overall quality of the model. Its value is
displayed in a plot containing the Z-scores of all experimentally
determined protein chains in the PDB.
“The more negative the value, the more accurate the structure.”
2. Plot of residue scores: shows the local quality of the model by
plotting energies as a function of amino acid sequence position i
(window size 40).
Positive values correspond to problematic or erroneous parts of the
structure.
3. ProSA-web visualizes the 3D structure of the input protein using
the molecular viewer Jmol.
Residues are coloured from blue to red in order of increasing residue
energy.
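The windowed residue-score plot can be approximated by averaging per-residue energies over a sliding window. This is a hedged sketch of the smoothing idea only: the function name `windowed_energy`, the assignment of each average to the window's central residue, and the toy energies are assumptions, not ProSA's actual code or output.

```python
def windowed_energy(energies, window=40):
    """Average per-residue energies over every fragment of `window` residues.

    Returns a list of (central_index, mean_energy) tuples, using 0-based
    indices. Each average is assigned to the central residue of its fragment.
    """
    if window < 1 or window > len(energies):
        raise ValueError("window must be between 1 and len(energies)")
    profile = []
    running = sum(energies[:window])
    for start in range(len(energies) - window + 1):
        if start > 0:
            # Slide the window: add the new residue, drop the oldest one.
            running += energies[start + window - 1] - energies[start - 1]
        centre = start + (window - 1) // 2
        profile.append((centre, running / window))
    return profile


# Made-up energies and a small window so the result is easy to follow.
toy_energies = [1.0, -1.0, -3.0, -1.0, 1.0, 3.0, 2.0]
profile = windowed_energy(toy_energies, window=3)

# Positive averaged energies flag potentially problematic regions.
problematic = [index for index, energy in profile if energy > 0]
print(profile)
print("positions with positive energy:", problematic)
```

A larger window gives a smoother curve at the cost of hiding short local problems.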
PROSA RESULT
PROCHECK (PDBsum)

PDBsum is a pictorial database that provides an
at-a-glance overview of the contents of each 3D structure
deposited in the Protein Data Bank (PDB).

The PROCHECK analyses provide an idea of the
stereochemical quality of all protein chains in a given PDB
structure.

They highlight regions of the proteins which appear to
have unusual geometry and provide an overall
assessment of the structure as a whole.

PDBsum uses version 3.6.2 of PROCHECK.
[Figure: PROCHECK result]
PROCHECK ANALYSIS
• G-factor: a log-odds score based on the observed
distributions of stereochemical parameters.
• A low G-factor indicates that the property corresponds to a
low-probability conformation.
• The stereochemical properties are:
1. planarity
2. chirality
3. phi/psi preferences
4. chi angles.
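The phi/psi and chi values in the list above are torsion (dihedral) angles, each defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed from coordinates; the function name `dihedral` and the test points are illustrative assumptions, and this is not PROCHECK's own code.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond runs along the x axis.
p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = dihedral(p0, p1, p2, (1.0, 1.0, 0.0))
trans = dihedral(p0, p1, p2, (1.0, -1.0, 0.0))
gauche = dihedral(p0, p1, p2, (1.0, 0.0, 1.0))
print(cis, trans, gauche)
```

For phi the four atoms would be C(i-1), N(i), CA(i), C(i), and for psi N(i), CA(i), C(i), N(i+1); plotting psi against phi for every residue gives the Ramachandran plot.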
Result and discussion

Fold recognition was carried out with the PHYRE2 server.

Ab initio prediction was carried out with the Robetta
server, which gives information about domains.

After the model was built, it was validated with the
ANOLEA and PROSA servers.

The Ramachandran plot of the model was analysed using PDBsum
PROCHECK, with a description of the allowed regions.
Result and discussion

The comparative and combined study of PHYRE2 and Robetta
shows:

Sl. no.  Structure prediction method  Protein ID  Description
1        Fold recognition by PHYRE2   4F4C        Crystal structure of the
                                                  multidrug transporter
                                                  P-glycoprotein from
                                                  C. elegans
2        Ab initio by Robetta         4p79        Membrane protein
3        Ab initio by Robetta         1ni0        Hydrolase
4        Ab initio by Robetta         4m1m        Multidrug resistance
                                                  protein, ATP-binding
                                                  cassette transporter, Pgp
Conclusion

The models of the given amino acid sequence generated by PHYRE2
(fold recognition) and Robetta (ab initio prediction) suggest that
the given protein is:
 A P-glycoprotein: a multidrug-resistance protein belonging to a
superfamily of membrane-associated transport
proteins
 An ABC (ATP-binding cassette) transporter
 A transmembrane protein (alpha-helical structure)
References

http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/7330b2b464c1ea64/summary.html

http://robetta.bakerlab.org/

http://melolab.org/anolea/

https://prosa.services.came.sbg.ac.at/prosa.php

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html

http://ekhidna.biocenter.helsinki.fi/dali_server/results/20150324-0049-69ef51112579617192cac4dcad7075f2/index.html