CENTRAL UNIVERSITY OF BIHAR
BIS 553: Protein Modelling and Simulation
ROBETTA DE NOVO STRUCTURE PREDICTION
(lec-1)
ROBETTA as a de novo Structure Prediction Server
Submitted to:- Dr. Durg Vijay Singh
Submitted by:- Swati Kumari
Roll no. 22, 2nd semester
Central University of Bihar, Patna
CONTENT :-
1. Introduction
2. Performance of Robetta in CASP
3. Development and history
4. Aims of Robetta
5. Robetta as multiple functional modules
6. Steps of Robetta
7. Domain prediction
8. What is Ginzu Protocol
9. High resolution structure prediction
10. Components of Low Resolution Scoring Function
11. Components of High Resolution Scoring Function
12. Fragment-based Methods (Rosetta)
13. Limitations of Robetta
14. References
Introduction :-
“Protein structure prediction and analysis using the Robetta server”
Rosetta is a well-established computational software suite with a variety of tools
developed for macromolecular modeling, structure prediction and functional design.
It is also run on a massive distributed computing infrastructure (Rosetta@home).
The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein
structure prediction and analysis.
For structure prediction, sequences submitted to the server are parsed into putative
domains and structural models are generated using either comparative modeling or de
novo structure prediction methods.
If a confident match to a protein of known structure is found using BLAST, PSI-BLAST,
FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is
found, structure predictions are made using the de novo Rosetta fragment insertion
method.
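The template-or-de-novo decision described above can be pictured as a small routing step. The Python below is a minimal sketch, not Robetta's code: the `Hit` record, its boolean `confident` flag (standing in for each method's own confidence threshold) and the template IDs are hypothetical, and the method order follows the reliability ranking given in the limitations section.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consultation order for this sketch; it follows the
# reliability ranking BLAST > PSI-BLAST > FFAS03 > 3D-Jury, not
# Robetta's actual source code.
METHOD_ORDER = ["BLAST", "PSI-BLAST", "FFAS03", "3D-Jury"]


@dataclass
class Hit:
    method: str        # search method that reported the hit
    template_id: str   # PDB identifier of the candidate template
    confident: bool    # whether the method judged the match confident


def choose_protocol(hits: list[Hit]) -> tuple[str, Optional[str]]:
    """Use comparative modeling with the first confident hit (in
    METHOD_ORDER), otherwise fall back to de novo fragment insertion."""
    for method in METHOD_ORDER:
        for hit in hits:
            if hit.method == method and hit.confident:
                return "comparative", hit.template_id
    return "de novo", None


print(choose_protocol([Hit("FFAS03", "1abc", True)]))  # ('comparative', '1abc')
print(choose_protocol([]))                             # ('de novo', None)
```

Scanning methods in a fixed order means a confident BLAST hit always wins over a confident 3D-Jury hit, mirroring the idea that simpler, more reliable detections should be trusted first.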
Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted
with a query sequence for RosettaNMR de novo structure determination. Other current
capabilities include the prediction of the effects of mutations on protein–protein
interactions using computational interface alanine scanning.
The Rosetta method was originally developed for de novo protein structure prediction
and is regularly one of the best performers in the community-wide Critical Assessment of
Structure Prediction (CASP).
Robetta provides both ab initio and comparative models of protein domains.
Domains without a detectable PDB homolog are modeled with the Rosetta de novo
protocol.
Comparative models are built from PDB templates detected and aligned with locally installed
versions of HHsearch/HHpred, RaptorX and SPARKS-X.
Alignments are clustered and comparative models are generated using the RosettaCM
protocol.
The procedure is fully automated.
Robetta is continually evaluated through CAMEO (as server11).
Robetta is also evaluated in the blind benchmarking experiments of CASP.
Robetta uses the Rosetta software, which is developed and maintained by the Rosetta
Commons.
In addition to de novo structure prediction, Rosetta also has methods for :-
1. protein-protein and protein-small molecule docking,
2. homology modeling,
3. novel protein design,
4. redesign of existing proteins for altered function.
In principle, Rosetta implements mostly knowledge-guided Metropolis Monte Carlo
sampling approaches coupled with knowledge-guided energy functions to perform two
tasks: sampling the conformational space and evaluating the energy of the resulting
structural models.
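To make the Metropolis Monte Carlo idea concrete, here is a minimal Python sketch of the acceptance rule and a generic search loop. It is not Rosetta code: the energy and perturbation functions are caller-supplied placeholders, and the toy usage minimizes a one-dimensional function rather than a protein energy.

```python
import math
import random


def metropolis_accept(e_old: float, e_new: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: accept downhill moves; accept uphill moves
    with probability exp(-(e_new - e_old) / kT)."""
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)


def monte_carlo(energy, perturb, start, steps: int, kT: float, seed: int = 0):
    """Generic Metropolis Monte Carlo search; returns the lowest-energy
    state visited and its energy."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        candidate = perturb(current, rng)    # propose a move (sampling)
        e_candidate = energy(candidate)      # evaluate it (energy function)
        if metropolis_accept(e_current, e_candidate, kT, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


# Toy usage: minimize (x - 3)^2 starting from x = 0.
best_x, best_e = monte_carlo(
    energy=lambda x: (x - 3.0) ** 2,
    perturb=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    start=0.0,
    steps=2000,
    kT=0.1,
)
```

The temperature `kT` controls how often uphill moves are accepted, which is what lets the search escape local minima instead of behaving like a pure greedy descent.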
The well-validated energy function and sampling methodologies used in Rosetta form the
foundation for the high quality prediction and design of macromolecular structures and
interactions, with successful stories ranging from fibril structure prediction to RNA
folding to the design of new enzyme catalysts.
With citations in hundreds of research publications, Rosetta is a trusted resource for many
top research teams in pharmaceutical companies and non-profit institutions.
http://rosetta.insilicos.com/what/
http://en.wikipedia.org/wiki/Rosetta@home
http://nar.oxfordjournals.org/content/32/suppl_2/W526.full
Performance of Robetta in Critical Assessment of
Techniques for Protein Structure Prediction (CASP):-
Results from the fourth and fifth critical assessments of structure prediction
(CASP4 and CASP5) have shown that Robetta is currently one of the best methods
for de novo protein structure prediction and distant fold recognition.
Robetta has participated as an automated prediction server in the biennial CASP
experiments since CASP5 in 2002, performing among the best in the automated server
prediction category.
Robetta has since competed in CASP6 and CASP7, where it did better than average among both
automated server and human predictor groups.
In modeling protein structure as of CASP6, Robetta first searches for structural homologs
using BLAST, PSI-BLAST, and 3D-Jury, then parses the target sequence into its
individual domains, or independently folding units of proteins, by matching the sequence
to structural families in the Pfam database. Domains with structural homologs then
follow a "template-based model" (i.e., homology modeling) protocol.
In CASP8, Robetta was augmented to use Rosetta's high resolution all-atom refinement
method, the absence of which was cited as the main cause for Robetta being less accurate
than the Rosetta@home network in CASP7.
http://en.wikipedia.org/wiki/Rosetta@home
Development and history :-
Originally introduced by the Baker laboratory at the University of Washington in 1998
as an ab initio approach to structure prediction, Rosetta has since branched into several
development streams and distinct services.
The Rosetta platform derives its name from the Rosetta Stone, as it attempts to decipher
the structural "meaning" of proteins' amino acid sequences.
More than seven years after Rosetta's first appearance, the Rosetta@home project
was released (i.e. announced as no longer beta) on October 6, 2005.
Many of the graduate students and other researchers involved in Rosetta's initial
development have since moved to other universities and research institutions, and
subsequently enhanced different parts of the Rosetta project.
http://en.wikipedia.org/wiki/Rosetta@home
Aim of Robetta :-
Rosetta@home aims to predict protein–protein docking and design new proteins with
the help of about sixty thousand active volunteer computers processing at 83
teraFLOPS on average as of April 18, 2014.
Foldit, a Rosetta@Home videogame, aims to reach these goals with a
crowdsourcing approach. Though much of the project is oriented towards basic
research on improving the accuracy and robustness of the proteomics methods,
Rosetta@home also does applied research on malaria, Alzheimer's disease and other
pathologies.
In addition to disease-related research, the Rosetta@home network serves as a
testing framework for new methods in structural bioinformatics.
These new methods are then used in other Rosetta-based applications, like
RosettaDock and the Human Proteome Folding Project, after being sufficiently
developed and proven stable on Rosetta@home's large and diverse collection of
volunteer computers.
http://en.wikipedia.org/wiki/Rosetta@home
Robetta as multiple functional modules :-
Rosetta’s macromolecular modeling capabilities empower researchers to address a wide
variety of questions in structural biology. The software contains multiple functional
modules with some representative features as follows.
Structure prediction :-
AbinitioRelax: Predict the high resolution 3-D structure of a protein from its amino acid sequence.
Comparative_modeling: Build structural models of proteins using one or more known structures as templates.
Rna_denovo: De novo tertiary structure prediction and design of complex RNAs with high resolution.
Design :-
FixedBBProteinDesign: Redesign the amino acids for target protein backbones.
Enzdes: Design a protein active site to catalyze a chemical reaction.
AnchoredDesign: Design interfaces between known target structures and new binding partners.
Dock :-
Protein_docking: Predict and refine the docked conformation of two proteins with known structures.
Ligand_docking: Predict the orientation in which a small molecule binds to a protein target and calculate the binding energy.
AntibodyModeler: Predict antibody Fv region structures and perform antibody-antigen docking.
http://rosetta.insilicos.com/what/
Steps of Robetta :-
Robetta works in two phases:
1. An initial search of the conformational landscape, i.e., a low resolution approach.
2. A high resolution approach, where atomic detail and physically derived energy
functions are employed.
Low resolution prediction (approximate structure) :-
In the low resolution phase, the overall topology is searched using a statistical scoring
function and fragment assembly.
This is followed by an atomic-detail refinement phase that uses rotamers, small backbone
torsion-angle moves and a more physically relevant scoring function.
Robetta uses information from the PDB to estimate the possible conformations of
local sequence segments.
It first generates libraries of local sequence fragments excised from a non-redundant
(curated) version of the PDB on the basis of local sequence similarity (3- and 9-residue
matches between the query sequence and structures in the PDB).
Selecting fragments of local structure on the basis of local sequence similarity
dramatically reduces the size of the accessible conformational landscape.
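Fragment library construction can be sketched as ranking PDB segments against each window of the query. In this minimal Python sketch plain sequence identity stands in for the real similarity score (Rosetta's fragment picking also uses sequence profiles and predicted secondary structure), and the `Fragment` record and three-entry library are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    sequence: str                        # sequence of the PDB segment
    torsions: list[tuple[float, float]]  # observed (phi, psi) per residue


def identity(a: str, b: str) -> int:
    """Number of identical positions between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b))


def pick_fragments(query: str, library: list[Fragment], frag_len: int,
                   top_n: int) -> dict[int, list[Fragment]]:
    """For every window of the query, keep the top_n library fragments of
    the same length, ranked by sequence identity to that window."""
    same_len = [f for f in library if len(f.sequence) == frag_len]
    picks: dict[int, list[Fragment]] = {}
    for start in range(len(query) - frag_len + 1):
        window = query[start:start + frag_len]
        ranked = sorted(same_len, key=lambda f: identity(window, f.sequence),
                        reverse=True)
        picks[start] = ranked[:top_n]
    return picks


# Hypothetical toy library of 3-residue PDB segments.
library = [
    Fragment("ACD", [(-60.0, -45.0)] * 3),
    Fragment("CDE", [(-120.0, 130.0)] * 3),
    Fragment("WWW", [(-60.0, -45.0)] * 3),
]
picks = pick_fragments("ACDEFG", library, frag_len=3, top_n=1)
```

Only the few best-matching fragments per window are kept, which is exactly what shrinks the conformational search: later stages may only choose among these local structures.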
Tertiary structures are generated using a Monte Carlo search of the possible combinations of
likely local structures, minimizing a scoring function that accounts for nonlocal
interactions such as compactness, hydrophobic burial, specific pair interactions and
strand pairing (beta strands).
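A minimal Python sketch of Monte Carlo fragment assembly follows, assuming a torsion-only chain representation. The extended starting chain, the helix/strand fragments and the toy score that simply rewards helical residues are all hypothetical stand-ins for the real fragment libraries and low resolution scoring function.

```python
import math
import random

Torsions = list[tuple[float, float]]  # (phi, psi) for each residue


def insert_fragment(conf: Torsions, fragments: dict[int, list[Torsions]],
                    rng: random.Random) -> Torsions:
    """Fragment insertion move: choose a window that has fragments, pick one
    of them, and copy its torsions over that window."""
    start = rng.choice(sorted(fragments))
    frag = rng.choice(fragments[start])
    trial = list(conf)
    trial[start:start + len(frag)] = frag
    return trial


def assemble(n_res, fragments, score, steps, kT, seed=0):
    """Monte Carlo fragment assembly starting from an extended chain."""
    rng = random.Random(seed)
    conf: Torsions = [(-150.0, 150.0)] * n_res   # hypothetical extended start
    e = score(conf)
    for _ in range(steps):
        trial = insert_fragment(conf, fragments, rng)
        e_trial = score(trial)
        # Metropolis acceptance of the fragment insertion
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            conf, e = trial, e_trial
    return conf, e


HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)
# Windows 0..3 of a 6-residue chain, each with one helical and one strand fragment.
frags = {s: [[HELIX] * 3, [STRAND] * 3] for s in range(4)}
toy_score = lambda conf: sum(1 for t in conf if t != HELIX)  # toy: favors helix
final_conf, final_e = assemble(6, frags, toy_score, steps=1000, kT=0.1)
```

Each move replaces a whole window of backbone torsions at once, so the search jumps between locally plausible structures rather than perturbing one angle at a time.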
High resolution prediction (accurate structure) :-
For the second-stage refinement, the centroid representation of amino acid side
chains used in the low resolution phase is replaced with an atomic-detail rotamer
representation.
In this phase the scoring function includes a solvation term, a hydrogen-bond term and other
terms with a direct physical interpretation.
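The difference between the two representations can be illustrated by collapsing a residue's side chain into one pseudo-atom. This Python sketch simply takes the geometric mean of the side-chain heavy atoms; that averaging rule and the glycine fallback to CA are assumptions for illustration, not Rosetta's exact centroid definition.

```python
Coord = tuple[float, float, float]

BACKBONE = {"N", "CA", "C", "O", "OXT"}


def side_chain_centroid(atoms: dict[str, Coord]) -> Coord:
    """Geometric mean of one residue's side-chain heavy atoms.
    Glycine (no side-chain atoms) falls back to CA, an assumption of this sketch."""
    side_chain = [xyz for name, xyz in atoms.items() if name not in BACKBONE]
    if not side_chain:
        return atoms["CA"]
    n = len(side_chain)
    return (
        sum(p[0] for p in side_chain) / n,
        sum(p[1] for p in side_chain) / n,
        sum(p[2] for p in side_chain) / n,
    )


# Hypothetical residue with two side-chain heavy atoms (CB, CG).
residue = {
    "N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "C": (2.0, 0.0, 0.0),
    "O": (3.0, 0.0, 0.0), "CB": (1.0, 1.0, 0.0), "CG": (1.0, 3.0, 0.0),
}
print(side_chain_centroid(residue))  # (1.0, 2.0, 0.0)
```

The high resolution stage reverses this simplification: the single pseudo-atom is replaced by all side-chain atoms placed according to a rotamer.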
Fig :- Steps of Robetta (source: Structural Bioinformatics, 2nd edition, edited by Jenny Gu, Philip E. Bourne).
Refinement Phase :-
The most natural starting point for simulating high resolution protein folding is a
standard molecular dynamics simulation (numerically integrating Newton's
equations of motion for the polypeptide chain) using a physically reasonable potential
function.
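"Numerically integrating Newton's equations of motion" can be illustrated with the velocity Verlet scheme, a common MD integrator. The Python below is a one-dimensional toy sketch, assuming a single harmonic "bond" with unit spring constant and mass instead of a real protein force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equation m * a = F(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with averaged acceleration
        a = a_new
    return x, v


# Toy usage: one period of a harmonic "bond" with k = 1, m = 1 (period 2*pi).
k = 1.0
dt = 0.01
steps = round(2 * math.pi / dt)
x_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -k * x, 1.0, dt, steps)
energy = 0.5 * v_end ** 2 + 0.5 * k * x_end ** 2
```

After one period the particle returns close to its start and the total energy stays near its initial value of 0.5; this good energy conservation is why Verlet-type schemes are standard in MD.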
Domain prediction:-
Domain prediction is a critical prerequisite for structure prediction, because "as the size
of the protein increases, its conformational space also increases."
Current de novo methods are limited to domains of about 150 residues for alpha-beta
proteins, 80 residues for beta folds and 150 residues for all-alpha folds.
To overcome this, two approaches can be applied:
1. Increase the size range of de novo structure prediction.
2. Divide the protein into domains before attempting structure
prediction.
"A domain is generally defined as a portion of a protein that folds
independently of the rest of the protein."
So dividing a query sequence into its smallest component domains prior to
folding is a straightforward way to increase the size of proteins that can be predicted.
For many proteins the domain divisions can be found easily, while for several others the
domains remain beyond our ability to delineate correctly.
The determination of domains, family membership and domain boundaries for
multidomain proteins is a vital step in structure annotation and prediction.
In brief, most domain parsing methods rely on a hierarchical search for
domains in the query sequence using a collection of primary sequence methods, domain
library searches and matches to structural domains in the PDB.
What is Ginzu Protocol :-
Ginzu is a PDB template identification and domain prediction protocol that
attempts to determine the regions of a protein chain that are aligned to PDB
templates with reasonable confidence, and in regions where templates are not
detected, it attempts to find regions that will fold into globular units, called
"domains".
Reference: http://robetta.bakerlab.org/faqs.jsp#removingjobs
Source: Structural Bioinformatics, 2nd edition, edited by Jenny Gu, Philip E. Bourne.
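The first part of the Ginzu idea, separating template-covered regions from the rest of the chain, is essentially interval bookkeeping. The Python below is a minimal sketch under stated assumptions: the template spans are given as 1-based inclusive residue ranges, the 300-residue chain is hypothetical, and `min_len` (a threshold for ignoring short linkers) is an invented parameter. Real Ginzu also scores alignment confidence and predicts domain cuts inside the uncovered regions, which is not shown.

```python
def uncovered_regions(length: int, template_spans: list[tuple[int, int]],
                      min_len: int = 1) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) stretches of a chain of `length`
    residues not covered by any template span, keeping only stretches of at
    least `min_len` residues."""
    covered = [False] * (length + 1)          # index 0 unused
    for start, end in template_spans:
        for i in range(max(1, start), min(length, end) + 1):
            covered[i] = True
    regions = []
    run_start = None
    for i in range(1, length + 1):
        if not covered[i] and run_start is None:
            run_start = i                      # a gap begins
        elif covered[i] and run_start is not None:
            if i - run_start >= min_len:
                regions.append((run_start, i - 1))
            run_start = None                   # the gap ends
    if run_start is not None and length - run_start + 1 >= min_len:
        regions.append((run_start, length))
    return regions


# Hypothetical 300-residue chain with two confidently aligned template spans.
print(uncovered_regions(300, [(1, 120), (150, 260)]))              # [(121, 149), (261, 300)]
print(uncovered_regions(300, [(1, 120), (150, 260)], min_len=30))  # [(261, 300)]
```

The uncovered stretches are the candidates for further domain parsing and, if no template is found, for de novo modeling.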
Components of Low Resolution Scoring Function :-
1. Residue Environment (solvation)
2. Residue Pair Interactions (electrostatics and disulfides)
3. Steric Repulsion
4. Radius of Gyration (compactness measure; Van der Waals interactions and
solvation)
5. Cβ Density (solvation; correction for the excluded volume effect introduced by the
simulation)
6. Strand Pairing (H-bonding)
7. Strand Arrangement into Sheets
8. Helix-Strand Packing
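The radius of gyration term listed above measures compactness. A minimal Python sketch of the quantity itself follows: the RMS distance of points from their centroid, with equal weights as a simplifying assumption. How Rosetta turns it into a score term is not shown.

```python
import math


def radius_of_gyration(coords: list[tuple[float, float, float]]) -> float:
    """Unweighted radius of gyration: RMS distance of the points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    )


compact = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(radius_of_gyration(compact))   # 1.0
```

An extended chain of the same size gives a larger value than a compact one, so penalizing a large radius of gyration favors collapsed, protein-like models.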
Components of High Resolution Scoring Function :-
1. Ramachandran Torsion Preferences
2. Lennard-Jones Interactions
3. Hydrogen Bonding
4. Solvation
5. Residue Pair Interactions (electrostatic interactions, disulphide bonds)
6. Rotamer Self Energy
7. Unfolded State Preferences
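Rosetta combines such terms into a single score as a weighted sum, E = Σ wᵢ·Eᵢ. The Python sketch below shows only that combination step; the term names, values and weights are hypothetical and are not Rosetta's actual term names or weights.

```python
def total_score(terms: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear combination of energy terms: sum of w_i * E_i."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical term values and weights, chosen only for illustration.
terms = {"lennard_jones": -10.0, "hbond": -2.0, "solvation": 4.0}
weights = {"lennard_jones": 1.0, "hbond": 2.0, "solvation": 0.5}
print(total_score(terms, weights))  # -12.0
```

Raising an error for a term without a weight, rather than silently dropping it, keeps a misconfigured score function from going unnoticed.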
Fragment-based Methods (Rosetta) :-
Hypothesis: the PDB contains all the possible conformations that a short region
of a protein chain might adopt.
How do we choose fragments that are most likely to correctly represent the query
sequence?
Limitations of Robetta :-
There are many limitations to consider when using Robetta, as with all structure
prediction methods. The de novo protocol is optimized for small single domain proteins
(<120 residues). Within this limit, models are frequently around 3–7 Å RMSD to more
than half of the native structure. Above this limit, models are still likely to have at least
50 residues within 4 Å RMSD, as shown in Table 1. For comparative modeling, the
quality of the model is greatly dependent on the correct selection of the best possible
parent template and alignment. Because of these factors, results are highly dependent on
the accuracy of the domain assignments. A general rule to follow is that BLAST, PSI-
BLAST, FFAS03 and 3D-Jury parent detections should be considered the most reliable,
in that order. Domains predicted from Pfam-A and the MSA should be treated with
caution, particularly for longer domains and also those that were assigned solely by the
MSA.
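The accuracy figures above are quoted as RMSD, the root-mean-square deviation between corresponding atoms (typically Cα) of a model and the native structure. This Python sketch assumes the two structures have already been optimally superimposed (e.g. by the Kabsch algorithm, which is omitted), and the coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], native: list[Coord]) -> float:
    """Root-mean-square deviation between corresponding atoms, assuming the
    two structures have already been optimally superimposed."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(model, native)
    )
    return math.sqrt(total / len(model))


# Hypothetical two-atom example: one atom is displaced by 5 Å.
print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]))
```

The example prints about 3.54 Å (√12.5): a single badly placed atom raises the RMSD of the whole set, which is why accuracy is also reported for a subset of residues (e.g. "at least 50 residues within 4 Å").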
http://nar.oxfordjournals.org/content/32/suppl_2/W526.full
References:
1. http://rosetta.insilicos.com/what/
2. http://en.wikipedia.org/wiki/Rosetta@home
3. http://nar.oxfordjournals.org/content/32/suppl_2/W526.full
4. Structural Bioinformatics, 2nd edition, edited by Jenny Gu, Philip E. Bourne.
5. http://robetta.bakerlab.org/faqs.jsp#removingjobs