2. Background
• Structure modeling processes often involve human interventions because the
human-expert knowledge combined with biochemical information (function,
mutagenesis, catalytic residues, etc.) could help in both structural assembly
and model selection.
• Development of fully automated algorithms allows non-experts to generate
structural models for their own sequences through Internet services.
• I-TASSER (as 'Zhang-Server') was ranked as the no. 1 server in recent
CASP7 and CASP8 experiments.
3. THE ZHANG LAB ON-LINE SERVICE SYSTEM CONTAINS:
• ON-LINE SERVERS - [FOLDING, DOCKING, DESIGN, DOMAINS ETC; SOME ARE
DOWNLOADABLE]
• BIOINFORMATICS TOOLS - [ALIGNMENT, IMAGE, CLUSTERING ETC; ALL ARE
DOWNLOADABLE]
• DATABASES - [LIGAND, GPCR, GENOME, DECOY, POTENTIAL, CASP ETC; ALL ARE
DOWNLOADABLE]
4. I. Protein Structure and Function Prediction Services (folding, threading, potential,
contact, torsion, docking etc.)
II. Bioinformatics Tools (structure alignment, sequence alignment,
3D visualization, surface, and clustering, etc.)
III. Databases and Potentials
5. Introduction:
• The I-TASSER server is an Internet service for protein structure and function
prediction.
• It allows academic users to automatically generate high-quality predictions
of 3D structure and biological function of protein molecules from their amino
acid sequences.
• Models are built based on multiple-threading alignments by LOMETS and
iterative TASSER simulations.
• I-TASSER (as 'Zhang-Server') was ranked as the no. 1 server in the recent
CASP7 and CASP8 experiments.
6. I-TASSER method
• I-TASSER is a hierarchical protein structure modeling approach.
• It is based on the Profile-Profile threading Alignment (PPA) and the iterative
implementation of the Threading ASSEmbly Refinement (TASSER) program.
• The target sequences are first threaded through a representative PDB
structure library (with a pair-wise sequence identity cut-off of 70%) to search
for the possible folds.
• The threading is done by four simple variants of the PPA method, with different
combinations of hidden Markov model and PSI-BLAST profiles and the Needleman-Wunsch
and Smith-Waterman alignment algorithms.
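As a rough illustration of the dynamic programming behind these alignment algorithms, the sketch below computes a plain Needleman-Wunsch global alignment score for two sequences. The function name and the scoring values (match, mismatch, gap) are assumptions made for this example. The PPA programs themselves score profile against profile rather than residue against residue, so this is only a simplified stand-in.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not I-TASSER parameters.
    """
    # prev[j] is the best score for aligning the first i-1 residues of a
    # with the first j residues of b (row i-1 of the DP matrix).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Smith-Waterman local alignment uses the same recurrence but clamps every cell at zero and reports the matrix maximum instead of the final cell.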
7. • Threading aligned regions are used to reassemble full-length models, while
the threading unaligned regions (mainly loops) are built by ab initio
modeling.
• The conformational space is searched by replica-exchange Monte Carlo
simulations.
• Structure trajectories are clustered by SPICKER, and the cluster centroids
are obtained by averaging the coordinates of all clustered structures.
• The fragment assembly simulation is then run again to remove steric clashes
on the centroid structures and to refine the models further.
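The replica-exchange search mentioned above can be sketched with the standard Metropolis exchange criterion for parallel tempering. This is a generic textbook form, not I-TASSER's actual code. The function names, the unit Boltzmann constant and the list-based bookkeeping are all assumptions of the example.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability for two replicas."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # min(1, exp(delta)); the branch avoids overflow for large positive delta
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(energies, temps, i, rng=random):
    """Try to exchange the replicas at neighbouring temperatures i and i+1.

    energies[k] is the current energy of the replica held at temps[k].
    Returns True if the exchange was accepted.
    """
    p = swap_probability(energies[i], energies[i + 1], temps[i], temps[i + 1])
    if rng.random() < p:
        energies[i], energies[i + 1] = energies[i + 1], energies[i]
        return True
    return False
```

A swap that moves the lower-energy conformation to the colder replica is always accepted. The reverse move is accepted only with an exponentially small probability, which lets trapped conformations escape through the high-temperature replicas.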
8. • Spatial restraints are extracted from the centroids and the PDB structures
searched by the structure alignment program TM-align.
• Finally, the structure decoys are clustered and the lowest energy structure
in each cluster is selected, which has the Cα atoms and the side-chain
centers of mass specified.
• Pulchra is used to add backbone atoms (N, C, O) and Scwrl_3.0 to build
side-chain rotamers.
9. How does I-TASSER generate structure and function predictions?
• After submission of an amino acid sequence, the server first tries to retrieve
template proteins of similar folds (or super-secondary structures) from the PDB
library by LOMETS, a locally installed meta-threading approach.
• If template fragments are found, they are reassembled into full-length models
by replica-exchange Monte Carlo simulations, with the threading unaligned
regions (mainly loops) built by ab initio modeling.
• If not, I-TASSER builds the whole structure by ab initio modeling.
• The low free-energy states are identified by SPICKER through clustering the
simulation decoys.
10. • In the third step, the fragment assembly simulation is performed again starting from
the SPICKER cluster centroids, where the spatial restraints collected from
both the LOMETS templates and the PDB structures by TM-align are used to
guide the simulations.
• The purpose of the second iteration is to remove the steric clash as well as to
refine the global topology of the cluster centroids.
• The decoys generated in the second simulations are then clustered and the
lowest energy structures are selected.
• The final full-atomic models are obtained by REMO which builds the atomic
details from the selected I-TASSER decoys through the optimization of the
hydrogen-bonding network.
12. • If any region with >80 residues has no aligned residues in at least two strong PPA
alignments (Z-score > Z0), the target is judged to be a multiple-domain protein, and
domain boundaries are automatically assigned based on the borders of the large gaps.
• I-TASSER simulations are run for the full chain as well as for the separate domains. The
final full-length models are generated by docking the domain models together.
• The domain docking is performed by a quick Metropolis Monte Carlo simulation where the
energy is defined as the RMSD of domain models to the full-chain model plus the
reciprocal of the number of steric clashes between domains.
• The goal of the docking is to find the domain orientation that is closest to the I-TASSER
full-chain model but has the fewest steric clashes.
• This procedure does not affect multiple-domain proteins whose domains are all
completely aligned by the PPAs.
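One possible reading of the multiple-domain rule above is sketched below: a region counts as a domain gap if more than 80 consecutive residues are unaligned in each of two strong alignments. The data layout (a Z-score paired with a per-residue boolean mask), the function names and the example Z0 value are hypothetical. The server's own implementation may differ.

```python
from itertools import combinations

def unaligned_runs(mask, min_len):
    """Yield (start, end) for runs of unaligned residues longer than min_len.

    Positions are 0-based and end is exclusive.
    """
    start = None
    # A trailing True acts as a sentinel so a run reaching the C-terminus is closed.
    for pos, aligned in enumerate(list(mask) + [True]):
        if not aligned and start is None:
            start = pos
        elif aligned and start is not None:
            if pos - start > min_len:
                yield (start, pos)
            start = None

def find_domain_gaps(alignments, z0, min_len=80):
    """Return gaps unaligned in at least two strong alignments.

    alignments: list of (z_score, mask), where mask[i] is True if query
    residue i is aligned to the template (hypothetical layout).
    """
    strong = [mask for z, mask in alignments if z > z0]
    gaps = set()
    for m1, m2 in combinations(strong, 2):
        # A residue is covered if either of the two alignments aligns it.
        covered = [a or b for a, b in zip(m1, m2)]
        gaps.update(unaligned_runs(covered, min_len))
    return sorted(gaps)
```

A non-empty result would mark the target as a multiple-domain protein, with the borders of the returned gaps serving as candidate domain boundaries.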
13. Server setting
• Project name: I-TASSER server
• Home page: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
• I-TASSER Standalone Package (Version 4.0)
Operating system(s): Windows, Linux, Mac
Programming language: Perl, Fortran77
License: GPL
o Input: Amino acid sequence of the protein (10–1,500 residues in FASTA format)
o Output: Email sent to the user, including the PDB format files of up to 5 predicted
models, the C-score of the models, the predicted RMSD and TM-score of the first
model, and a brief explanation of the RMSD, TM-score, and C-score
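Before submitting, it can help to check the input locally against the stated limits. The sketch below parses a single-record FASTA text and checks the 10–1,500 residue range. The function names are hypothetical, and restricting the alphabet to the 20 standard amino acid codes is an assumption of the example, since the server may accept other symbols.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard codes only

def parse_single_fasta(text):
    """Return (header, sequence) from FASTA text containing one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    return header, sequence

def check_sequence(sequence, min_len=10, max_len=1500):
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    if not (min_len <= len(sequence) <= max_len):
        problems.append(f"length {len(sequence)} outside {min_len}-{max_len}")
    unknown = sorted(set(sequence) - VALID_RESIDUES)
    if unknown:
        problems.append("non-standard residue codes: " + "".join(unknown))
    return problems
```

For example, `check_sequence(parse_single_fasta(text)[1])` returns an empty list for an input that passes both checks.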
15. Running:
• Log on to the I-TASSER web page.
• Copy and paste or directly upload the amino acid sequence into the provided box, and
provide an e-mail address and a name for the job.
• There are also options to specify inter-residue contact or distance
restraints or to exclude some templates.
• To submit, click on “Run I-TASSER”.
• Check the status of the submitted job by visiting the I-TASSER queue page.
• Click on search to find the submitted job by providing the job ID number.
• After the structure modeling is finished, a notification e-mail containing
an image of the predicted structure and a web link will be sent.
• Click the link to view and download the results.
16. Structure analysis
1. Predicted secondary structure
• Displayed as H for α-helix, S for β-strand and C for coil.
• Also consider the confidence score for each residue.
• Look for long stretches of secondary structure to estimate
the core region of the protein.
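Long stretches of regular secondary structure can be picked out of the H/S/C string programmatically. The sketch below is one simple way to do it. The function name and the minimum run length of 5 residues are hypothetical choices for this example, not values defined by I-TASSER.

```python
from itertools import groupby

def secondary_structure_segments(ss, min_len=5):
    """Return (state, start, end) for helix (H) or strand (S) runs.

    Only runs of at least min_len residues are kept. Positions are
    1-based and inclusive, matching residue numbering.
    """
    segments = []
    pos = 0  # number of residues consumed so far
    for state, run in groupby(ss):
        length = len(list(run))
        if state in ("H", "S") and length >= min_len:
            segments.append((state, pos + 1, pos + length))
        pos += length
    return segments
```

Regions where several such segments cluster together are candidates for the core of the protein, while long coil stretches are more likely to be loops or disordered tails.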
17. 2. Predicted Solvent Accessibility
• Predicts the buried and exposed regions in the query.
• Values range from “0” for a buried residue to “9” for a highly exposed residue.
• Regions with solvent-exposed, hydrophilic residues are
potential hydration or functional sites.
18. 3. Top 5 Models predicted by I-TASSER
• The predicted tertiary structure of the protein is shown in the interactive Jmol
applet.
• Left click to change the appearance of the displayed structure (style,
zoom, select, etc.)
• C-score
Used to assess the quality of the predicted models.
Its range is typically -5 to 2.
A higher score reflects a model of better quality.
19. • TM-score and RMSD
Standard measures of structural similarity between query and
template.
Used to measure the accuracy of structure modeling when the native
structure is known.
A TM-score >0.5 indicates a model of correct topology, and a
TM-score <0.17 means random similarity.
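To make the two measures concrete, the sketch below evaluates the published TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8, together with RMSD. It takes a list of per-residue Cα distances that are assumed to come from an already superimposed model and native structure. The real TM-score also searches for the superposition that maximizes the sum, which is omitted here. The 0.5 Å floor on d0 for short chains and the "intermediate" label are assumptions of this example.

```python
import math

def tm_score_d0(target_length):
    """Length-dependent distance scale d0 in Angstroms."""
    if target_length <= 15:
        return 0.5  # assumption: floor for chains where the formula breaks down
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no search over superpositions)."""
    d0 = tm_score_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def interpret_tm_score(score):
    """Apply the thresholds quoted above; 'intermediate' is a label of this sketch."""
    if score > 0.5:
        return "correct topology"
    if score < 0.17:
        return "random similarity"
    return "intermediate"
```

Because each residue contributes at most 1/L, a few badly placed loop residues lower the TM-score only slightly, whereas the same residues can dominate the RMSD.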
4. Top 10 templates used by I-TASSER
• Analyze sequence identity in the threading aligned region and for the
whole chain to assess the homology between query and template.
• Z-score
Used to analyze the quality of the threading alignment.
A Z-score >1 reflects a confident alignment, and the template is most likely to have the
same fold as the query protein.
20. • Sequence identity 1 (in the threading aligned region) is compared with
sequence identity 2 (for the whole chain) to assess the homology between query and template.
• A high sequence identity 2 indicates an evolutionary relationship between
the query and template.
• Colored residues show the conserved residues or motifs in the query protein
and template protein.
• A higher sequence identity in the threading aligned region than in the
whole-chain alignment indicates the presence of a conserved motif or domain in
the sequence.
• Assess the coverage of the alignment by inspecting the alignment.
• If the coverage of the top alignment is low and confined to a small region of the
query protein, a long segment of the query is left unaligned, which
indicates that the query protein contains more than one domain.
• In this case it is suggested to split the sequence and model the domains
individually.
22. 5. Structural analog in PDB and Enzyme commission number prediction
• Lists the top 10 structural analogs of the first predicted model, as identified by the
alignment program TM-align.
• A TM-score >0.5 indicates that the model and the detected analog have a similar topology; it can be
used to determine the structural class or protein family of the query sequence.
• A TM-score <0.3 signifies random structural similarity.
• Analyze the sequence identity and RMSD to assess the conserved spatial motifs of the
model and its structural analogs (see colored portion).
24. 6. Function prediction using COFACTOR
• EC number
Gives the potential homologue of the query protein
Confidence level is shown as EC score
CscoreEC is the confidence score for the Enzyme Classification (EC) number
prediction
CscoreEC values range in between [0-1]
A higher score indicates a more reliable EC number prediction.
RMSDa is the RMSD between residues that are structurally aligned by TM-align
while IDENa is the percentage sequence identity in the structurally aligned region.
Cov. represents the coverage of global structural alignment and is equal to the
number of structurally aligned residues divided by length of the query protein.
If the EC score is low or there is a lack of consensus among the identified hits,
the prediction becomes less reliable, and the gene ontology prediction should be consulted.
26. • Gene ontology (GO) term and protein-ligand binding site prediction
CscoreGO is a combined measure for evaluating global and local
similarity between query and template protein.
CscoreGO values range in between [0-1].
A higher value indicates a better confidence in predicting the function
using the template.
Each protein is associated with multiple GO terms describing its molecular
function, biological process and cellular component (each term is linked to the
respective AmiGO website and lineage).
Analyze the Fh-score (functional homology score) column to assess the
functional homology between query and template.
A GO-score >0.5 indicates a reliable prediction.
28. 7. Template proteins with similar binding site
• They are ranked based on the number of predicted ligand conformations.
• They share a common binding pocket. The best identified ligands are already
displayed in the Jmol applet.
• Click on the other radio buttons to visualize predicted binding site and ligand
interacting residues.
• CscoreLB
is the confidence score of predicted binding site
Its values range in between [0-1]
A higher score indicates a more reliable ligand-binding site prediction.
• BS-score
A measure of local similarity (sequence and structure) between the template
binding site and the predicted binding site in the query structure.
A BS-score >1 reflects a significant local match (structural similarity) between
the predicted and template binding sites.
30. Advantage
• The main advantage over existing structure modeling methods is the
inherent structure fragment assembly approach, which consistently drives
the threading alignments closer to the native state.