Hyung-Rae Kim, Amit Roy, Daisuke Kihara
http://kiharalab.org
Group 333
Overall Prediction Procedure
[Flowchart; box labels]
  • Server Models
  • PRESCO Residue Environment Score
  • Ranking by AA matrices: BLOSUM30, CC80, CCPC, QUIB, QUC2
  • 20-30 models
  • CABS 5 Clusters
  • Fragment-interaction-potential
  • 5 models
  • Structure Refinement by CHARMM runs
  • Side-Chain Modeling
  • Final 5 models
Refinement Procedure
[Flowchart; box labels]
  • Starting Structure: Add H
  • Minimize in Screened Coulomb Potential (SCP)
  • MD 20 x 10 ns with SCP, CH27 force field
  • 10 x 10 ns → 5000 snapshots, constraints on secondary structure
  • 10 x 10 ns → 5000 snapshots, constraints on all Cα atoms
  • Dfire score, RMSD with initial structure
  • Corr (dfire, Irmsd) > 0.4
  • Discarded (very rare)
  • Average structure from low dfire, Irmsd snapshots from set 1 and 2
  • Relax at low T MD → Model 1 and 2
  • Select structures with lowest dfire score → Model 3, 4, 5
Hassan, S. A., Guarnieri, F., & Mehler, E. L. (2000). The Journal of Physical Chemistry B, 104(27), 6478-6489.
Mirjalili, V., Noyes, K., & Feig, M. (2014). Proteins: Structure, Function, and Bioinformatics, 82(S2), 196-207.
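The snapshot-selection step of the refinement procedure can be sketched as follows. This is a minimal illustration, not the lab's implementation: it assumes Irmsd denotes the RMSD to the initial structure, that all snapshots are already superimposed on a common frame, and that "low" means the lowest fraction of each score list. The `fraction` parameter and both function names are hypothetical. The slide does not make clear which branch of the 0.4 correlation test leads to "Discarded", so the sketch only computes the correlation and leaves the decision to the caller.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_low_score_snapshots(snapshots, dfire, irmsd, fraction=0.1):
    """Average coordinates of snapshots that are low in both DFIRE and Irmsd.

    snapshots: list of structures, each a list of (x, y, z) tuples,
               assumed to be superimposed already.
    fraction:  hypothetical cutoff; a snapshot is "low" if it falls in the
               lowest `fraction` of a score list.
    """
    k = max(1, int(len(snapshots) * fraction))
    low_dfire = set(sorted(range(len(dfire)), key=dfire.__getitem__)[:k])
    low_irmsd = set(sorted(range(len(irmsd)), key=irmsd.__getitem__)[:k])
    # Fall back to the low-DFIRE set if the two sets do not intersect.
    chosen = sorted(low_dfire & low_irmsd) or sorted(low_dfire)
    n_atoms = len(snapshots[0])
    return [
        tuple(sum(snapshots[i][a][d] for i in chosen) / len(chosen) for d in range(3))
        for a in range(n_atoms)
    ]
```

An averaged structure generally has distorted local geometry, which is consistent with the slide following the averaging step with a low-temperature MD relaxation.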
Side-Chain Depth Environment (SDE)
[Figure labels: within a sphere of 6 or 8 Å; along the main-chain; Center]
(Kim & Kihara, Proteins 2014)
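Collecting the side-chain centroids that fall inside the SDE sphere can be sketched as below. The sketch assumes the sphere is centered on a point supplied by the caller (the slide only labels it "Center") and that each side chain is given as a list of heavy-atom coordinates. The function names are illustrative and not taken from the PRESCO code.

```python
import math

def centroid(atoms):
    """Geometric center of a list of (x, y, z) atom coordinates."""
    n = len(atoms)
    return tuple(sum(a[d] for a in atoms) / n for d in range(3))

def centroids_in_sphere(center, centroids, radius=8.0):
    """Indices of side-chain centroids within `radius` Å of `center`."""
    return [i for i, c in enumerate(centroids) if math.dist(center, c) <= radius]
```

Calling `centroids_in_sphere` once with `radius=6.0` and once with `radius=8.0` gives the two environment sizes mentioned on the slide.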
Finding Similar SDE from Database
  • Structure Database: 2536 proteins
  • 500 lowest RMSD fragments of 9 side-chain centroids; superimposed with the query fragment
  • Select SDE with the same number of side-chain centroids in the sphere of 8.0/6.0 Å
  • Compute residue-depth RMSD for corresponding side-chain centroids
  • Sort by depth RMSD to the query
[Figure labels: Query SDE; surface]
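The last two steps of the database search, computing the residue-depth RMSD and sorting by it, can be sketched as follows. The sketch assumes that centroid correspondence has already been established, so that position i in the query list matches position i in each candidate list. The dictionary layout and function names are hypothetical.

```python
import math

def depth_rmsd(query_depths, candidate_depths):
    """RMSD between the residue depths of corresponding side-chain centroids."""
    if len(query_depths) != len(candidate_depths):
        raise ValueError("environments must contain the same number of centroids")
    sq = sum((q - c) ** 2 for q, c in zip(query_depths, candidate_depths))
    return math.sqrt(sq / len(query_depths))

def rank_by_depth_rmsd(query_depths, candidates):
    """Sort candidate environments by depth RMSD to the query.

    candidates: dict mapping an environment id to its list of depths.
    Environments with a different number of centroids are skipped,
    mirroring the "same number of side-chain centroids" filter.
    """
    scored = [
        (name, depth_rmsd(query_depths, depths))
        for name, depths in candidates.items()
        if len(depths) == len(query_depths)
    ]
    return sorted(scored, key=lambda item: item[1])
```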
Decoy Evaluation with Protein Residue Environmental Score (PRESCO)
(Kim & Kihara, Proteins, 2014)
Residue Contact Potential-Based Matrix
(Tan, Huang, & Kihara, Proteins, 2006)
CCPC, CC80 Matrices:
  • Contact definition of two residues: any pair of side-chain heavy atoms or Cα atoms less than 4.5 Å apart
  • Compute a knowledge-based residue contact potential (Gaussian chain reference state, composition correction averaging)
  • Correlation coefficients of residue pairs are used as values of the amino acid similarity matrix
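The contact definition and the correlation-based matrix construction can be sketched as below. Two assumptions are made here that the slide does not spell out: each amino acid's "profile" is taken to be its row of contact-potential values against all residue types, and the correlation is Pearson's. The toy three-letter alphabet in the usage note is purely illustrative.

```python
import math

def in_contact(atoms_a, atoms_b, cutoff=4.5):
    """True if any atom pair from the two residues is closer than `cutoff` Å.

    atoms_a, atoms_b: lists of (x, y, z) coordinates for the atoms that count
    toward contacts (side-chain heavy atoms or Cα).
    """
    return any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b)

def row_correlation(xs, ys):
    """Pearson correlation between two potential profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def similarity_matrix(potential):
    """Build a similarity matrix from a contact potential.

    potential: dict of dicts, potential[a][b] = contact energy of types a and b.
    Returns sim[a][b] = correlation between the rows of a and b.
    """
    types = sorted(potential)
    rows = {a: [potential[a][b] for b in types] for a in types}
    return {a: {b: row_correlation(rows[a], rows[b]) for b in types} for a in types}
```

With a toy potential such as `{"A": {"A": -1.0, "L": -2.0, "K": 1.0}, "L": {"A": -2.0, "L": -3.0, "K": 1.5}, "K": {"A": 1.0, "L": 1.5, "K": 0.5}}`, the two hydrophobic-like types A and L come out more similar to each other than either is to K.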
Structure-derived Amino Acid Similarity Matrices in AAIndex
  • BLAJ010101 - Structural superposition data for identifying potential remote homologues (Blake-Cohen, 2001)
  • HENS920101 - BLOSUM45 substitution matrix (Henikoff-Henikoff, 1992)
  • JOHM930101 - Structure-based amino acid scoring table (Johnson-Overington, 1993)
  • KOLA920101 - Conformational similarity weight matrix (Kolaskar-Kulkarni-Kale, 1992)
  • KOSJ950115 - Context-dependent optimal substitution matrices for all residues (Koshi-Goldstein, 1995)
  • MIYS930101 - Base-substitution-protein-stability matrix (Miyazawa-Jernigan, 1993)
  • OVEJ920101 - STR matrix from structure-based alignments (Overington et al., 1992)
  • PRLA000101 - Structure derived matrix (SDM) for alignment of distantly related sequences (Prlic et al., 2000)
  • PRLA000102 - Homologous structure derived matrix for alignment of distantly related sequences (Prlic et al., 2000)
  • QU_C930101 - Cross-correlation coefficients of preference factors main chain (Qu et al., 1993)
  • QU_C930102 - Cross-correlation coefficients of preference factors side chain (Qu et al., 1993)
  • QUIB020101 - STROMA score matrix for the alignment of known distant homologs (Qian-Goldstein, 2002)
Alignment Accuracy by AA Matrices
2761 fold-level protein sequence pairs, Lindahl & Elofsson database
(Tan, Huang, Kihara, Proteins 2006)
Correct alignments: >50% of residues are correctly aligned
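The ">50% of residues correctly aligned" criterion can be expressed as a short check. The sketch assumes an alignment is represented as a collection of (query index, template index) pairs and that the fraction is taken over the pairs in the reference (structure-based) alignment. The slide does not state the denominator, so this is one plausible reading.

```python
def fraction_correct(predicted, reference):
    """Fraction of reference aligned pairs that the predicted alignment reproduces."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(predicted)) / len(ref)

def is_correct_alignment(predicted, reference, threshold=0.5):
    """Apply the slide's criterion: strictly more than `threshold` correct."""
    return fraction_correct(predicted, reference) > threshold
```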
Native Structure Recognition
Decoy set | DFIRE | dDFIRE | DOPE | RW | RWplus | OPUS-PSP | GOAP | MRE (CC80) | SDE (QUIB) | MRE & SDE: BLSM30+QU_C2 | MRE & SDE: BLSM30+QU_C2 | MRE & SDE: CC80+QU_C1 | # Targets
4state_reduced | 6 | 7 | 7 | 6 | 6 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
Fisa | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 4
Fisa_casp3 | 4 | 4 | 3 | 4 | 4 | 5 | 5 | 2 | 1 | 3 | 3 | 4 | 5
Lmds | 7 | 6 | 7 | 7 | 7 | 8 | 7 | 10 | 6 | 10 | 10 | 10 | 10
Lattice_ssfit | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
hg_structal | 12 | 16 | ---- | ---- | 12 | 18 | 22 | 28 | 11 | 27 | 27 | 27 | 29
ig_structal | 0 | 26 | ---- | ---- | 0 | 20 | 47 | 61 | 6 | 61 | 61 | 61 | 61
ig_structal_hires | 0 | 16 | ---- | ---- | 0 | 14 | 18 | 20 | 6 | 20 | 20 | 20 | 20
Moulder | 19 | 18 | 19 | 19 | 19 | 19 | 19 | 20 | 16 | 20 | 20 | 20 | 20
ROSETTA | 20 | 12 | 21 | 20 | 20 | 39 | 45 | 25 | 31 | 41 | 41 | 39 | 58
I-TASSER | 49 | 48 | 30 | 53 | 56 | 55 | 45 | 56 | 47 | 56 | 56 | 56 | 56
Total (Z-score) | 128 (-1.94) | 164 (-2.52) | 98/168 (-2.47) | 120/168 (-3.23) | 135 (-2.13) | 196 (-2.86) | 226 (-3.57) | 239 (6.78) | 141 (2.14) | 255 (5.70) | 255 (5.76) | 255 (5.65) | 278
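The slide does not define the Z-score it reports, so the following uses the common definition for native-recognition benchmarks: the native structure's score minus the mean decoy score, divided by the decoy standard deviation. Under that assumption, an energy-like function (lower is better) gives a negative Z-score for a well-recognized native, while a similarity-like score (higher is better) gives a positive one, which is one plausible reading of the sign difference between columns. Function names are illustrative.

```python
import statistics

def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to the decoy score distribution."""
    mean = statistics.fmean(decoy_scores)
    sd = statistics.pstdev(decoy_scores)
    return (native_score - mean) / sd

def native_ranked_first(native_score, decoy_scores, lower_is_better=True):
    """True if the native scores strictly better than every decoy."""
    if lower_is_better:
        return all(native_score < s for s in decoy_scores)
    return all(native_score > s for s in decoy_scores)
```

Counting the targets for which `native_ranked_first` returns true would give per-decoy-set counts of the kind shown in the table.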
Scoring function | Models only: average rank | Models only: ranked 1 | Native included: average rank | Native included: ranked 1
MRE (CC80) | 6.77 | 29 | 1.32 | 131
SDE (QUIB) | 2.89 | 56 | 1.98 | 97
MRE & SDE: BLSM30+QU_C1 | 6.79 | 31 | 1.18 | 139
MRE & SDE: CC80(SDE)+BLSM30(SDE) | 2.82 | 66 | 1.99 | 89
QMEAN6 | 2.87 | 85 | 1.71 | 113
RWplus | 2.97 | 57 | 1.78 | 106
RW | 3.08 | 51 | 1.71 | 110
QMEANall_atom | 3.59 | 74 | 1.71 | 119
QMEANSSE_agree | 3.74 | 62 | 3.72 | 39
RF_HA_SRS | 4.65 | 49 | 1.38 | 137
OPUS_CA | 4.72 | 79 | 5.13 | 55
RF_HA | 5.44 | 62 | 2.78 | 112
DOPE | 5.77 | 54 | 3.27 | 95
DFIRE | 6.03 | 50 | 5.69 | 33
Floudas-CM | 7.75 | 38 | 7.05 | 42
Melo-ANOLEA | 9.62 | 19 | 5.19 | 86
Random | 9.72 | 13.9 | 10.1 | 8.3
Benchmark on the Rykunov & Fiser CASP set. Comparison against 36 scoring functions; only the results of 13 functions are shown.
[Legend: Best, Second best, Third best]
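The two summary statistics in this benchmark, "average rank" and "ranked 1", can be computed as sketched below. The slide does not define them, so the sketch assumes that for each target a reference structure is designated (the best model, or the native when it is included), that its rank is its 1-based position when all candidates are sorted by the scoring function, and that "ranked 1" counts the targets where that rank is 1. All names are illustrative.

```python
def rank_of_reference(scores, reference, lower_is_better=True):
    """1-based rank of `reference` among all entries of `scores` (name -> score)."""
    ref = scores[reference]
    if lower_is_better:
        better = sum(1 for s in scores.values() if s < ref)
    else:
        better = sum(1 for s in scores.values() if s > ref)
    return better + 1

def summarize(targets, lower_is_better=True):
    """Return (average rank, number of targets ranked first).

    targets: list of (scores, reference_name) pairs, one per target.
    """
    ranks = [rank_of_reference(s, ref, lower_is_better) for s, ref in targets]
    return sum(ranks) / len(ranks), sum(1 for r in ranks if r == 1)
```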
Side-Chain Building
(Peterson, Kang, Kihara, Proteins 2014)
T0804 Top 1 models
Kiharalab: TS333_1
Boniecki_pred: TS301_1
Skwark: TS358_1
T0804 Kiharalab Top 1 Model
Native (Coordinates not available)
Kiharalab_1
GDT-TS: 31.44 GOAP: -18178.22
QUARK_5
GDT-TS: 30.93 GOAP: -14959.68
Best in Top 1 Models
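For readers unfamiliar with the GDT-TS values quoted on these slides, the score is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of target residues whose Cα atom in the model lies within the cutoff of its position in the native structure. The sketch below is a simplification: it assumes the per-residue Cα distances come from a single precomputed superposition and that every target residue is present in the model, whereas the official evaluation searches for the best superposition at each cutoff.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS from per-residue Cα distances (Å) after superposition."""
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Because it uses one fixed superposition, this simplified value can only underestimate the official score for the same model.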
T0804 Server Models Selected by PRESCO

Final Selection
Rank | Model | GDT-TS
1 | QUARK_TS5 | 30.93
2 | myprotein-me_TS4 | 12.63
3 | Zhang-server_TS5 | 29.77
4 | Seok-server_TS2 | 11.86
5 | BAKER_ROSETTASERVER_TS3 | 12.37

Rankings by matrix combination
Rank | QU_C2 + BLOSUM30 | QU_C1 + QUIB | CC80 + BLOSUM30 | CCPC + BLOSUM30 | QUIB
1 | QUARK_TS5 | QUARK_TS5 | BAKER_ROSETTAS_TS3 | QUARK_TS5 | SAM-T08-server_TS3
2 | TASSER-VMT_TS5 | SAM-T08-server_TS3 | myprotein-me_TS4 | BAKER_ROSETTAS_TS3 | myprotein-me_TS4
3 | myprotein-me_TS4 | myprotein-me_TS1 | QUARK_TS5 | myprotein-me_TS4 | QUARK_TS5
4 | BAKER_ROSETTAS_TS3 | myprotein-me_TS4 | myprotein-me_TS1 | myprotein-me_TS1 | BAKER_ROSETTAS_TS3
5 | Zhang-Server-TS1 | TASSER-VMT-TS1 | RBO_Aleph_TS3 | TASSER-VMT_TS5 | BAKER_ROSETTAS_TS2
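One way to read the five per-matrix top-5 lists together is to count how often each server model appears. The slide does not say how the final selection was derived from these lists, and the final selection includes models that appear in none of them, so the tally below is only a reading aid and not the lab's selection rule. Model names are copied as they appear on the slide.

```python
from collections import Counter

# Top-5 lists for T0804, transcribed from the slide.
TOP5 = {
    "QU_C2 + BLOSUM30": ["QUARK_TS5", "TASSER-VMT_TS5", "myprotein-me_TS4",
                         "BAKER_ROSETTAS_TS3", "Zhang-Server-TS1"],
    "QU_C1 + QUIB": ["QUARK_TS5", "SAM-T08-server_TS3", "myprotein-me_TS1",
                     "myprotein-me_TS4", "TASSER-VMT-TS1"],
    "CC80 + BLOSUM30": ["BAKER_ROSETTAS_TS3", "myprotein-me_TS4", "QUARK_TS5",
                        "myprotein-me_TS1", "RBO_Aleph_TS3"],
    "CCPC + BLOSUM30": ["QUARK_TS5", "BAKER_ROSETTAS_TS3", "myprotein-me_TS4",
                        "myprotein-me_TS1", "TASSER-VMT_TS5"],
    "QUIB": ["SAM-T08-server_TS3", "myprotein-me_TS4", "QUARK_TS5",
             "BAKER_ROSETTAS_TS3", "BAKER_ROSETTAS_TS2"],
}

def appearance_counts(rankings):
    """Count in how many top-5 lists each model appears."""
    return Counter(model for models in rankings.values() for model in models)

counts = appearance_counts(TOP5)
```

QUARK_TS5 and myprotein-me_TS4 appear in all five lists, which matches their positions at ranks 1 and 2 of the final selection.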
T0799-D1 Kiharalab Top 1 Model
Native (Coordinates not available)
Kiharalab_1
GDT-TS: 19.86 GOAP: -33178.17
BAKER-ROSETTASERVER_3
GDT-TS: 19.86 GOAP: -31360.77
3rd Best in Top 1 Models
T0834-D1 Kiharalab Top1 Model
3rd Best in Top 1 Models
Kiharalab_1
GDT-TS: 37.12 GOAP: -26474.14
RBO_ALeph_5
GDT-TS: 37.88 GOAP: -26865.67
Superimposition with the native (130-192) (D1 also includes 2-37)
Acknowledgements
http://kiharalab.org
@kiharalab
Hyung-Rae Kim
Amit Roy
Lenna Peterson
Daisuke Kihara

Kihara Lab protein structure prediction performance in CASP11


Editor's Notes

  • #15 (T0804 final selection): 1st QUARK_TS5 **; 2nd myprotein-me_TS4; 3rd Zhang-server_TS5 **; 4th Seok-server_TS2; 5th BAKER-ROSETTASERVER_TS3