Presentation at ISMB 2016 for the paper on FlexScore, a score for evaluating computational protein models by considering flexibility derived from NMR or molecular dynamics simulation. Paper published in Bioinformatics: http://www.ncbi.nlm.nih.gov/pubmed/27307633
by Kihara Lab http://kiharalab.org
FlexScore: Ensemble-based evaluation for protein structure models
1. ENSEMBLE-BASED EVALUATION FOR PROTEIN STRUCTURE MODELS
Michal Jamroz1, Andrzej Kolinski1, &
Daisuke Kihara2
1 Faculty of Chemistry, Warsaw University, Poland
2 Department of Biological Sciences/Computer Science,
Purdue University, USA
2. Protein Structure Comparison
• Superimposition of two structures considering the
structures are rigid
• Root mean square deviation (RMSD):
$$\mathrm{rmsd}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i^A - x_i^B \right)^2}$$
• CE, Dali, SSAP, 3D-SURFER (http://kiharalab.org/3d-surfer)
• In protein structure prediction, structure comparison is
important for evaluating structure models
• GDT-TS, TM-Score
• Rigid structure comparison stems from the static pictures
provided by crystal structures of proteins in the PDB
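The RMSD bullet above can be illustrated with a short sketch. It assumes the two structures are already superimposed and are given as equally long lists of (x, y, z) tuples; the function name and the toy coordinates are hypothetical, not taken from the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two superimposed structures given as lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # squared Euclidean distance between equivalent atoms
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates: only the third atom differs, by 3 Å.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
print(rmsd(model_a, model_b))
```

Every atom contributes with equal weight here, which is the rigid-body assumption the following slides question.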
3. But proteins are intrinsically flexible!
• Flexibility can be measured/observed by
• NMR
• Molecular dynamics (MD) simulation
• Coarse-grained model simulation, e.g. Gaussian Network model
• Even diffraction data from X-ray crystallography contain
flexibility information beyond the single isotropic B-factor
model (Blundell, 2004; Terwilliger, 2006, ...)
• Intrinsically disordered proteins
(Madl et al., JMB 2006; CcdA, NMR) (10 ns MD, PDB ID: 2n2u)
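One common way to quantify the flexibility seen in an NMR or MD ensemble is the per-residue root mean square fluctuation around the mean structure. The sketch below assumes the ensemble members are already superimposed and stored as lists of (x, y, z) tuples; the function name and the two-member toy ensemble are hypothetical.

```python
import math

def per_residue_rmsf(ensemble):
    """Return (mean_structure, rmsf) for a list of superimposed structures."""
    n = len(ensemble)
    k = len(ensemble[0])
    # average position of every residue over the ensemble
    mean = [
        tuple(sum(s[i][d] for s in ensemble) / n for d in range(3))
        for i in range(k)
    ]
    rmsf = []
    for i in range(k):
        # sum of squared displacements of residue i from its mean position
        sq = sum(
            sum((s[i][d] - mean[i][d]) ** 2 for d in range(3))
            for s in ensemble
        )
        rmsf.append(math.sqrt(sq / n))
    return mean, rmsf

# Hypothetical ensemble: residue 0 is rigid, residue 1 moves by +/- 1 Å in y.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, -1.0, 0.0)],
]
mean_structure, rmsf = per_residue_rmsf(toy_ensemble)
print(rmsf)
```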
4. Protein Structure Comparison Methods that Consider Chain Flexibility
• Weighted RMSD using B-factors (Wu & Wu, 2010)
• Iterative RMSD computation (Damm & Carlson, 2006)
• Use of elastic network model (FlexE, Perez et al., 2012)
• Use of structural ensembles
• KL divergence of two ensembles (L-Larsen et al. 2009)
• Maximum Likelihood (THESEUS, 2006; bFit, 2010)
5. FlexScore
(Jamroz, Kolinski, & Kihara, ISMB, Bioinformatics, 2016)
• Evaluating a computational protein structure model by
comparing it to an ensemble of the target protein structure
• The ensemble comes from either NMR or MD simulation (or
another source)
• 10-nanosecond MD simulation with explicit water molecules
• Structure Xi in an ensemble X is represented as
$$X_i = (M + E_i) R_i^T - 1_k t_i^T$$
M: a mean structure
Ei: displacement that follows a Gaussian distribution $N_{k,3}(0, \Sigma, I_3)$; $\Sigma$ is a k × k covariance matrix
k: the number of Cα atoms
Ri: rotation matrix
ti: translation vector; 1k is a k × 1 vector of ones
T denotes the transpose of a matrix
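6. Ensemble Superimposition
Ensemble structures, X: $X_i = (M + E_i) R_i^T - 1_k t_i^T$
• Initialization: $\hat{\Sigma} = I$, $\hat{M} = X_j$, $\hat{\alpha} = 0$
• Estimate $\hat{t}$: $\hat{t}_i = -\dfrac{X_i^T \hat{\Sigma}^{-1} 1_k}{1_k^T \hat{\Sigma}^{-1} 1_k}$
• Estimate $\hat{R}$: R computed by SVD of $\hat{M}^T \hat{\Sigma}^{-1} (X_i + 1_k \hat{t}_i^T)$
• Estimate $\hat{M}$
• Estimate $\hat{\Sigma}_s$ and its $\hat{\Lambda}_s$ (eigenvalues): $\hat{\Sigma}_s = \dfrac{1}{3n} \sum_{i=1}^{n} \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right] \left[ (X_i + 1_k \hat{t}_i^T) \hat{R}_i - \hat{M} \right]^T$ (n: the number of structures in the ensemble)
• Estimate $\hat{\alpha}$
• Estimate $\hat{\Sigma}_h$, $\hat{\Lambda}_h$: $\hat{\Sigma}_h = \dfrac{3n}{3n + 3} \left( \dfrac{2\hat{\alpha}}{3n} I + \hat{\Sigma}_s \right)$
Hierarchical log likelihood model: $\ell(R, t, M, \Sigma \mid X) + \ell(\Lambda \mid \alpha)$
α: parameter of the inverse gamma distribution that Λ of Σ follows
(Theobald DL, 2012)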
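The translation step of the superimposition can be sketched for the special case of a diagonal covariance matrix. This assumes the sign convention in which a structure is superimposed as (X_i + 1_k t_i^T) R_i, so that the estimated translation is the negative of the inverse-variance weighted centroid. The function names, the diagonal simplification, and the toy numbers are assumptions for illustration, not the authors' implementation.

```python
def weighted_translation(coords, variances):
    """Translation estimate for a diagonal covariance: minus the
    inverse-variance weighted centroid of the coordinates."""
    weights = [1.0 / v for v in variances]
    norm = sum(weights)
    centroid = [
        sum(w * c[d] for w, c in zip(weights, coords)) / norm
        for d in range(3)
    ]
    return tuple(-c for c in centroid)

def apply_translation(coords, t):
    """Shift every atom by the translation vector t."""
    return [tuple(c[d] + t[d] for d in range(3)) for c in coords]

# Hypothetical example: the third atom is extremely flexible (huge variance),
# so it barely influences where the structure is centered.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
variances = [1.0, 1.0, 1e6]
t_hat = weighted_translation(coords, variances)
centered = apply_translation(coords, t_hat)
print(t_hat)
```

With equal variances this reduces to ordinary centroid centering; with unequal variances the rigid atoms dominate, which is the point of weighting by the covariance.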
7. FlexScore (FS)
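• Score of a computational model Y, computed after shifting Y by $\hat{t}$ and rotating it
with the rotation matrix obtained by SVD of $\hat{M}^T \hat{\Sigma}_h^{-1} (Y + 1_k \hat{t}^T)$:
$$FS(Y) = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{\sigma_i} \left\lVert \hat{M}_i - Y_i^{sup} \right\rVert$$
where $Y^{sup}$ is the superimposed model and $\sigma_i$ is the fluctuation of the i-th Cα atom in the ensemble
• Score of 0 for the perfect model
• FS-GDT: defined as the average of fractions of Cα atoms
within FlexScore of 1, 2, 4, and 8. The score ranges [0, 1].
(analogous to GDT-TS, which is the average of fractions of Cα atoms
within 1, 2, 4, and 8 Å)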
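A minimal sketch of the two scores on this slide, under stated assumptions: the model is already superimposed on the mean structure, the per-residue FlexScore is the distance to the mean position divided by a per-residue fluctuation `sigma`, and "within" a cutoff means less than or equal to it. The function names and toy values are hypothetical.

```python
import math

def per_residue_fs(mean_structure, model_sup, sigmas):
    """Distance of each model atom from the mean position, in units of sigma."""
    return [
        math.dist(m, y) / s
        for m, y, s in zip(mean_structure, model_sup, sigmas)
    ]

def flexscore(mean_structure, model_sup, sigmas):
    """Average of the per-residue scaled deviations; 0 for a perfect model."""
    devs = per_residue_fs(mean_structure, model_sup, sigmas)
    return sum(devs) / len(devs)

def fs_gdt(mean_structure, model_sup, sigmas, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of atoms whose scaled deviation is within each cutoff."""
    devs = per_residue_fs(mean_structure, model_sup, sigmas)
    fractions = [sum(d <= c for d in devs) / len(devs) for c in cutoffs]
    return sum(fractions) / len(cutoffs)

# Hypothetical toy case: a 1.5 Å error at a rigid residue (sigma 0.5)
# is penalized more than a 3.0 Å error at a flexible residue (sigma 3.0).
mean_structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_sup = [(0.0, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 0.0, 3.0), (3.0, 0.0, 0.0)]
sigmas = [0.5, 0.5, 3.0, 1.0]
print(flexscore(mean_structure, model_sup, sigmas))
print(fs_gdt(mean_structure, model_sup, sigmas))
```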
8. FlexScore of Toy models
NMR Structures (PDB ID: 2j8p)
Identical RMSD, GDT-TS, & TM-Score: 1.47, 0.95, & 0.93 to the mean structure
FlexScore: Green, 1.96; Blue, 1.42
13. Different Evaluation by FlexScore, GDT-TS, TM-Score, & RMSD (T0714)
Green and orange models
GDT-TS: 0.84, 0.83; TM-Score: 0.83, 0.86
FlexScore: 4.42, 2.69
15. Dependence on Length of MD Simulation
T0773, PDB ID: 2n2u, 77 aa long.
Left half: correlation with the other scores; right half: average values of the scores.
16. FlexScore from NMR and MD Ensembles
Scores of 235 models of T0176 are compared.
17. CASP10 Prediction Group Ranking
Rank  FS  FS-GDT  GDT-TS  TM      RMSD
1     A   A       A       A       A
2     B   D       B       B       B
3     C   B       F       C       C
4     D   C       C       F       F
5     E   F       D       D       E
6     F   E       I       I       G
7     G   O (14)  G       X (24)  J
8     H   J       J       L (12)  I
9     I   Q (17)  E       G       D
10    J   H       O (14)  Q (17)  H
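The agreement between rankings can be summarized by counting how many of the FlexScore top-ten groups also appear in the top ten of each other score. The sketch below transcribes the letters from the table and assumes each letter labels one prediction group; the dictionary and variable names are hypothetical.

```python
# Top-ten group labels per score, transcribed column by column from the table.
top10 = {
    "FS":     ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "FS-GDT": ["A", "D", "B", "C", "F", "E", "O", "J", "Q", "H"],
    "GDT-TS": ["A", "B", "F", "C", "D", "I", "G", "J", "E", "O"],
    "TM":     ["A", "B", "C", "F", "D", "I", "X", "L", "G", "Q"],
    "RMSD":   ["A", "B", "C", "F", "E", "G", "J", "I", "D", "H"],
}

fs_groups = set(top10["FS"])

# Number of FlexScore top-ten groups shared with every other score's top ten.
shared_with_fs = {
    score: len(fs_groups & set(groups))
    for score, groups in top10.items()
    if score != "FS"
}

for score, count in shared_with_fs.items():
    print(f"{score}: {count} of 10 groups shared with FS")
```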
19. Structural Features                        Avg. corr. coefficient
B-Factor                                       0.484
Distance to center of mass                     0.509
Square of distance to center of mass (D2)      0.545
Contact number (cutoff 6 Å)                    -0.374
Contact number (8 Å)                           -0.480
Contact number (12 Å)                          -0.554
Contact number (15 Å)                          -0.568
Contact number (16 Å)                          -0.567
Contact number (18 Å)                          -0.562
Accessible Surface Area normalized             0.476
Residue depth (residue mean)                   -0.352
Prediction by GNM (cutoff 16 Å)                0.643
Prediction by GNM (no cutoff)                  0.646
(592 MD trajectories from the MoDEL db)
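Two of the structural features in the table are simple to compute from coordinates. The sketch below assumes that the contact number of a residue is the count of other Cα atoms within the cutoff distance and that the center of mass is the unweighted centroid of the Cα atoms; both definitions, the function names, and the toy chain are assumptions for illustration.

```python
import math

def contact_numbers(coords, cutoff):
    """For each atom, count the other atoms within `cutoff` Å."""
    return [
        sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= cutoff
        )
        for i, a in enumerate(coords)
    ]

def squared_distance_to_center(coords):
    """Squared distance of each atom to the unweighted centroid (D2)."""
    n = len(coords)
    center = tuple(sum(c[d] for c in coords) / n for d in range(3))
    return [sum((c[d] - center[d]) ** 2 for d in range(3)) for c in coords]

# Hypothetical four-atom chain along the x axis; the last atom is isolated.
ca = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_numbers(ca, 6.0))
print(contact_numbers(ca, 12.0))
print(squared_distance_to_center(ca))
```

The isolated atom has the lowest contact number and the largest D2, matching the signs in the table: fluctuation correlates negatively with contact number and positively with distance from the center.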
20. Fluctuation Prediction Using Support Vector Regression
Features used                                           Average corr. coeff.   RMS (Å)
B, D2, Sec, C(16), C(18), C(12), C(8)                   0.667                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20)           0.666                  1.042
B, D2, C(16), C(18), C(12), C(8), C(6), C(20), C(22)    0.667                  1.042
B, C(16), C(18), C(12), C(8), C(6), C(20), C(22)        0.669                  1.073
C(16), C(18), C(12), C(8), C(6), C(15), C(20), C(22)    0.660                  1.092
B, B-factor; D2, square of the distance to the center of mass;
C(x), the contact number with x Å cutoff
(Jamroz, Kolinski, Kihara, Proteins 80: 1425-1435, 2012)
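The two evaluation columns in the table, correlation coefficient and RMS, measure different things. The sketch below assumes the correlation is Pearson's and that RMS is the root mean square difference between predicted and observed fluctuations; the function names and all numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def rms_error(predicted, observed):
    """Root mean square difference between predictions and observations."""
    sq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(sq / len(predicted))

# Hypothetical fluctuations in Å: every prediction is 0.5 Å too high.
observed = [0.5, 1.0, 1.5, 3.0]
predicted = [1.0, 1.5, 2.0, 3.5]
print(pearson(predicted, observed))
print(rms_error(predicted, observed))
```

A constant offset leaves the correlation perfect while the RMS error stays at 0.5 Å, which is why reporting both columns is informative.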
22. Summary
• Developed FlexScore, which evaluates computational
protein structure models by considering the flexibility of
target proteins
• Flexibility is represented by a structure ensemble, which
comes from MD, NMR, or prediction using FlexPred
• Distinguishes discrepancies of a model in a flexible
region from those in a rigid region of the target protein
• Overall correlates well with existing scores (GDT-TS,
TM-Score), but occasionally gives a different, more
reasonable evaluation
Available at
FlexScore: https://bitbucket.org/mjamroz/flexscore
FlexPred: http://kiharalab.org/flexpred/