This document summarizes Daisuke Kihara's protein docking prediction methods used in CAPRI Rounds 30 and 31. It describes the LZerD docking program, 3D Zernike descriptors for shape representation, and scoring functions such as ITScore, GOAP, and DFIRE. Interface prediction methods such as BindML and cons-PPISP are also summarized. Targets T79, T91, and T96 are used as examples to illustrate model quality, interface prediction performance, and how the different scoring functions evaluated the models. The summary notes that interface prediction needs further improvement and that scoring function performance depends on subunit model quality.
1. Human and Server CAPRI Protein Docking Prediction Using LZerD with Combined Scoring Functions
Daisuke Kihara
Department of Biological Sciences
Department of Computer Science
Purdue University, Indiana, USA
http://kiharalab.org
2. CAPRI Round 30 Results
(Lensink et al., CAPRI30 group paper, 2016)
3. Overview of Protein Docking Prediction Using LZerD in CAPRI
[Pipeline diagram] Single-chain modeling (HHPred, SparksX, MUFold, TASSER, Phyre2, TASSERlite, MultiCom) and PRESCO → sub-unit models → LZerD → ~50,000 docking models → re-ranking with scoring functions → clustering (RMSD < 5 Å) → 10 models → MD relaxation → submit
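The pipeline reduces ~50,000 re-ranked docking models to 10 by clustering at RMSD < 5 Å. The slide does not say which clustering algorithm is used, so the sketch below assumes a simple greedy (leader) scheme: visit models from best to worst score and keep a model only if it is at least 5 Å from every model already kept. The function name, the "lower score is better" convention, and the `rmsd` callback are illustrative assumptions.

```python
from typing import Callable, Sequence


def select_diverse_models(
    scores: Sequence[float],
    rmsd: Callable[[int, int], float],
    cutoff: float = 5.0,
    n_select: int = 10,
) -> list[int]:
    """Greedy leader clustering (assumed scheme, not necessarily LZerD's).

    Models are visited in order of increasing score (lower = better, as for
    energy-like scores). A model is kept only if its RMSD to every model
    already kept is >= cutoff, so each kept model represents its own cluster.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    selected: list[int] = []
    for i in order:
        if all(rmsd(i, j) >= cutoff for j in selected):
            selected.append(i)
            if len(selected) == n_select:
                break
    return selected


# Toy usage: models placed on a line, with RMSD = distance between positions.
positions = [0.0, 1.0, 10.0, 20.0]
print(select_diverse_models([0.1, 0.2, 0.3, 0.4],
                            lambda i, j: abs(positions[i] - positions[j])))
```

Model 1 is dropped because it lies within 5 Å of the better-scoring model 0; in practice `rmsd` would compare ligand coordinates after superimposing receptors.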
4. LZerD (Local 3D Zernike descriptor-based Docking program)
[Figure: surface points with normal vectors and 3D Zernike descriptors of the local surface (6 Å); interface area]
(Venkatraman, Yang, Sael, & Kihara, BMC Bioinformatics, 2009)
(Lizard)
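LZerD describes local surface patches of each subunit with 3D Zernike descriptors and looks for complementary patch pairs. As a rough sketch only, assuming each patch is summarized by a fixed-length descriptor vector and that similarity is measured by Euclidean distance, candidate pairs can be found by a nearest-neighbor search. The function name and data layout are hypothetical, and the real program also uses the normal vectors to orient the ligand, which is not shown here.

```python
import heapq
import math
from typing import Sequence

Descriptor = Sequence[float]


def match_patches(
    rec_desc: Sequence[Descriptor],
    lig_desc: Sequence[Descriptor],
    k: int = 3,
) -> list[list[int]]:
    """For each receptor patch, return the indices of the k ligand patches
    whose descriptor vectors are nearest in Euclidean distance (closest first).
    """
    matches = []
    for r in rec_desc:
        # nsmallest evaluates the key immediately, so capturing r is safe.
        nearest = heapq.nsmallest(
            k, range(len(lig_desc)), key=lambda j: math.dist(r, lig_desc[j])
        )
        matches.append(nearest)
    return matches


receptor = [[0.0, 0.0], [10.0, 10.0]]
ligand = [[9.0, 9.0], [0.0, 1.0], [100.0, 100.0]]
print(match_patches(receptor, ligand, k=2))
```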
5. 3D Zernike Descriptors (3DZD)
• An extension of spherical-harmonics-based descriptors
• A 3D object can be represented by a series of orthogonal functions, and thus in practice by a series of coefficients used as a feature vector
• Compact
• Rotation invariant
A surface representation of 1ew0A (A) is reconstructed from its 3D Zernike
invariants of the order 5, 10, 15, 20, and 25 (B-F). (Sael & Kihara, 2009)
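$Z_{nl}^{m}(r,\vartheta,\varphi) = R_{nl}(r)\,Y_{l}^{m}(\vartheta,\varphi)$
where $Y_{l}^{m}(\vartheta,\varphi)$ are spherical harmonics and $R_{nl}(r)$ are radial functions; the $Z_{nl}^{m}$ are polynomials in Cartesian coordinates.
Zernike moments: $\Omega_{nl}^{m} = \frac{3}{4\pi}\int_{\lVert\mathbf{x}\rVert \le 1} f(\mathbf{x})\,\overline{Z_{nl}^{m}(\mathbf{x})}\,d\mathbf{x}$
Zernike descriptor: $F_{nl} = \lVert\Omega_{nl}\rVert = \Big(\sum_{m=-l}^{l} \big|\Omega_{nl}^{m}\big|^{2}\Big)^{1/2}$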
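The descriptor collapses the $2l+1$ moments $\Omega_{nl}^{m}$ for each $(n, l)$ into one rotation-invariant norm $F_{nl}$. The sketch below assumes the moments are already computed (the integral itself is not shown) and uses the standard 3D Zernike index constraint $0 \le l \le n \le N$ with $n - l$ even; under that constraint, order $N = 20$ yields 121 invariants. Function names are illustrative.

```python
import math
from typing import Sequence


def zernike_invariant_indices(order: int) -> list[tuple[int, int]]:
    """(n, l) pairs of a 3DZD of maximum order N: 0 <= l <= n <= N, n - l even."""
    return [
        (n, l)
        for n in range(order + 1)
        for l in range(n + 1)
        if (n - l) % 2 == 0
    ]


def zernike_invariant(moments: Sequence[complex]) -> float:
    """F_nl = ||Omega_nl||, taken over the 2l + 1 moments with m = -l..l."""
    return math.sqrt(sum(abs(w) ** 2 for w in moments))


print(len(zernike_invariant_indices(20)))  # number of invariants at order 20
print(zernike_invariant([3, 0, 4j]))       # l = 1, so three moments
```

Because a rotation mixes the moments only within a fixed $(n, l)$, their norm is unchanged, which is why the descriptor is rotation invariant.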
6. Protein Residue Environment SCOre (PRESCO)
[Figure: side-chain centroids within a sphere of 6 or 8 Å around the center residue, along the main chain]
(Kim & Kihara, Proteins 2014)
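PRESCO characterizes each residue by the side-chain centroids that fall inside a sphere of 6 or 8 Å around it. A minimal sketch of collecting that environment, assuming side-chain centroids are given as 3D points indexed by residue; the function name and the choice to exclude the center residue itself are assumptions.

```python
import math
from typing import Sequence

Point = Sequence[float]


def side_chain_environment(
    centroids: Sequence[Point], center_index: int, radius: float = 8.0
) -> list[int]:
    """Indices of the other residues whose side-chain centroid lies within
    `radius` angstroms of the center residue's side-chain centroid."""
    center = centroids[center_index]
    return [
        i
        for i, c in enumerate(centroids)
        if i != center_index and math.dist(center, c) <= radius
    ]


toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(side_chain_environment(toy, 0, radius=6.0))
print(side_chain_environment(toy, 0, radius=8.0))
```

Switching between the 6 Å and 8 Å spheres changes only the `radius` argument.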
7. Finding Similar Side-Chain Depth Environment (SDE) from a database
• Structure database: 2536 proteins
• Retrieve the 500 lowest-RMSD fragments of 9 side-chain centroids, superimposed with the query fragment
• Select SDEs with the same number of side-chain centroids in the sphere of 8.0 Å
• Compute the RMSD of residue depth for corresponding side-chain centroids
• Sort by depth RMSD to the query
[Figure labels: Query SDE, surface]
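The last two steps above (keep SDEs with the same number of centroids, then sort by depth RMSD) can be sketched as follows. This assumes the superposition step has already paired each candidate centroid with a query centroid, so each SDE reduces to a list of residue depths in matching order; the dictionary layout and function names are hypothetical.

```python
import math
from typing import Mapping, Sequence


def depth_rmsd(a: Sequence[float], b: Sequence[float]) -> float:
    """RMSD between residue depths of corresponding side-chain centroids."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rank_sdes(
    query_depths: Sequence[float],
    candidates: Mapping[str, Sequence[float]],
) -> list[str]:
    """Keep candidate SDEs with the same number of centroids as the query,
    then sort them by depth RMSD to the query (most similar first)."""
    same_size = {
        name: depths
        for name, depths in candidates.items()
        if len(depths) == len(query_depths)
    }
    return sorted(same_size, key=lambda name: depth_rmsd(query_depths, same_size[name]))


query = [1.0, 2.0, 3.0]
print(rank_sdes(query, {"a": [1.0, 2.0, 4.0], "b": [1.0, 2.0, 3.0], "c": [1.0, 2.0]}))
```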
30. T91 PRESCO Scores
[Plots: PRESCO score vs. L-RMSD for docking without interface prediction and for docking with Zhang models; top 5 models selected from each]
31. T91 Score Performance Summary
Run          Score    RFH     Hits in top 10
nointerface  ITScore  2       2
nointerface  GOAP     2       1
nointerface  DFIRE    1       2
interface    ITScore  1042    0
interface    GOAP     165     0
interface    DFIRE    116     0
zhang1       ITScore  1 (4)   5
zhang1       GOAP     2 (16)  5
zhang1       DFIRE    1 (6)   6
RFH: rank of the first acceptable hit (rank of the first medium-quality hit in parentheses)
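Both columns of the table can be computed from a ranked list of model quality labels. A minimal sketch, assuming that a "hit" means acceptable quality or better and that the parenthesized RFH values are the rank of the first medium-or-better model; the label strings and function names are illustrative.

```python
from typing import Optional, Sequence

# Ordered CAPRI quality classes; a higher value means a better model.
QUALITY = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def first_rank(labels: Sequence[str], min_quality: str = "acceptable") -> Optional[int]:
    """1-based rank of the first model at or above min_quality (None if none)."""
    threshold = QUALITY[min_quality]
    for rank, label in enumerate(labels, start=1):
        if QUALITY[label] >= threshold:
            return rank
    return None


def hits_in_top(labels: Sequence[str], n: int = 10) -> int:
    """Number of acceptable-or-better models among the top n."""
    return sum(QUALITY[label] >= QUALITY["acceptable"] for label in labels[:n])


ranked = ["incorrect", "acceptable", "incorrect", "medium"] + ["incorrect"] * 10
print(first_rank(ranked), first_rank(ranked, "medium"), hits_in_top(ranked))
```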
32. T96 (Round 31)
Heterodimer
Predictor hits: 0 (5 by other groups)
Scorer hits: human 1, server 0 (1 by another group)
Human: 6 selected by PRESCO, 4 selected using the predicted interface, ITScore, GOAP, and DFIRE
No PDB file for the native structure was available: metrics were computed using the two scorer hits (average L-RMSD/I-RMSD, maximum fnat)
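Without a native structure, each model is evaluated against both scorer hits and the two results are combined as the slide describes: L-RMSD and I-RMSD are averaged and fnat takes the maximum. A minimal sketch; the `Metrics` type and function name are hypothetical, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    fnat: float   # fraction of native contacts
    lrmsd: float  # ligand RMSD in angstroms
    irmsd: float  # interface RMSD in angstroms


def combine_against_two_hits(vs_hit1: Metrics, vs_hit2: Metrics) -> Metrics:
    """Approximate metrics when two reference hits stand in for the native:
    average the RMSDs and take the larger fnat."""
    return Metrics(
        fnat=max(vs_hit1.fnat, vs_hit2.fnat),
        lrmsd=(vs_hit1.lrmsd + vs_hit2.lrmsd) / 2,
        irmsd=(vs_hit1.irmsd + vs_hit2.irmsd) / 2,
    )


print(combine_against_two_hits(Metrics(0.2, 4.0, 2.0), Metrics(0.3, 6.0, 3.0)))
```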
33. T96 scorer hits
S39.M03 (Haliloglu): fnat 0.22, L-RMSD 5.68 Å, I-RMSD 2.44 Å
S31.M06 (Kihara): fnat 0.32, L-RMSD 7.99 Å, I-RMSD 2.67 Å
[Figure: structures of both hits, with chain A and chain B labeled]
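The fnat, L-RMSD, and I-RMSD values above map to a model quality class through the standard CAPRI thresholds (acceptable: fnat ≥ 0.1 and L-RMSD ≤ 10 Å or I-RMSD ≤ 4 Å; medium: fnat ≥ 0.3 and L-RMSD ≤ 5 Å or I-RMSD ≤ 2 Å; high: fnat ≥ 0.5 and L-RMSD ≤ 1 Å or I-RMSD ≤ 1 Å). A minimal sketch of that classification; the function name is illustrative.

```python
def capri_class(fnat: float, lrmsd: float, irmsd: float) -> str:
    """Classify a docking model with the CAPRI fnat / L-RMSD / I-RMSD cutoffs.

    Classes are checked from best to worst, so a model is assigned the
    highest class whose criteria it satisfies.
    """
    if fnat >= 0.5 and (lrmsd <= 1.0 or irmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (lrmsd <= 5.0 or irmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (lrmsd <= 10.0 or irmsd <= 4.0):
        return "acceptable"
    return "incorrect"


print(capri_class(0.22, 5.68, 2.44))  # S39.M03 values from the slide
print(capri_class(0.32, 7.99, 2.67))  # S31.M06 values from the slide
```

With the approximate metrics above, both hits fall in the acceptable class; S31.M06 reaches the medium fnat cutoff but misses both medium RMSD cutoffs.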
34. T96 interface prediction
Chain  Method       Precision  Recall  F-score
A      BindML       0.15       0.20    0.17
A      Cons-PPISP   0          0       NA
B      BindML       0.12       0.11    0.12
B      Cons-PPISP*  NA         NA      NA
*Cons-PPISP predicted only residues in the N-terminal tail; visual inspection suggests that the N-terminal tail is not a likely binding site, so these predictions were not used.
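The precision, recall, and F-score in the table compare predicted interface residues with the actual ones. A minimal sketch, assuming residues are identified by integer residue numbers; returning `None` corresponds to the NA entries (for example, F-score is undefined when precision and recall are both 0). The function name is illustrative.

```python
from typing import Optional


def interface_scores(
    predicted: set[int], actual: set[int]
) -> tuple[Optional[float], Optional[float], Optional[float]]:
    """Precision, recall, and F-score of a predicted interface residue set.

    None stands for an undefined value (reported as NA).
    """
    tp = len(predicted & actual)  # correctly predicted interface residues
    precision = tp / len(predicted) if predicted else None
    recall = tp / len(actual) if actual else None
    if precision is None or recall is None or precision + recall == 0:
        f_score = None
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


print(interface_scores({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
print(interface_scores({1}, {2}))  # no overlap: 0, 0, NA
```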
36. T96 Score Performance Summary
Score    RFH  Hits in top 10
ITScore  529  0
GOAP     6    1
DFIRE    125  0
RFH: rank of first acceptable hit
• The hit for GOAP/DFIRE is the same model picked by PRESCO
37. Summary
Our docking prediction procedure runs LZerD and selects decoys by combining DFIRE, ITScore, GOAP, and PRESCO. Binding sites were predicted by BindML and cons-PPISP.
• On the examples shown, PRESCO did not perform as well as we expected from its performance in single-chain structure prediction.
• DFIRE, ITScore, and GOAP showed similar, reasonably good performance.
• Scoring function performance depends on subunit model quality.
• The way we use BindML predictions needs to be improved.
nointerface: our model, no interface restriction
interface: our model, interface restriction
zhang1: zhang1 model, no interface restriction
For the server pick, all 10 picks are from "nointerface" (our model). The server models were not released until after the server prediction was due.
For the human pick, the 5 picks using our model and the 5 using the zhang1 model were all docked without interface restriction.
We had noticed systematically worse energy scores for interface-restricted docking. Based on my analysis of the BindML predictions, I think this can be partially mitigated by using a more permissive BindML prediction.
In the T96 scorer round, we had one human hit (out of 2 hits in total), and it was picked by PRESCO. Unfortunately, when I evaluated the scorer models using the best scorer model as the reference, our model was categorized as "incorrect" (it fails all cutoffs: fnat, L-RMSD, and I-RMSD).
This complex is between an arc-shaped alpha-repeat protein (chain A) and eGFP (chain B). Visual inspection shows that, between our pick and the other hit, eGFP has rolled slightly relative to the alpha-repeat protein, so I would guess that the true position lies somewhere between the two hits. I could compute an approximate L-RMSD for all the scorer models by averaging the L-RMSD against our hit and against the other hit, but there is no obvious way to compute an average fnat.
Another problem is that for the T96 and T97 scorer rounds, Kim did not send the numerical scores; he sent only a list of the top 6 models.