Structural interaction fingerprints (IFPs) were introduced to overcome the shortcomings of existing scoring functions. An IFP is a binary string that encodes the presence or absence of interactions between a ligand and the amino acids of a protein binding site. It is a convenient way to compare and analyze the binding poses of ligands.
Interaction fingerprint: 1D representation of 3D protein-ligand complexes
1. Interaction fingerprints
1NTERACT10N
F1NGERPR1NTS
Chupakhin Vladimir
Laboratory of Chemoinformatics
Structural Chemogenomics Group
University of Strasbourg
December 2011
Vladimir Chupakhin, UNISTRA, 2011
4. Molecular docking: main steps
1. Protein and ligand preparation
2. Binding site identification
3. Conformational search with scoring of the generated poses
6. Different types of interactions
- Hydrophobic
- H-bonds
- Ionic
- Aromatic
- Cation-π
7. Self-docking
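Workflow: extract the ligand, modify its geometry, dock it to the same protein, extract the docked ligand, calculate the RMSD.
Figure: superimposed poses colored blue, red and orange, with RMSD labels 4.3 Å and 1.1 Å.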
8. Docking quality: RMSD
RMSD = sqrt( (1/N) · Σ δi² ), i = 1 … N
δi is the distance between the i-th of N pairs of equivalent atoms (δ1 … δN).
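A minimal sketch of the RMSD calculation over N pairs of equivalent atoms. It assumes both poses list the same atoms in the same order and are already in the same coordinate frame (no superposition is done); the function name and the example coordinates are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance delta_i^2 for every pair of equivalent atoms
    squared = [
        sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: the docked pose is the reference shifted by 1 Å along z
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(reference, docked))  # 1.0
```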
9. Cross-docking
The procedure is the same as for self-docking. But why do it?
Robustness!
These fluctuations have a huge influence on the docking results.
11. Force-field scoring function
Algorithm (force-field based)
For a given protein-ligand (PL) complex:
1. Calculate the interaction energies between atoms of the ligand and the protein (EvdW + EH-bond) using a force field.
2. Calculate the internal energy of the ligand (EvdW + Etorsion), optionally plus the internal H-bonds of the ligand.
3. Total energy = sum of the energy terms from steps 1 and 2.
Equation (figure): protein-ligand interaction energy terms + ligand energy terms
DOI:10.1038/nrd1549
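As a rough illustration of the force-field scheme, the sketch below sums a pairwise van der Waals term over all ligand-protein atom pairs and adds a ligand internal energy. It is a toy under stated assumptions: only a 12-6 Lennard-Jones term is included (no H-bond or torsion terms), the epsilon and sigma values are placeholders rather than parameters of any real force field, and the function names are hypothetical.

```python
import math

def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """12-6 van der Waals term; epsilon and sigma are placeholder values."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

def force_field_score(ligand_atoms, protein_atoms, ligand_internal_energy=0.0):
    """Step 1 (pairwise ligand-protein terms) plus step 2 (ligand internal energy)."""
    interaction = sum(
        lennard_jones(math.dist(lig, prot))
        for lig in ligand_atoms
        for prot in protein_atoms
    )
    return interaction + ligand_internal_energy

# Hypothetical coordinates in Å
ligand = [(0.0, 0.0, 0.0)]
protein = [(3.5, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(force_field_score(ligand, protein, ligand_internal_energy=1.5))
```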
12. Empirical scoring function
Algorithm (additive scheme)
1. Define interaction types and geometries.
2. Look up the interaction energies in a database.
3. Total energy = sum of the contributions of every component (+ influence of the geometry term).
Empirical scoring functions (ESF) are built to reproduce binding energies or conformations (a scoring function depends on the training set used to develop it).
Example: LUDI
DOI:10.1038/nrd1549
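The additive scheme can be sketched as a table lookup followed by a sum. Everything numeric here is an assumption for illustration: the contribution values are invented (they are not LUDI parameters), and the geometry factor is simply taken as a number between 0 and 1 supplied by the caller.

```python
# Hypothetical contribution per ideal interaction (arbitrary units)
CONTRIBUTIONS = {"hbond": -2.0, "ionic": -3.0, "hydrophobic": -0.5}

def empirical_score(interactions):
    """Sum contribution * geometry factor over (type, geometry_factor) pairs.

    geometry_factor is 1.0 for ideal geometry and falls towards 0.0 as the
    distance or angle deviates from the ideal value.
    """
    return sum(CONTRIBUTIONS[kind] * factor for kind, factor in interactions)

# One ideal H-bond, one distorted H-bond, one hydrophobic contact
pose = [("hbond", 1.0), ("hbond", 0.5), ("hydrophobic", 1.0)]
print(empirical_score(pose))  # -3.5
```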
13. Knowledge-based scoring function
Algorithm
1. Define interaction types and geometries.
2. Look up the database of ligand-protein (LP) atom interactions.
3. Total score (energy) = sum of the interaction scores (energies).
(γ: adjustable parameter; SAS0: solvent-accessible surface area in the solvated state)
Knowledge-based scoring functions (KBSF) are developed to reproduce the binding pose rather than the energy.
DOI:10.1038/nrd1549
14. Scoring functions: the purposes
Docking = finding the correct binding pose
Scoring = predicting the activity of the compound (Ki, IC50, etc.)
15. Scoring functions: docking
Docking
The average success rate for docking a compound within RMSD < 2 Å is around 70%.
Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009
16. Scoring functions: scoring
Scoring
The average success in ranking compounds corresponds to correlation coefficients of 55-64%.
Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009
17. GOLD Score failure
              pose1   pose2
GOLD Score    59.19   59.30
RMSD, Å        1.10    4.27
Top-scored pose: pose2 (the higher GOLD Score), although its RMSD is much larger than that of pose1.
18. Molecular scoring functions: problems
1. Problems when the binding site is highly charged or highly hydrophobic/hydrophilic
2. Problems when the binding site contains waters, ions, cofactors
3. Docking of fragment-like compounds is very tricky
4. Even the input conformation can influence the docking results
20. Chemical fingerprint
Fingerprints encode the presence or absence of certain features in a
compound, e.g., fragments.
0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0
KISS: Keep It Short and Simple! (Keep It Simple, Stupid)
21. Structural Interaction Fingerprints
Detect interactions of the ligand with every amino acid of the binding site.
Zhan Deng, Claudio Chuaqui, and Juswinder Singh (Biogen Inc.). Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein−Ligand Binding Interactions. DOI: 10.1021/jm030331x
22. Interaction Fingerprints : preparation
Each residue is described by seven bits, one per interaction type:
• Hydrophobic
• Aromatic (face to edge)
• Aromatic (face to face)
• H-bond (protein acceptor)
• H-bond (protein donor)
• Ionic (protein anion)
• Ionic (protein cation)
Bitstring for 1 residue: 1 0 0 0 1 0 0
Bitstring for the whole binding site (the interaction fingerprint), concatenated residue by residue (Residue 1, Residue 2, … Residue X):
100100010000101000000100000010000001 …
Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints, 2007
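The concatenation of per-residue bitstrings can be sketched as follows. This assumes the interactions have already been detected by some geometric rules (not shown); the bit order, the type names, the residue labels and the function names are all illustrative choices, not the published definition.

```python
# Assumed bit order within one residue (illustrative, 7 bits per residue)
INTERACTION_TYPES = [
    "hydrophobic",
    "aromatic_face_to_edge",
    "aromatic_face_to_face",
    "hbond_protein_acceptor",
    "hbond_protein_donor",
    "ionic_protein_anion",
    "ionic_protein_cation",
]

def residue_bits(found):
    """7-character bitstring for one residue from the set of detected types."""
    return "".join("1" if kind in found else "0" for kind in INTERACTION_TYPES)

def interaction_fingerprint(binding_site, detected):
    """binding_site: ordered residue names; detected: residue -> set of types."""
    return "".join(residue_bits(detected.get(res, set())) for res in binding_site)

# Hypothetical binding site and detected interactions
site = ["LEU83", "ASP86", "LYS33"]
detected = {
    "LEU83": {"hydrophobic", "hbond_protein_donor"},
    "ASP86": {"ionic_protein_anion"},
}
print(interaction_fingerprint(site, detected))
```

Keeping the residue order fixed is what makes fingerprints of different ligands in the same binding site directly comparable bit by bit.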
24. Parameters of IFP
• Interacting patterns: an amino acid can be represented as a residue or a pharmacophoric point; the interacting fragment of the ligand can be encoded as an atom, a fragment or a pharmacophoric point.
• Type of interaction: hydrogen bonds, hydrophobic interactions, etc.
• Direction of interaction: for example, whether the hydrogen-bond donor is the protein or the ligand.
• Strength of interaction and distance between interacting patterns: these parameters are research specific.
• Number of bits per interaction point: one or many.
Figure: Ligand ↔ Receptor
26. IFP usage
• store interactions in a useful format
• analyze experimental ligand-receptor (LR) complexes
• assess the quality of docking studies
• cluster results (even for peptides and PPI)
• analyze docked LR complexes (drug-like and fragment-like compounds)
• retrieve the correct binding pose
• retrieve a specific binding pose
27. Use cases for IFP: storage
Useful way to store interaction information from
experimentally derived LR-complexes:
• scPDB database – Laboratory of Didier Rognan,
UNISTRA, Illkirch (DOI: 10.1021/ci050372x)
• CREDO database (DOI:10.1111/j.1747-0285.2008.00762.x).
28. Use cases for IFP: x-ray LR analysis
Figure labels: binding site, compounds, specific interactions
DOI: 10.1021/jm030331x
29. Use cases for IFP: pose retrieval (1)
RMSD is not a 100% correct evaluation function!
DOI: 10.1021/ci600342e
30. Use cases for IFP: VS
Compare the reference X-ray IFP with the IFPs of the docked poses using the Tanimoto coefficient (TC).
Figure: compounds database → virtual screening → results
Using a standard SF: X% of the real hits
Using a standard SF + TC: X% + up to 20%
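A minimal sketch of comparing a reference IFP with docked-pose IFPs by the Tanimoto coefficient (bits set in both, divided by bits set in either). The fingerprints below are made-up 14-bit examples (two residues of 7 bits each), and returning 0.0 when neither fingerprint has any bit set is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length bitstrings of '0'/'1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    on_a = {i for i, bit in enumerate(fp_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(fp_b) if bit == "1"}
    union = on_a | on_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(on_a & on_b) / len(union)

# Hypothetical fingerprints: residue 1 bits + residue 2 bits
reference = "1000100" + "0000010"
poses = {
    "pose_a": "1000100" + "0000010",  # same interactions as the reference
    "pose_b": "1000000" + "0000010",  # one interaction missing
    "pose_c": "0100000" + "0000001",  # no shared interactions
}
ranked = sorted(poses, key=lambda name: tanimoto(reference, poses[name]), reverse=True)
print(ranked)
```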
31. Use cases for IFP: PPI
IFPs are suitable even for the analysis of Protein-Protein Interactions!
32. Use cases for IFP: agonists/antagonists
(A) Procaterol, agonist; (B) Carvedilol, antagonist
Selective Structure-Based Virtual Screening for Full and Partial Agonists of the β2 Adrenergic Receptor, DOI: 10.1021/jm800710x
34. IFP modifications: r-SIFt – R-group IFP
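Figure: LEU83 encoded as 110 over the ligand parts (C, R1, R2), shown next to the interaction-type bitstring 1001000.
Benefits:
• Independent of interaction type! Just the fact of interaction!
Application: combinatorial library analysis (~100,000 compounds)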
DOI: 10.1021/jm050381x
35. IFP modifications: w-SIFt – weighted IFP
IFP + biological activity (less active, moderate activity, most active)
Machine learning approach: find the correlation between bit frequency and activity.
Benefits:
• helps to find which interactions are critical for compound potency
• interpretable, position-dependent scoring function for ligand-protein interactions
DOI: 10.1021/ci800466n
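The idea of relating bit frequency to activity can be illustrated with a very simple stand-in: weight each bit by how much more often it is set among active compounds than among inactive ones. This frequency difference is an assumption made for illustration and is not the weighting scheme of the w-SIFt paper; the fingerprints are invented.

```python
def bit_frequencies(fingerprints):
    """Fraction of fingerprints in which each bit position is set."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(fp[i] == "1" for fp in fingerprints) / n for i in range(length)]

def bit_weights(actives, inactives):
    """Illustrative weight per bit: frequency in actives minus frequency in inactives."""
    freq_active = bit_frequencies(actives)
    freq_inactive = bit_frequencies(inactives)
    return [a - b for a, b in zip(freq_active, freq_inactive)]

def weighted_score(fingerprint, weights):
    """Sum the weights of all bits that are set in the fingerprint."""
    return sum(w for bit, w in zip(fingerprint, weights) if bit == "1")

# Hypothetical 4-bit fingerprints
actives = ["1100", "1000"]
inactives = ["0101", "0001"]
weights = bit_weights(actives, inactives)
print(weights)  # [1.0, 0.0, 0.0, -1.0]
print(weighted_score("1100", weights))  # 1.0
```

Because every weight belongs to one bit, and every bit to one residue and interaction type, a high or low weight can be read directly as "this interaction matters for potency".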
37. BS-independent IFP: APIF
APIF: A New Interaction Fingerprint Based on Atom Pairs and Its Application to Virtual
Screening
Figure labels: atom pair, quadruplet, distance = range, IFP
Algorithm
1. Detect interaction patterns (Hydrophobic,
HBA, HBD)
2. Define distance1 and distance2 for
quadruplet interaction
3. Convert distances to distance range
4. Map distance range and types ….
38. BS-independent IFP: APIF - Quadruplet
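Figure: a quadruplet consists of two ligand atoms and two protein atoms. Each ligand atom interacts with one protein atom; Distance 1 is measured between the two protein atoms and Distance 2 between the two ligand atoms.
Each quadruplet corresponds to 1 bit in the APIF.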
40. BS-independent IFP: Pharm-IF
Algorithm
1. Detect interaction patterns (Hydrophobic, HBA, HBD)
2. Define ligand atom pairs using ONLY the ligand atoms that interact with the protein
3. Measure their distance
4. Map the distance to a range (quantization) = Pharm-IF
Benefits:
• independent of the binding site
• comparable to current scoring functions
DOI: 10.1021/ci900382e
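The four steps can be sketched as counting pairs of interacting ligand atoms by interaction-type pair and distance bin. Interaction detection (step 1) is assumed to be done elsewhere; the 1 Å bin width, the type labels and the function name are hypothetical choices for this sketch and are not taken from the Pharm-IF paper.

```python
import math
from collections import Counter
from itertools import combinations

BIN_WIDTH = 1.0  # Å, hypothetical quantization step

def pharm_if(interacting_ligand_atoms):
    """Count ligand atom pairs per (type pair, distance bin).

    interacting_ligand_atoms: list of (interaction_type, (x, y, z)) for the
    ligand atoms that interact with the protein.
    """
    counts = Counter()
    for (type_a, xyz_a), (type_b, xyz_b) in combinations(interacting_ligand_atoms, 2):
        pair = tuple(sorted((type_a, type_b)))  # order-independent type pair
        distance_bin = int(math.dist(xyz_a, xyz_b) // BIN_WIDTH)
        counts[(pair, distance_bin)] += 1
    return counts

# Hypothetical interacting ligand atoms (coordinates in Å)
atoms = [
    ("HBD", (0.0, 0.0, 0.0)),
    ("HBA", (3.2, 0.0, 0.0)),
    ("hydrophobic", (0.0, 4.5, 0.0)),
]
fingerprint = pharm_if(atoms)
print(fingerprint)
```

Since only ligand atoms and their interaction types enter the descriptor, no residue numbering is needed, which is why the result does not depend on a particular binding site.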
42. IFP-based SF: AuPosSOM
Automatic clustering of docking poses in virtual screening process using self-organizing map (AuPosSOM)
• Dock decoys and compounds with known activity
• Generate a vector of interactions (H-bonds, hydrophobic interactions)
• Train a model of the actives and inactives (the vector is the input)*
f(Input (IFP)) = 1 or 0
where
1 – binder
0 – non-binder
*Simplified representation
43. IFP-based SF: RF-Score
A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking (RF-Score), DOI:10.1093/bioinformatics/btq112
• Vector of 36 features; each feature is the occurrence count of a j-i atom pair
• Feature generation: take all atoms within 12 Å of a selected ligand atom, filter out interactions beyond the cutoff range, and sum the result for each interaction pair
• PDBbind was used to train the Random Forest model
• The model is trained with activity as the output and interactions as the input
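A minimal sketch of the occurrence-count features. The split into 4 protein elements and 9 ligand elements (4 × 9 = 36) is stated here as an assumption, since the slide gives only the total; the atom representation and function name are hypothetical, and the Random Forest training step is not shown.

```python
import math

# Assumed element sets giving 4 x 9 = 36 pair types
PROTEIN_ELEMENTS = ["C", "N", "O", "S"]
LIGAND_ELEMENTS = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
CUTOFF = 12.0  # Å

def pair_count_features(ligand_atoms, protein_atoms, cutoff=CUTOFF):
    """Return 36 counts of protein-ligand element pairs within the cutoff.

    Atoms are given as (element, (x, y, z)); other elements are ignored.
    """
    counts = {(p, l): 0 for p in PROTEIN_ELEMENTS for l in LIGAND_ELEMENTS}
    for l_elem, l_xyz in ligand_atoms:
        for p_elem, p_xyz in protein_atoms:
            if (p_elem, l_elem) in counts and math.dist(l_xyz, p_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    return [counts[(p, l)] for p in PROTEIN_ELEMENTS for l in LIGAND_ELEMENTS]

# Hypothetical complex: the protein carbon is beyond the cutoff
ligand = [("C", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0))]
protein = [("O", (5.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
features = pair_count_features(ligand, protein)
print(len(features), sum(features))  # 36 2
```

The resulting vector would be the model input, with the measured activity as the training target.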
44. Literature overview: SVM-SP
Support Vector Regression Scoring of Receptor–Ligand Complexes for Rank-Ordering and Virtual Screening of Chemical Libraries
DOI: 10.1021/ci200078f
• Two types of vectors: SVR-KB (146 features) uses knowledge-based pairwise potentials (the same as mentioned above, but trained with SVR), while SVR-EP is based on physico-chemical properties. The SVR-EP vector consists of features extracted from X-score (polar/nonpolar SASA, MW, vdW energy, etc.)
• SVR-KB is better than SVR-EP
Vector is unique!
Vector is atom pair based