Interaction fingerprint: 1D representation of 3D protein-ligand complexes


The structural interaction fingerprint (IFP) was introduced to overcome the shortcomings of existing scoring functions. An IFP is a binary string that encodes the presence or absence of interactions between a ligand and the amino acids of a protein binding site. It is a convenient way to compare and analyze the binding poses of ligands.

Published in: Education, Technology


  1. Interaction fingerprints (1NTERACT10N F1NGERPR1NTS). Chupakhin Vladimir, Laboratory of Chemoinformatics, Structural Chemogenomics Group, University of Strasbourg, December 2011. Vladimir Chupakhin, UNISTRA, 2011
  2. Virtual screening approaches: ligand-based (QSAR, similarity search, pharmacophores) and structure-based (docking, pharmacophores).
  3. Lock-and-key paradigm. Figure labels: lock, key, interactions.
  4. Molecular docking: main steps. (1) Protein and ligand preparation. (2) Binding site identification. (3) Conformational search with scoring of the generated poses.
  5. Geometry of interaction. H-bond angle (~175°), H-bond length (3.0 Å). Interactions are geometry!
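As a rough illustration of "interactions are geometry", the sketch below tests a donor-hydrogen-acceptor triple against a distance and an angle cutoff. The slide only quotes the ideal values (~175°, 3.0 Å); the tolerance cutoffs `max_dist` and `min_angle` used here are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding errors
    return math.degrees(math.acos(cos))

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """True if the donor-acceptor distance and the D-H...A angle pass both cutoffs.

    max_dist (Å) and min_angle (degrees) are assumed tolerances, not values
    taken from the slides.
    """
    close_enough = math.dist(donor, acceptor) <= max_dist
    linear_enough = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and linear_enough

# Collinear arrangement: 3.0 Å donor-acceptor distance, 180° angle
print(is_hbond((0, 0, 0), (1, 0, 0), (3, 0, 0)))
# Bent arrangement: short distance but only a 90° angle
print(is_hbond((0, 0, 0), (1, 0, 0), (1, 2, 0)))
```

Real detection tools also check atom types and protonation states, which this sketch leaves out.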
  6. Different types of interactions: hydrophobic, H-bonds, ionic, aromatic, cation-π.
  7. Self-docking. Workflow: extract ligand, modify geometry, dock to the same protein, extract ligand, calculate RMSD. Figure labels: blue, red, and orange poses, with RMSD values of 4.3 Å and 1.1 Å.
  8. Docking quality: RMSD. δ is the distance between N pairs of equivalent atoms (δ1 ... δN).
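With δ_i defined as on the slide, the usual definition is RMSD = sqrt((1/N) · Σ δ_i²). A minimal sketch follows, assuming both poses list their atoms in the same order and already share the protein's coordinate frame (no superposition, no handling of symmetric atoms). The function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over N pairs of equivalent atoms.

    coords_a, coords_b: sequences of (x, y, z) tuples in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Two atoms, each displaced by 2 Å along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, docked))
```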
  9. Cross-docking. The procedures are the same. But why? Robustness! These fluctuations have a huge influence on the docking results.
  10. Scoring functions. (1) Force-field scoring functions (Dock, AutoDock, GOLD). (2) Empirical scoring functions (ChemScore, PLP, Glide SP/XP). (3) Knowledge-based scoring functions (PMF, DrugScore, ASP, SMoG). Figure labels: ligand atoms, protein atoms.
  11. Force-field scoring function. Algorithm (force-field based), for a given PL complex: (1) Calculate the interaction energies between the atoms of the ligand and the protein (EvdW + EH-bond) using a force field. (2) Calculate the internal energy of the ligand (EvdW + Etorsion), optionally plus the internal H-bonds of the ligand. (3) Total energy = sum of the energy terms from steps 1 and 2. Figure labels: protein-ligand interaction energy terms, ligand energy terms. DOI: 10.1038/nrd1549
  12. Empirical scoring function. Algorithm (additive scheme): (1) Define interaction types and geometries. (2) Look them up in the database of interaction energies. (3) Total energy = sum of the contributions of every component (plus the influence of the geometry term). ESFs are made to reproduce binding energies or conformations (a scoring function depends on the training set used to develop it). Example: LUDI. DOI: 10.1038/nrd1549
  13. Knowledge-based scoring function. Algorithm: (1) Define interaction types and geometries. (2) Look them up in the database of LP atom interactions. (3) Total score (energy) = sum of the interaction scores (energies) (γ: adjustable parameter; SAS0: solvent-accessible area in the solvated state). KBSFs were developed to reproduce the binding pose rather than the energy. DOI: 10.1038/nrd1549
  14. Scoring functions: the purposes. Docking = finding the correct binding pose. Scoring = predicting the activity of the compound (Ki, IC50, etc.).
  15. Scoring functions: docking. The average success rate for docking a compound within RMSD < 2 Å is around 70%. Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009.
  16. Scoring functions: scoring. On average, compounds are ranked with a correlation coefficient of 55-64%. Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009.
  17. GOLD Score failure. Pose 1: GOLD Score 59.19, RMSD 1.10 Å. Pose 2 (top-scored pose): GOLD Score 59.30, RMSD 4.27 Å.
  18. Molecular scoring functions: problems. (1) Problems when the binding site is highly charged or highly hydrophobic/hydrophilic. (2) Problems when the binding site contains waters, ions, or cofactors. (3) Fragment-like docking is very tricky. (4) Even the input conformation can influence the docking results.
  19. Interaction fingerprints
  20. Chemical fingerprint. Fingerprints encode the presence or absence of certain features in a compound, e.g., fragments. Example: 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0. KISS: Keep It Short and Simple! (Keep It Simple, Stupid.)
  21. Structural Interaction Fingerprints. Detect the interactions of the ligand with every amino acid of the binding site. Zhan Deng, Claudio Chuaqui, and Juswinder Singh, Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein-Ligand Binding Interactions (DOI: 10.1021/jm030331x), Biogen Inc.
  22. Interaction fingerprints: preparation. Interaction types encoded for each residue: hydrophobic, aromatic (face to edge), aromatic (face to face), H-bond (protein acceptor), H-bond (protein donor), ionic (protein anion), ionic (protein cation). Bitstring for 1 residue: 1 0 0 0 1 0 0. Bitstring for the whole binding site (the interaction fingerprint), concatenated over Residue 1, Residue 2, ..., Residue X: 100100010000101000000100000010000001 ... Source: Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints, 2007.
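The preparation step can be sketched as follows: seven bits per residue, concatenated in a fixed residue order. The bit order within a residue, the residue names, and the detected interactions below are assumptions made for the example, and detecting the interactions from 3D coordinates is out of scope here.

```python
# Assumed bit order within one residue (seven interaction types).
INTERACTION_TYPES = (
    "hydrophobic",
    "aromatic_face_to_face",
    "aromatic_face_to_edge",
    "hbond_protein_acceptor",
    "hbond_protein_donor",
    "ionic_protein_anion",
    "ionic_protein_cation",
)

def residue_bits(interactions):
    """Seven-character bitstring for one residue."""
    return "".join("1" if t in interactions else "0" for t in INTERACTION_TYPES)

def fingerprint(site_residues, detected):
    """Concatenate per-residue bitstrings in the fixed order of site_residues.

    detected maps a residue name to the set of interaction types found for it;
    residues without any detected interaction contribute seven zeros.
    """
    return "".join(residue_bits(detected.get(res, set())) for res in site_residues)

# Hypothetical binding site and detected interactions
site = ["PHE80", "GLU81", "LEU83"]
detected = {
    "PHE80": {"hydrophobic", "aromatic_face_to_edge"},
    "GLU81": {"hbond_protein_donor"},
}
print(fingerprint(site, detected))
```

Keeping the residue order fixed across all complexes of the same binding site is what makes two fingerprints comparable bit by bit.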
  23. Molecular Interaction Fingerprints (IFP). Per-residue bit strings: ILE10 1000000, VAL18 1000000, ALA31 1000000, LYS33 1000000, VAL64 1000000, PHE80 1010000, GLU81 0000100, PHE82 1100000, LEU83 1001000, HIS84 1000000, GLN85 1000000, ASP86 1000101, LEU134 1000000, ALA144 1000000, ASP145 1000000. 3D to 1D (bit string): 1000000100000010000001000000100000010000001000101100000010000001000000. Zhan Deng, Claudio Chuaqui, and Juswinder Singh, Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein-Ligand Binding Interactions (DOI: 10.1021/jm030331x), Biogen Inc.
  24. Parameters of IFP. • Interacting patterns: an amino acid can be represented as a residue or a pharmacophoric point; the interacting fragment of the ligand can be encoded as an atom, a fragment, or a pharmacophoric point. • Type of interaction: hydrogen bonds, hydrophobic interactions, etc. • Direction of interaction (ligand ↔ receptor): for example, whether the hydrogen-bond donor is the protein or the ligand. • Strength of interaction and distance between interacting patterns: these parameters are research-specific. • Number of bits per interaction point: one or many.
  25. GOLD scoring function failure: IFP wins! Ligand A07 from the LR complex (PDB ID: 3LFS), docked into the CDK2 binding site (PDB ID: 2A0C). Pose 1 (orange): TC(real vs. docked) = 0.75, RMSD = 1.10 Å, GoldScore = 59.20. Pose 2 (blue): TC(real vs. docked) = 0.52, RMSD = 4.27 Å, GoldScore = 59.30. X-ray pose: brick red. TC is the Jaccard (Tanimoto) coefficient.
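For two bit strings, the Tanimoto (Jaccard) coefficient is the number of bits set in both divided by the number of bits set in at least one. A minimal sketch, assuming fingerprints are stored as equal-length strings of "0" and "1"; returning 0.0 for two all-zero fingerprints is a convention chosen here, not something the slides specify.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    return both / either if either else 0.0

# One shared bit out of three bits set overall
print(tanimoto("1100", "1010"))
```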
  26. IFP usage. • Store interactions in a useful format. • Analyze experimental LR complexes: quality of docking studies, results clustering (even peptides and PPI). • Analyze docked LR complexes (drug-like and fragment-like compounds): retrieve the correct binding pose, retrieve a specific binding pose.
  27. Use cases for IFP: storage. A useful way to store interaction information from experimentally derived LR complexes: • scPDB database, Laboratory of Didier Rognan, UNISTRA, Illkirch (DOI: 10.1021/ci050372x) • CREDO database (DOI: 10.1111/j.1747-0285.2008.00762.x).
  28. Use cases for IFP: x-ray LR analysis. Figure labels: binding site, compounds, specific interactions. DOI: 10.1021/jm030331x
  29. Use cases for IFP: pose retrieval (1). RMSD is not a 100% correct evaluation function! DOI: 10.1021/ci600342e
  30. Use cases for IFP: VS. Compare the reference x-ray IFP with the IFPs of docked poses using the Tanimoto coefficient. Workflow: compounds database, virtual screening, results. Using a standard SF: X% of the real hits. Using a standard SF + TC: X% + up to 20%.
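One way to combine a standard scoring function with the Tanimoto coefficient (TC) is to discard docked poses whose IFP is too dissimilar to the reference x-ray IFP and rank the rest by score. The sketch below is an illustration only: the cutoff `min_tc`, the assumption that a higher score is better, and the pose data are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length bit strings."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    return both / either if either else 0.0

def rescore(reference_fp, poses, min_tc=0.6):
    """Keep poses with TC >= min_tc to the reference, best score first.

    poses: list of (name, score, fingerprint); higher score is assumed better.
    Returns a list of (name, score, tc).
    """
    kept = []
    for name, score, fp in poses:
        tc = tanimoto(reference_fp, fp)
        if tc >= min_tc:
            kept.append((name, score, tc))
    return sorted(kept, key=lambda pose: pose[1], reverse=True)

# Hypothetical reference fingerprint and docked poses
reference = "1101000"
poses = [
    ("a", 59.3, "0001011"),  # best score, but TC = 0.2 to the reference
    ("b", 59.2, "1101001"),  # TC = 0.75
    ("c", 40.0, "1101000"),  # TC = 1.0
]
for name, score, tc in rescore(reference, poses):
    print(name, score, round(tc, 2))
```

Here pose "a" has the top score but is filtered out because it reproduces few of the reference interactions.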
  31. Use cases for IFP: PPI. IFP is suitable even for the analysis of protein-protein interactions!
  32. Use cases for IFP: agonists/antagonists. (A) Procaterol: agonist. (B) Carvedilol: antagonist. Source: Selective Structure-Based Virtual Screening for Full and Partial Agonists of the β2 Adrenergic Receptor, DOI: 10.1021/jm800710x
  33. IFP modifications
  34. IFP modifications: r-SIFt (R-group IFP). Example: LEU83 1101001000, with figure labels C, R1, R2. Independent of the interaction type: just the fact of interaction! Benefits: combinatorial library analysis (~100,000 compounds). DOI: 10.1021/jm050381x
  35. IFP modifications: w-SIFt (weighted IFP). Biological activity (less active, moderately active, most active) + machine learning approach: find the correlation between bit frequency and activity. Benefits: • helps to find which interactions are critical for compound potency • interpretable, position-dependent scoring function for ligand-protein interactions. DOI: 10.1021/ci800466n
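The link between bit frequency and activity can be illustrated with a deliberately simplified weighting: for every bit position, the fraction of active compounds that set the bit minus the fraction of inactive compounds that set it. This is not the published w-SIFt procedure, only a sketch of the idea with two activity classes instead of three; the function names and fingerprints are hypothetical.

```python
def bit_frequencies(fingerprints):
    """Fraction of fingerprints that have each bit position set."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [sum(fp[i] == "1" for fp in fingerprints) / n for i in range(length)]

def bit_weights(active_fps, inactive_fps):
    """Per-bit weight: frequency among actives minus frequency among inactives."""
    freq_active = bit_frequencies(active_fps)
    freq_inactive = bit_frequencies(inactive_fps)
    return [a - b for a, b in zip(freq_active, freq_inactive)]

# Hypothetical 3-bit fingerprints
actives = ["110", "100"]
inactives = ["010", "000"]
print(bit_weights(actives, inactives))
```

A large positive weight marks an interaction seen mostly in active compounds, which is the kind of interpretable, position-dependent signal the slide refers to.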
  36. Binding site independent IFP
  37. BS-independent IFP: APIF. APIF: A New Interaction Fingerprint Based on Atom Pairs and Its Application to Virtual Screening. Figure labels: atom pair, quadruplet, distance = range, IFP. Algorithm: (1) Detect interaction patterns (hydrophobic, HBA, HBD). (2) Define distance 1 and distance 2 for the quadruplet interaction. (3) Convert the distances to distance ranges. (4) Map the distance ranges and types ….
  38. BS-independent IFP: APIF quadruplet. A quadruplet consists of two interactions, each between a protein atom and a ligand atom; distance 1 separates the two protein atoms and distance 2 separates the two ligand atoms. Each quadruplet corresponds to 1 bit in the APIF.
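The quadruplet encoding can be sketched as follows: every pair of detected interactions yields one feature made of the two interaction types, the binned protein-protein distance, and the binned ligand-ligand distance. The bin edges, the type labels, and the representation of "bits" as a set of tuples are assumptions made for the example, not the parameters of the published APIF.

```python
import math
from itertools import combinations

# Hypothetical upper edges (Å) of the distance ranges
BIN_EDGES = (2.5, 4.0, 6.0, 9.0, 13.0, 18.0)

def distance_bin(d):
    """Index of the first range whose upper edge is >= d (last index if beyond all)."""
    for i, edge in enumerate(BIN_EDGES):
        if d <= edge:
            return i
    return len(BIN_EDGES)

def apif_features(interactions):
    """Set of quadruplet features, independent of residue numbering.

    interactions: list of (interaction_type, protein_atom_xyz, ligand_atom_xyz).
    """
    features = set()
    for (t1, p1, l1), (t2, p2, l2) in combinations(interactions, 2):
        types = tuple(sorted((t1, t2)))
        protein_bin = distance_bin(math.dist(p1, p2))  # distance 1
        ligand_bin = distance_bin(math.dist(l1, l2))   # distance 2
        features.add((types, protein_bin, ligand_bin))
    return features

# Two hypothetical interactions
interactions = [
    ("hydrophobic", (0.0, 0.0, 0.0), (0.0, 0.0, 3.0)),
    ("hbd", (5.0, 0.0, 0.0), (3.0, 0.0, 3.0)),
]
print(apif_features(interactions))
```

Because only interaction types and distances enter the feature, fingerprints from different binding sites have the same layout.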
  39. BS-independent IFP: APIF. Benefits: • independent of the binding site • comparable to current scoring functions.
  40. BS-independent IFP: Pharm-IF. Algorithm: (1) Detect interaction patterns (hydrophobic, HBA, HBD). (2) Define ligand atom pairs based ONLY on ligand atoms interacting with the protein. (3) Measure their distance. (4) Map the distance to a range (quantization) = Pharm-IF. Benefits: • independent of the binding site • comparable to current scoring functions. DOI: 10.1021/ci900382e
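The four steps above can be sketched with ligand atoms only: each interacting ligand atom carries its interaction type, all pairs are formed, and the pair distance is quantized. The uniform bin width of 1 Å, the type labels, and the use of counts instead of single bits are assumptions for illustration, not the published Pharm-IF parameters.

```python
import math
from collections import Counter
from itertools import combinations

def pharm_if(ligand_points, bin_width=1.0):
    """Count typed ligand-atom pairs per quantized distance.

    ligand_points: list of (interaction_type, xyz) for ligand atoms that
    interact with the protein. bin_width (Å) is an assumed quantization step.
    """
    counts = Counter()
    for (t1, a), (t2, b) in combinations(ligand_points, 2):
        types = tuple(sorted((t1, t2)))
        distance_range = int(math.dist(a, b) // bin_width)
        counts[(types, distance_range)] += 1
    return counts

# Three hypothetical interacting ligand atoms
points = [
    ("hba", (0.0, 0.0, 0.0)),
    ("hydrophobic", (0.0, 0.0, 2.5)),
    ("hbd", (0.0, 3.2, 0.0)),
]
for key, count in sorted(pharm_if(points).items()):
    print(key, count)
```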
  41. IFP-based scoring functions
  42. IFP-based SF: AuPosSOM. Automatic clustering of docking poses in virtual screening process using self-organizing map (AuPosSOM). • Dock decoys and compounds with known activity. • Generate a vector of interactions (H-bonds, hydrophobic interactions). • Train a model of the actives and inactives (the vector is the input)*: f(input (IFP)) = 1 or 0, where 1 is a binder and 0 is a non-binder. *Simplified representation.
  43. IFP-based SF: RF-Score. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking (RF-Score), DOI: 10.1093/bioinformatics/btq112. • Vector of 36 features; each feature is the occurrence count for a j-i atom pair. • Feature generation: take all atoms within 12 Å of a selected ligand atom, filter out interactions beyond the cutoff range, and sum the result (for each interaction pair). • PDBbind was used to train the Random Forest model. • The model is trained with activity as the output and interactions as the input.
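The feature generation can be sketched as counting protein-ligand atom pairs within the 12 Å cutoff, one counter per element pair. The slide only states that there are 36 features; the split into 4 protein elements times 9 ligand elements below is recalled from the RF-Score paper and should be treated as an assumption, as should the function name and the toy coordinates. Training the Random Forest itself is not shown.

```python
import math

# Assumed element sets: 4 protein x 9 ligand = 36 features
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")

def rf_score_features(ligand_atoms, protein_atoms, cutoff=12.0):
    """Occurrence count per (protein element, ligand element) pair.

    ligand_atoms, protein_atoms: lists of (element, xyz).
    Pairs farther apart than the cutoff (Å) or with other elements are skipped.
    """
    counts = {(p, l): 0 for p in PROTEIN_ELEMENTS for l in LIGAND_ELEMENTS}
    for l_el, l_xyz in ligand_atoms:
        for p_el, p_xyz in protein_atoms:
            key = (p_el, l_el)
            if key in counts and math.dist(l_xyz, p_xyz) < cutoff:
                counts[key] += 1
    return counts

# Hypothetical toy complex
ligand = [("C", (0.0, 0.0, 0.0)), ("N", (1.0, 0.0, 0.0))]
protein = [("O", (0.0, 0.0, 5.0)), ("C", (0.0, 0.0, 20.0)), ("O", (2.0, 0.0, 0.0))]
features = rf_score_features(ligand, protein)
print(len(features), features[("O", "C")], features[("C", "C")])
```

The resulting 36-value vector would be the model input, with the measured binding affinity as the training target.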
  44. Literature overview: SVM-SP. Support Vector Regression Scoring of Receptor-Ligand Complexes for Rank-Ordering and Virtual Screening of Chemical Libraries, DOI: 10.1021/ci200078f. • Two types of vectors: SVR-KB (146 features) uses knowledge-based pairwise potentials (the same as mentioned above, but trained with SVR), while SVR-EP is based on physicochemical properties. The SVR-EP vector consists of features extracted from X-Score (polar/nonpolar SASA, MW, vdW energy, etc.). • SVR-KB is better than SVR-EP. The vector is unique! The vector is atom-pair based.
  45. Thanks a lot!
