An attempt to show how drug interactions can be used to classify protein families through their cavities. The resistance problem can be clearly seen, together with new phylogenetic trees.
2. Aim
A. Zaliani 9thICCS 2
• Bioinformaticians have been able to segregate protein targets
by several means, from 1D to 3D and 4D
• We have potent means to perform the same analysis from
the ligand standpoint:
o Fingerprints (e.g. 2D, 3D, interaction FP, etc.)
o Shape descriptors
o Grids
• Do we appreciate their peculiarities?
• Would our structural knowledge grow if we knew some
frequent target-directing structural patterns?
3. Start – Method
• Plenty of recent work tries to link protein structures, functions
and cavities to ligands (and vice versa) through similarity
concepts
• Here I would stress not new methods but what we already have
in our hands to boost ideas, with a couple of applications
built on freely available software (like KNIME, R)
• FP: do we appreciate their peculiarities enough?
• Can we look into statistical models? If yes, do we?
4. Different FingerPrint (FP) for different scopes
• Can FP explain this to us?
FP Type                  Tan-Distance
MW, LogP, HA (CDK)…      0.000
Layered (RDKit)          0.082
AtomPairs (RDKit)        0.098
Indigo (GGA)             0.190
Morgan (RDKit)           0.302
FeatMorgan (RDKit)       0.348
ErG*                     0.375
Similarity ≈ 0.62-0.65
*N. Stiefl et al., JCIM (2006), 46(1), 208; N. Stiefl et al., JCIM (2006), 46(2), 587
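The Tan-Distance values are presumably one minus the Tanimoto similarity of the two fingerprints. The sketch below shows that calculation for binary fingerprints stored as sets of on-bit indices. The function names and the two example fingerprints are hypothetical, and count-based fingerprints would need a different formula.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto similarity of two binary fingerprints given as on-bit index sets."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(union)


def tanimoto_distance(bits_a, bits_b):
    """Distance assumed here to be 1 - similarity."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)


# Hypothetical on-bit sets for two compounds (not taken from the slides).
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 7, 10, 12, 15}

print(f"similarity = {tanimoto_similarity(fp_a, fp_b):.3f}")
print(f"distance   = {tanimoto_distance(fp_a, fp_b):.3f}")
```

Because each fingerprint type sets different bits for the same pair of molecules, the same two compounds can sit at very different distances, which is the point of the table.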
5. ErG = pharmacophore-fingerprint
Development of ErG (Extended reduced Graph), a 2D pharmacophoric
similarity tool for virtual screening
ErG is much less substructure-dependent, so that:
• It opens opportunities in library design (scaffold-hopping)
• Multiple chemical substructures map to one 'abstract'
pharmacophoric pattern
• Similarity searching and 'scaffold-hopping' are documented
• The FP is interpretable, as each bit corresponds to the count of pharmacophore pairs
at a given graph distance
• 6 atom types generate 21 pairs; 21 pairs x 15 max_distance = 315 bits
[Figure: reduction of a molecule to its ErG graph. Node labels: Ac, D, +, Hf, Ar. Steps shown: Charge / H-Bonding, Hydrophobic endcaps, Abstract ring forms.]
RDF vectorization
AcAcd1,AcAcd2,…,AcDod4,…,ArHfd4,…..,+-d15
Cpd_A,0, 0, …,1, …,1, …,0
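The bit count on this slide can be checked with a short enumeration. This is a minimal sketch: the six type labels and their order are assumptions chosen so that the generated names resemble the column names on the slide (AcAcd1, AcDod4, ArHfd4, +-d15), and the real ErG implementation may order pairs differently.

```python
from itertools import combinations_with_replacement

# Assumed labels for the six pharmacophore types (order is hypothetical).
TYPES = ["Ac", "Do", "Ar", "Hf", "+", "-"]
MAX_DISTANCE = 15


def erg_column_names(types=TYPES, max_distance=MAX_DISTANCE):
    """Enumerate one column per (unordered type pair, graph distance)."""
    # Unordered pairs including same-type pairs: 6 * 7 / 2 = 21.
    pairs = list(combinations_with_replacement(types, 2))
    return [
        f"{a}{b}d{d}"
        for a, b in pairs
        for d in range(1, max_distance + 1)
    ]


names = erg_column_names()
print(len(names))            # 21 pairs x 15 distances
print(names[:3], names[-1])
```

Each compound then becomes one row of 315 counts under these column names.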
6. Experiment plan - Dataset
• From a literature database, select a relevant random
subset (ca. 17K) of compounds showing at least
one activity (pEx50 > 6) towards a precise target within
target families such as GPCR-A, Kinases, Proteases or NHR
• Data are of high quality in terms of consistency
• The subset is less than 5% of the entire Evolvus Pharma Database
• To check homogeneity, run an all vs. all similarity evaluation with
TanDistance under different FPs…
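The activity cut-off can be sketched as follows. pEx50 is the negative base-10 logarithm of the molar potency, so pEx50 > 6 means a potency better than 1 µM. The input unit (nM), the function names and the three records are assumptions made for illustration only.

```python
import math


def p_activity(value_nM):
    """Convert a potency given in nM to its negative log10 molar value."""
    if value_nM <= 0:
        raise ValueError("potency must be positive")
    # 1 nM = 1e-9 M, hence -log10(value_nM * 1e-9) = 9 - log10(value_nM).
    return 9.0 - math.log10(value_nM)


def is_active(value_nM, threshold=6.0):
    """True when the compound passes the pEx50 > threshold filter."""
    return p_activity(value_nM) > threshold


# Hypothetical (compound, potency in nM) records.
records = [("cpd_1", 25.0), ("cpd_2", 5000.0), ("cpd_3", 2000.0)]
actives = [name for name, nM in records if is_active(nM)]
print(actives)
```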
7. Liceptor Database
Targets Annotated
• GPCRs
• Ion Channels
• CNS Transporters
• Kinases
• Proteases
• Phosphatases
Client Proprietary Targets
Small Molecule Ligand Database Features
• 3.2 Million Structures
• > 1000 Targets
• Global Patents
• Med. Chem. Journals
• Data annotated from 1967
• Multiple Target Data
• 2D Structures
• Molecular Descriptors
• IC50 and Unified Values
• Therapeutic Indications
The Liceptor database can be customized with client-specified additional fields and
custom data annotation
10. Experiment plan – Classification Model
• A partition tree model was generated
• The platform (KIN, GPCRA, NHR, PROT) can be
predicted with only 15 ErG distances
• With Y shuffled, the models generated show average errors
ranging from 63% to 77% (100x)
• External predictions: 82.6%
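The shuffled-Y negative control can be illustrated in a few lines. This is only a sketch under stated assumptions: a leave-one-out 1-nearest-neighbour classifier stands in for the partition tree, and the four-class toy data are invented so that the true labels are perfectly separable. With four balanced classes, a model trained on shuffled labels should err roughly three times out of four, which is the order of magnitude reported above.

```python
import random


def loo_1nn_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        errors += y[nearest] != y[i]
    return errors / len(X)


def shuffled_y_errors(X, y, n_rounds=100, seed=0):
    """Error rates obtained after permuting the labels n_rounds times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)
        results.append(loo_1nn_error(X, y_perm))
    return results


# Hypothetical toy data: four well-separated clusters of ten points each.
classes = ["KIN", "GPCRA", "NHR", "PROT"]
X = [(10.0 * k + 0.1 * i, 0.07 * i) for k in range(4) for i in range(10)]
y = [classes[k] for k in range(4) for i in range(10)]

true_error = loo_1nn_error(X, y)
null_errors = shuffled_y_errors(X, y)
print(f"true labels:     {true_error:.2f}")
print(f"shuffled labels: {sum(null_errors) / len(null_errors):.2f} (mean of 100)")
```

A real model is only credible when its error sits far below the shuffled-label distribution.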
12. Learn from misclassified
• 15 distances are enough to segregate 17K compounds
into four classes
• Some insights can be extracted from the model:
• Example of KIN relevant features:
i. Presence of Ar-NH(OH) [DoArd1 > 0]
ii. Absence of the α-amino acid signature [AcDod3 = 0]
iii. Need of AcArd3 > 0 if i. applies, or = 1
6H-Benzo[c]chromen-6-one derivatives as selective ERβ agonists
Bioorganic & Medicinal Chemistry Letters 16, (6), 2006, Pages 1468-1472
14. Classification Model – What to learn
• 15 distances are enough to segregate 17K compounds
into four classes
• Some insights can be extracted from the model:
• PROTEASE target relevant features:
i. Presence of the AA signature AcDod3
ii. Presence of AcArd3
iii. Absence of HfArd4, or presence of at most 1
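The three protease features above can be read as a simple predicate over ErG counts. This is a sketch of that reading, not the partition tree itself: "presence" is interpreted as a count above zero, rule iii. as a count of at most one, and the function name and example fingerprints are hypothetical.

```python
def looks_like_protease_ligand(erg_counts):
    """Apply the three slide rules to a mapping of ErG column name -> count."""
    has_aa_signature = erg_counts.get("AcDod3", 0) > 0   # rule i.
    has_ac_ar_d3 = erg_counts.get("AcArd3", 0) > 0       # rule ii.
    few_hf_ar_d4 = erg_counts.get("HfArd4", 0) <= 1      # rule iii.
    return has_aa_signature and has_ac_ar_d3 and few_hf_ar_d4


# Hypothetical sparse ErG count vectors (missing columns count as zero).
peptide_like = {"AcDod3": 2, "AcArd3": 1, "HfArd4": 0}
kinase_like = {"DoArd1": 1, "AcArd3": 1}

print(looks_like_protease_ligand(peptide_like))
print(looks_like_protease_ligand(kinase_like))
```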
15. Classification Model – How do we use this
• We can try to use these as SMARTS queries against the PDB
http://www.pdb.org/pdb/search/advSearch.do
• PROTEASE target relevant features:
i. Presence of the AA signature AcDod3
ii. Presence of AcArd3
iii. Presence of at most 1 HfArd4
• Query results after removal of non-polypeptides,
solvents and chain duplicates:
• 101 complexes, of which 53% are correct proteases
• If only i. and iii. were used, 1141 hits were found, of which 738
(65%) are protease complexes
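The retrieval rate quoted for the two-rule query is a plain precision figure and can be reproduced from the two counts on the slide. The function name is hypothetical.

```python
def precision(true_hits, total_hits):
    """Fraction of retrieved complexes that belong to the expected family."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return true_hits / total_hits


# Counts from the slide: 738 protease complexes among 1141 hits.
two_rule_precision = precision(738, 1141)
print(f"{two_rule_precision:.1%}")
```

Dropping rule ii. therefore raises both the number of hits and the share of proteases among them, compared with the 53% obtained with all three rules.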
16. Single Family Classification Models
• Each target family could also be modeled through
classification
• KNIME offers several functions for:
o Data preparation
o Training/test split with stratification on population
o Data reduction performed with an exhaustive retrograde selection
o Cross-validation with 100x leave-10%-out
o 100 shuffled-Y classification models built as a negative test
o Performance statistics given on a 25% external test set
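The stratified training/test split in the list above can be sketched outside KNIME as well. The version below is a plain-Python stand-in, not the KNIME node: it holds out 25% of each class so that the external test set keeps the class proportions of the full set. The function name, the seed and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical, deliberately unbalanced label vector.
labels = ["KIN"] * 40 + ["PROT"] * 20
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(sum(labels[i] == "KIN" for i in test_idx),
      sum(labels[i] == "PROT" for i in test_idx))
```

The same per-class sampling can be repeated with a 10% hold-out and different seeds to mimic the 100x leave-10%-out cross-validation.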
26. Lessons learned here
• A quality-controlled database is essential
• The 2D pharmacophoric FP approach is enough, but it has to be
"understood"
• Making FPs less cryptic helps in understanding their potential and
limits
• Targets do segregate. Ligands help us realize this, and the more
precise they are, the more they help
• Pharmacophoric graph space is immensely less problematic
than chemical space
• Provocation: how big is the graph space of IP?
27. Limitations
• Question: do you only find what you already know?
• Question: does abstraction help us?
• Every FP method is ok, provided that it teaches us
something
• Promiscuity reduction is not the only final aim (controlled
promiscuity might be a need)
• Graph distances might be too general
• 2D pharmacophoric fingerprinting needs to be improved
28. Future work
• 3D distances (3D triangles) could easily be implemented
• Combinations of ligand FP and cavity FP could really be a
breakthrough in getting a grip on multi-pharmacology
• FP weighting for the atomic de-solvation contribution is, for
me, KEY
• Agonist/antagonist split
• Will pEx50 > 6 provide different pictures?
29. Acknowledgements
Prof. M. Berthold
Greg Landrum
Nik Stiefl
Aniket Ausekar, CEO
Vikram Palshikar
Rashmi Jain
Mike Bodkin
31. Approach to Polypharmacology
• Pharmacophore target family mapping using neural networks (Kohonen)
• Cpds mapped together with annotated actives from different sources (MDDR, UBI, etc.)
• Clustering method to suggest pharmacophore similarity (Extended Reduced Graph fingerprint)
SOM Binary ErG on 9444 cpds with pIC50>8
[Figure: 8 x 8 SOM (pIC50_8_SOM8_8_1M_Z), axes 0-7. Legend: Protease, GPCRa, Kinases, NHR, Transporter. Neuron 7,3: 775 cpds from different families. Structures shown: 2425712, pIC50(PR)=8.79; 450207, pIC50(NPY_V)=8.79.]