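Suggested analogues per seed compound (seed structures not recovered):
Seed    Suggestions    % in SureChEMBL
1       222            43
2       234            21

[Figure: number of rules by protein class]
Protein class    Number of rules
protease         6536
phosphatase      260
kinase           12686
ion_channel      4370
GPCR_7TM         19523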
MedChemica
Potency and Patents, new arenas for Matched Molecular Pair analysis (MMPA)
Dr. Al G. Dossetter, Dr. Ed J. Griffen, Dr. Andrew G. Leach, Dr. Shane Montague
References
[1] Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. J. Med. Chem. 2011, 54(22), pp. 7739-7750.
[2] Leach, A.G. et al. Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 2006, 49(23), pp. 6672-6682.
[3] Papadatos, G. et al. Lead Optimization Using Matched Molecular Pairs: Inclusion of Contextual Information for Enhanced Prediction of hERG Inhibition, Solubility, and Lipophilicity. J. Chem. Inf. Model. 2010, 50(10), pp. 1872-1886.
Problem
Can we understand the relationship between
patents, identify critical compounds and
automatically extract SAR?
Solution
Combine all the compounds and perform MMPA to find all the pair relationships
independent of patent membership. Use graph theory to identify critical compounds
and exploit public data to suggest further analogues and estimate their potency.
MMPA: a method of determining structure-activity relationships (SARs) within sets of compounds. Matched Molecular Pairs
(MMPs) are identified, and differences in their measured data are used to link properties to structure. [1]
contact@medchemica.com
3) Selecting rules
Statistical analysis of data sets of SMIRKS to extract the chemical
transformations that are most likely to be genuine.
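The poster does not state which statistical test is used to select rules. As a minimal sketch, assuming each SMIRKS has a list of observed Δ values, one simple filter keeps a transformation only if it has enough supporting pairs and its mean Δ is large relative to its standard error. The function name `select_rules`, the thresholds `min_pairs` and `min_t`, and the Cl and Br example rules and all Δ values are illustrative assumptions, not the authors' method or data.

```python
from math import sqrt
from statistics import mean, stdev


def select_rules(observations, min_pairs=6, min_t=2.0):
    """Keep transformations whose mean delta is clearly non-zero.

    observations: dict mapping a SMIRKS string to a list of delta values
    (property of B minus property of A). Thresholds are illustrative.
    """
    selected = {}
    for smirks, deltas in observations.items():
        n = len(deltas)
        if n < min_pairs:
            continue  # too few pairs to trust the rule
        m = mean(deltas)
        sem = stdev(deltas) / sqrt(n)  # standard error of the mean
        if sem > 0:
            t = abs(m) / sem
        else:
            t = float("inf") if m != 0 else 0.0
        if t >= min_t:
            selected[smirks] = {"n": n, "mean_delta": m, "t": t}
    return selected


# Illustrative data only: deltas are invented for the example.
observations = {
    "[c:1]([H])>>[c:1][F]": [0.4, 0.5, 0.3, 0.6, 0.4, 0.5],
    "[c:1]([H])>>[c:1][Cl]": [0.8, -0.7, 0.1, -0.2, 0.5, -0.6],
    "[c:1]([H])>>[c:1][Br]": [1.0, 1.1],
}
selected = select_rules(observations)
```

In this toy data only the H to F rule survives: the Cl rule has a mean Δ close to zero with a wide spread, and the Br rule has too few pairs.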
4) Extract rules from public potency data
Learning
• Useful potency SAR knowledge can be extracted from public data
• MMP network analysis of patents identifies pivotal compounds
• The method is validated by finding that large numbers of compounds suggested using these rules are now patented
• Next steps: extend the MMP-based network analysis with machine learning methods and exploit the MCSS structures within clusters to improve predictive accuracy
1) Advanced MMPs
Matched molecular pairs: molecules that differ only by a particular, well-defined structural transformation. [2]
• Two pair-finding techniques are available:
  • Fragment and Index method
  • Maximum Common Sub-Structure (MCSS) method
• No single method finds all pairs; both are needed to maximize the MMP output
[Figure: an MMP found by both methods]
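The indexing half of a fragment-and-index approach can be sketched without a cheminformatics toolkit. The sketch below assumes each molecule has already been cut into (constant part, variable part) fragmentations; in practice that step would be done by a toolkit and is not shown. Molecules that share a constant part but differ in the variable part are reported as a pair. The molecule ids, fragment strings and the function name `find_pairs` are hypothetical, and the MCSS method is not covered here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pre-computed fragmentations:
# molecule id -> list of (constant part, variable part)
fragmentations = {
    "mol_A": [("c1ccccc1[*:1]", "[*:1][H]")],
    "mol_B": [("c1ccccc1[*:1]", "[*:1]F")],
    "mol_C": [("c1ccncc1[*:1]", "[*:1]F")],
}


def find_pairs(fragmentations):
    """Index fragmentations by constant part and pair up the members."""
    index = defaultdict(list)
    for mol_id, frags in fragmentations.items():
        for constant, variable in frags:
            index[constant].append((mol_id, variable))

    pairs = []
    for constant, members in index.items():
        for (id_a, var_a), (id_b, var_b) in combinations(members, 2):
            # A pair needs two different molecules and a real change.
            if id_a != id_b and var_a != var_b:
                pairs.append((id_a, id_b, f"{var_a}>>{var_b}", constant))
    return pairs


pairs = find_pairs(fragmentations)
```

Here `mol_A` and `mol_B` share the same constant part and are paired; `mol_C` has a different constant part and stays unpaired. Pairs that no single cut can expose are the kind a second, MCSS-based method would be expected to recover.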
2) Environment Capture
• Chemical transformations are encoded as SMIRKS and recorded
along with their delta property value(s)
• The SMIRKS contain the structural change along with the chemical
environment spanning up to 4 atoms out
Essential for understanding the context of the transformation. [3]

The MMP as a transformation: FragA >> FragB, with Δ data A to B
4 atom environment: [c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])([H])>>[c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])[F]
3 atom environment: [c:4][c:2]([H])[c:1]([c:3]([H])[c:5])([H])>>[c:4][c:2]([H])[c:1]([c:3]([H])[c:5])[F]
2 atom environment: [c:2][c:1]([c:3])([H])>>[c:2][c:1]([c:3])[F]
1 atom environment: [c:1]([H])>>[c:1][F]
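One way to picture how the environment levels might be used is a store that records each pair's Δ under the SMIRKS for every environment size, then answers a query from the most specific level that has enough data. This is a sketch under stated assumptions: the class `TransformationStore`, its methods, the `min_pairs` threshold, the property values and the "unseen" placeholder keys are all hypothetical; only the four SMIRKS strings come from the poster.

```python
from collections import defaultdict
from statistics import mean

# The H >> F example, keyed by environment size (1 = most general).
H_TO_F = {
    1: "[c:1]([H])>>[c:1][F]",
    2: "[c:2][c:1]([c:3])([H])>>[c:2][c:1]([c:3])[F]",
    3: "[c:4][c:2]([H])[c:1]([c:3]([H])[c:5])([H])"
       ">>[c:4][c:2]([H])[c:1]([c:3]([H])[c:5])[F]",
    4: "[c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])([H])"
       ">>[c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])[F]",
}


class TransformationStore:
    """Hypothetical store of delta values per (environment size, SMIRKS)."""

    def __init__(self):
        self._deltas = defaultdict(list)

    def record(self, smirks_by_size, value_a, value_b):
        delta = value_b - value_a  # delta data, A to B
        for size, smirks in smirks_by_size.items():
            self._deltas[(size, smirks)].append(delta)

    def lookup(self, smirks_by_size, min_pairs=3):
        """Return (size, mean delta, n) from the most specific usable level."""
        for size in sorted(smirks_by_size, reverse=True):
            deltas = self._deltas.get((size, smirks_by_size[size]), [])
            if len(deltas) >= min_pairs:
                return size, mean(deltas), len(deltas)
        return None


store = TransformationStore()
for value_a, value_b in [(6.0, 6.5), (5.2, 5.6), (7.1, 7.7)]:  # invented
    store.record(H_TO_F, value_a, value_b)

# A query whose wider environment has never been seen falls back to size 2.
query = {1: H_TO_F[1], 2: H_TO_F[2], 3: "unseen-3", 4: "unseen-4"}
exact_hit = store.lookup(H_TO_F)
fallback_hit = store.lookup(query)
```

The design choice illustrated is the trade-off the panel describes: larger environments give more context but fewer supporting pairs, so a lookup can start specific and widen only when needed.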
Kinase class       Number of rules
kinase_agc         1576
kinase_atypical    788
kinase_camk        2376
kinase_ck1         32
kinase_cmgc        1010
kinase_reg         256
kinase_ste         110
kinase_tk          4696
kinase_tkl         1842
• Clean:
  • ChEMBL structures,
  • convert measurements to pIC50 / pKi,
  • aggregate multiple measurements on the same compound by target
• Find MMPA-based rules per target
• Organize targets by protein class and sub-class
• Rules can be applied by target, sub-class or class
• The distribution of rules mirrors the distribution of data
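The conversion and aggregation steps listed above can be sketched as follows. The conversion is the standard definition, pIC50 = -log10(IC50 in mol/L), which for values in nM is 9 - log10(value). The poster does not say how repeated measurements are combined; the median used here is an assumption, and the record layout, compound ids and values are invented for illustration.

```python
from collections import defaultdict
from math import log10
from statistics import median


def to_p_value(value_nm):
    """Convert an IC50 or Ki in nM to pIC50 / pKi: 9 - log10(nM)."""
    return 9.0 - log10(value_nm)


def aggregate(records):
    """Combine repeated measurements per (compound, target).

    records: iterable of (compound id, target, value in nM).
    The median is an assumed choice; the source does not specify one.
    """
    grouped = defaultdict(list)
    for compound, target, value_nm in records:
        if value_nm > 0:  # skip values that cannot be log-transformed
            grouped[(compound, target)].append(to_p_value(value_nm))
    return {key: median(values) for key, values in grouped.items()}


# Invented example records.
records = [
    ("cmpd_1", "EGFR", 10.0),
    ("cmpd_1", "EGFR", 1000.0),
    ("cmpd_2", "EGFR", 100.0),
]
aggregated = aggregate(records)
```

Aggregating on the log scale rather than on raw nM values keeps a single weak measurement from dominating the combined result.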
5) Identify pivotal compounds in patents
• Clean SureChEMBL structures with patent identifiers
• Generate a network map showing MMP relationships
between patents
• Network analysis identifies the key compounds within
patents
• Points are compounds, colored by the patent in which they
were first disclosed (green / blue); clinically used
compounds are red, and the most highly connected
compound in each patent is yellow
• Links represent matched molecular pair
relationships
• Distances are based on a spring force model and
are for visualization only
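The "most highly connected compound in each patent" can be read as the node with the highest degree in the MMP graph among the compounds first disclosed in that patent. The sketch below assumes that reading; the function name `pivotal_compounds`, the tie-breaking rule (first compound encountered wins) and the toy compounds, patents and edges are all hypothetical.

```python
from collections import defaultdict


def pivotal_compounds(mmp_edges, first_patent):
    """Return the highest-degree compound per patent, plus all degrees.

    mmp_edges: iterable of (compound, compound) MMP links, found
    independently of patent membership.
    first_patent: dict mapping compound -> patent of first disclosure.
    """
    degree = defaultdict(int)
    for a, b in mmp_edges:
        degree[a] += 1
        degree[b] += 1

    best = {}
    for compound, patent in first_patent.items():
        current = best.get(patent)
        # Ties keep the compound seen first (an arbitrary choice).
        if current is None or degree[compound] > degree[current]:
            best[patent] = compound
    return best, dict(degree)


# Toy network: links may cross patent boundaries.
first_patent = {"c1": "P1", "c2": "P1", "c3": "P1", "c4": "P2", "c5": "P2"}
mmp_edges = [("c1", "c2"), ("c2", "c3"), ("c2", "c4"), ("c4", "c5")]
best, degree = pivotal_compounds(mmp_edges, first_patent)
```

Degree is only one possible centrality measure; the poster does not say which one the network analysis uses.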
[Figure: chemical structures from the patent network, labeled "2 steps to Gefitinib", "3 steps to Erlotinib", "Gefitinib" and "Erlotinib"]
Focus the rules used to generate new compounds by applying those from the right kinase sub-class.
Apply rules to pivotal compounds.
6) Estimate potency from network models
• Extending the network analysis to all the public EGFR
potency data:
  • MMP-based clusters can be identified and
  characterized by their potency
  • Being an MMP neighbor in a cluster is sufficient to
  estimate a compound's potency to within 1 log unit
  • The MMP methods used generate sets of maximum
  common substructures for each cluster, enabling
  further direction of chemistry
• Points represent individual compounds
• Links represent a matched molecular pair relationship
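A minimal sketch of the neighbor-based idea, assuming clusters are the connected components of the MMP graph and that a compound's potency is estimated as the mean pIC50 of its direct MMP neighbors with known data. The poster does not define clusters or the estimator, so both choices are assumptions, as are the function names and the toy data.

```python
from collections import defaultdict, deque
from statistics import mean


def build_adjacency(edges):
    """Undirected adjacency sets from a list of MMP links."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Clusters as connected components, found by breadth-first search."""
    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))
    return clusters


def estimate_from_neighbors(compound, adjacency, pic50):
    """Mean pIC50 of direct MMP neighbors with measured data, or None."""
    known = [pic50[n] for n in adjacency.get(compound, ()) if n in pic50]
    return mean(known) if known else None


# Toy data: compound "b" has no measurement but two measured neighbors.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
pic50 = {"a": 7.0, "c": 8.0, "d": 5.5}
adjacency = build_adjacency(edges)
clusters = connected_components(adjacency)
estimate_b = estimate_from_neighbors("b", adjacency, pic50)
```

With real data, comparing such estimates against held-out measurements is what would support or refute the "within 1 log unit" claim.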
[Figure: EGFR tyrosine kinase network-based potency analysis; compounds colored by pIC50: >8, 6-8, <6]
Size of cluster    Clusters    Compounds
<8 compounds       133         415
>=8 compounds      59          3213
Total              192         3628
Simple regression modeling of potency based on cluster
membership alone (10-fold cross-validation): R² = 0.44,
RMSE = 0.97.
Further modeling based on the maximum common
substructures within clusters is in progress.
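A regression on cluster membership alone amounts to predicting each compound's pIC50 from the mean of its cluster. The sketch below shows how such a model could be scored with k-fold cross-validation, reporting R² and RMSE over the held-out predictions. The fold assignment, the fallback to the global training mean for clusters absent from the training fold, the function name `cross_validate` and the toy data are assumptions; the poster's figures come from the real EGFR data set and are not reproduced here.

```python
import random
from math import sqrt
from statistics import mean


def cross_validate(cluster_of, pic50, folds=10, seed=0):
    """k-fold CV of a cluster-mean model; returns (R squared, RMSE)."""
    compounds = sorted(pic50)
    random.Random(seed).shuffle(compounds)

    observed, predicted = [], []
    for k in range(folds):
        test = compounds[k::folds]
        test_set = set(test)
        train = [c for c in compounds if c not in test_set]

        global_mean = mean(pic50[c] for c in train)
        by_cluster = {}
        for c in train:
            by_cluster.setdefault(cluster_of[c], []).append(pic50[c])

        for c in test:
            values = by_cluster.get(cluster_of[c])
            # Fall back to the global mean for unseen clusters.
            predicted.append(mean(values) if values else global_mean)
            observed.append(pic50[c])

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    observed_mean = mean(observed)
    ss_tot = sum((o - observed_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(observed))


# Toy data: two clusters of ten compounds with constant potency each,
# so cluster membership explains everything.
cluster_of = {f"m{i}": ("A" if i < 10 else "B") for i in range(20)}
pic50 = {f"m{i}": (8.0 if i < 10 else 6.0) for i in range(20)}
r2, rmse = cross_validate(cluster_of, pic50)
```

On this idealized toy set the model is perfect; on real data, within-cluster spread is what limits R² and sets the RMSE.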