
RSC Hatfield 2018 Kinase meeting : potency patents MMPA approaches


Potency and Patents, new arenas for Matched Molecular Pair analysis (MMPA)
MedChemica
Dr. Al G. Dossetter, Dr. Ed J. Griffen, Dr. Andrew G. Leach, Dr. Shane Montague

References
1. Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. J. Med. Chem. 2011, 54(22), pp. 7739-7750.
2. Leach, A.G. et al. Matched Molecular Pairs as a Guide in the Optimization of Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and Oral Exposure. J. Med. Chem. 2006, 49(23), pp. 6672-6682.
3. Papadatos, G. et al. Lead Optimization Using Matched Molecular Pairs: Inclusion of Contextual Information for Enhanced Prediction of hERG Inhibition, Solubility, and Lipophilicity. J. Chem. Inf. Model. 2010, 50(10), pp. 1872-1886.

Problem
Can we understand the relationship between patents, identify critical compounds and automatically extract SAR?

Solution
Combine all the compounds and perform MMPA to find all the pair relationships, independent of patent membership. Use graph theory to identify critical compounds, and exploit public data to suggest further analogues and estimate their potency.

MMPA
A method of determining structure-activity relationships (SARs) within sets of compounds. Matched molecular pairs (MMPs) are identified, and differences in their measured data are used to link properties to structure. [1]

Selecting rules
Statistical analysis of data sets of SMIRKS to extract the chemical transformations that are most likely to be genuine.

Number of rules by protein class
Protein class    Number of rules
protease         6536
phosphatase      260
kinase           12686
ion_channel      4370
GPCR_7TM         19523

Suggestions generated from seed compounds (seed structures not recoverable)
Seed    Suggestions    % in SureChEMBL
        222            43
        234            21
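The rule-selection step described above can be illustrated with a minimal sketch. The poster does not state which statistical test is used, so the criteria here (a minimum number of pairs and a mean potency change larger than a multiple of its standard error) are assumptions for illustration only, as are the function name `select_rules`, its parameters, and the example data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def select_rules(pairs, min_pairs=3, z=2.0):
    """Keep transformations whose mean potency change looks genuine.

    pairs: iterable of (smirks, potency_A, potency_B), e.g. pIC50 values.
    min_pairs and z are illustrative thresholds, not values from the poster.
    """
    deltas = defaultdict(list)
    for smirks, potency_a, potency_b in pairs:
        # Delta is always measured from A to B.
        deltas[smirks].append(potency_b - potency_a)

    rules = {}
    for smirks, values in deltas.items():
        n = len(values)
        if n < min_pairs:
            continue  # too few pairs to trust the average
        mean_delta = mean(values)
        std_err = stdev(values) / sqrt(n)
        if abs(mean_delta) > z * std_err:
            rules[smirks] = {"n": n, "mean_delta": mean_delta, "std_err": std_err}
    return rules


# Hypothetical pair data: (transformation, pIC50 of A, pIC50 of B).
example_pairs = [
    ("[c:1]([H])>>[c:1][F]", 6.0, 6.5),
    ("[c:1]([H])>>[c:1][F]", 5.5, 6.1),
    ("[c:1]([H])>>[c:1][F]", 7.0, 7.4),
    ("[c:1]([H])>>[c:1][F]", 6.2, 6.7),
    ("[c:1]([H])>>[c:1][Cl]", 6.0, 7.0),
    ("[c:1]([H])>>[c:1][Cl]", 7.0, 6.0),
    ("[c:1]([H])>>[c:1][Cl]", 6.0, 6.2),
    ("[c:1]([H])>>[c:1]C", 6.0, 6.9),
    ("[c:1]([H])>>[c:1]C", 6.5, 7.3),
]

rules = select_rules(example_pairs)
for smirks, stats in rules.items():
    print(smirks, stats)
```

In this toy data only the H to F transformation is kept: the H to Cl deltas are too scattered, and the H to methyl transformation has too few pairs.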
Advanced MMPs
Molecules that differ only by a particular, well-defined structural transformation. [2]
• Two pair-finding techniques are available:
  1) Fragment and Index method
  2) Maximum Common Sub-Structure method (MCSS)
• Not all pairs are found by a single method; both methods are needed to maximize the MMP output.
Figure: an MMP found by both methods (structures not recoverable).

Environment Capture
• Chemical transformations are encoded as SMIRKS and recorded along with their delta property value(s).
• The SMIRKS contain the structural change along with the chemical environment spanning up to 4 atoms out. This is essential for understanding the context of the transformation. [3]

The MMP as a transformation: FragA >> FragB, with Δ data A to B
4 atom environment: [c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])([H])>>[c:6]1[c:4]([H])[c:2]([H])[c:1]([c:3]([H])[c:5]1[c:7])[F]
3 atom environment: [c:4][c:2]([H])[c:1]([c:3]([H])[c:5])([H])>>[c:4][c:2]([H])[c:1]([c:3]([H])[c:5])[F]
2 atom environment: [c:2][c:1]([c:3])([H])>>[c:2][c:1]([c:3])[F]
1 atom environment: [c:1]([H])>>[c:1][F]

4) Extract rules from public potency data
• Clean:
  • ChEMBL structures,
  • convert measurements to pIC50 / pKi,
  • aggregate multiple measurements on the same compound by target
• Find MMPA-based rules per target
• Organize targets by protein class and sub-class
• Rules can be applied by target, sub-class or class
• The distribution of rules mirrors the distribution of data

Kinase class       Number of rules
kinase_agc         1576
kinase_atypical    788
kinase_camk        2376
kinase_ck1         32
kinase_cmgc        1010
kinase_reg         256
kinase_ste         110
kinase_tk          4696
kinase_tkl         1842

5) Identify pivotal compounds in patents
• Clean SureChEMBL structures with patent identifiers
• Generate a network map showing MMP relationships between patents
• Network analysis identifies the key compounds within patents
• Points are compounds, colored by the patent in which they were first disclosed (green / blue); clinically used compounds are red, and the most highly connected compound in each patent is yellow
• Links represent matched molecular pair relationships
• Distances are based on a spring force model and are for visualization only
Figure labels: Gefitinib; Erlotinib; 2 steps to Gefitinib; 3 steps to Erlotinib (structures not recoverable).

Apply rules to pivotal compounds
Focus the rules used to generate new compounds by applying those from the right kinase sub-class.

6) Estimate potency from network models
• Extending the network analysis to all the public EGFR potency data:
  • MMP-based clusters can be identified and characterized by their potency
  • Being an MMP neighbor in a cluster is sufficient to estimate a compound's potency to within 1 log unit
  • The MMP methods used generate sets of maximum common substructures for each cluster, enabling further direction of chemistry
Figure: EGFR tyrosine kinase network-based potency analysis. Points represent individual compounds; links represent a matched molecular pair relationship. Color legend by pIC50: >8, 6-8, <6.

Size of cluster    Clusters    Compounds
<8 compounds       133         415
>=8 compounds      59          3213
Total              192         3628

Simple regression modeling of potency based on cluster membership alone (10-fold cross-validation): R2 0.44, RMSE 0.97. Further modeling based on the maximum common substructures within clusters is in progress.

Learning
• Useful potency SAR knowledge can be extracted from public data
• MMP network analysis of patents identifies pivotal compounds
• The method is validated by finding that large numbers of compounds suggested using these rules are now patented
• The MMP-based network analysis is being extended by applying machine learning methods and exploiting MCSS structures within clusters to improve predictive accuracy
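The network steps in sections 5 and 6 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: "most highly connected" is taken to mean the highest number of MMP links across the whole network, the potency estimate is a plain mean over MMP neighbors with known values (the poster reports a regression on cluster membership instead), and all names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_graph(mmp_pairs):
    """Undirected graph: compounds are nodes, MMP relationships are links."""
    graph = defaultdict(set)
    for a, b in mmp_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pivotal_by_patent(graph, patent_of):
    """Return the most highly connected compound for each patent.

    Assumption: connectivity is the node degree in the full network.
    Ties keep the first compound encountered.
    """
    best = {}
    for compound, patent in patent_of.items():
        degree = len(graph.get(compound, ()))
        if patent not in best or degree > best[patent][1]:
            best[patent] = (compound, degree)
    return {patent: compound for patent, (compound, _) in best.items()}


def estimate_from_neighbors(graph, pic50, compound):
    """Estimate potency as the mean pIC50 of measured MMP neighbors."""
    known = [pic50[n] for n in graph.get(compound, ()) if n in pic50]
    return mean(known) if known else None


# Hypothetical compounds A-G disclosed in two patents, P1 and P2.
mmp_pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F"), ("E", "G")]
patent_of = {"A": "P1", "B": "P1", "C": "P1", "D": "P2", "E": "P2", "F": "P2", "G": "P2"}
measured_pic50 = {"A": 7.0, "E": 8.0}

graph = build_graph(mmp_pairs)
pivotal = pivotal_by_patent(graph, patent_of)
estimate_d = estimate_from_neighbors(graph, measured_pic50, "D")
print(pivotal, estimate_d)
```

Here compounds A and E come out as the pivotal compounds of P1 and P2, and compound D, which is an MMP neighbor of both, gets an estimated pIC50 of 7.5.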