
Learning Medicinal Chemistry ADMET rules UKQSAR Sept 2017

Lecture given by Ed Griffen at the UKQSAR meeting, Sept 2017. It covers material from our paper http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935; background is discussed in https://www.linkedin.com/pulse/first-draft-medicinal-chemistry-admet-encyclopedia-ed-griffen/

Published in: Science

  1. MedChemica | 2017. Learning Medicinal Chemistry ADMET Rules from Cross-Company Matched Molecular Pairs Analysis. Al Dossetter, Ed Griffen, Andrew Leach, Shane Montague,
  3. ‘Big Data’ analysis for Medicinal Chemistry • No new compounds to make • No new testing to do • Exploit the compounds and data you’ve already paid for • Accelerate all new projects • Augment the skills and experience of your chemists • Mythbusting… All very cost-effective.
  4. Make a real textbook of Medicinal Chemistry: run MMPA on ADMET data from multiple pharma companies, then combine the results and extract rules (>437,000 rules), giving better project decisions and increased medicinal chemistry learning. http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935
  5. Pillars of Knowledge Mining: Data, Cheminformatics, Statistics, Engineering, Interface Design → Better Decisions?
  6. Where to get data? • Public data is unrepresentative • It is censored by publication bias • Pharma data: structures can’t be shared because of IP • Use chemical transformations to encode knowledge from matched molecular pair (MMP) analysis, which is then shareable. Novartis: Kramer, C.; Kalliokoski, T. et al. The Experimental Uncertainty of Heterogeneous Public Ki Data. J. Med. Chem. 2012, 55, 5165. If project data really looked like that, there would be no problem in the Pharma industry.
  7. Data Sources. [Diagram: AZ, Roche and Genentech databases are each processed by an MMP finder inside the individual company firewall; MedChemica aggregation combines the results into the Grand Rule Database (>500 million pairs; 0.5 million rules), which feeds AZ, Roche and Genentech exploitation.]
  8. Pillars of Knowledge Mining: Data, Cheminformatics, Statistics, Engineering, Interface Design → Better Decisions?
  9. Advanced MMPA with MCPairs • Matched Molecular Pairs: molecules that differ only by a particular, well-defined structural transformation • Transformation with environment capture: MMPs can be recorded as transformations from A → B • Environment is essential to understand chemistry. Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. Journal of Medicinal Chemistry 2011, 54 (22), pp. 7739–7750. [Figure: transformation A → B with ΔData and numbered environment atoms 1–4.] Environment is key, and we need to capture it in our chemical encoding…
  10. Environment really matters. H→Me: median Δlog(Solubility) across 225 different environments [figure annotations: 2.5 log, 1.5 log]. H→Me: median Δlog(Clint), human microsomal clearance, across 278 different environments.
  11. H→F: what effect on clearance? Median Δlog(Clint), human microsomal clearance, across 37 different environments. [Figure annotations: 2-fold improvement / 2-fold worse; increase clearance / decrease clearance.]
  12. MMPA: engineering challenges • Quick to implement on a small scale • Always becomes an n² problem… • ‘Challenging’ at enterprise scales (100,000+): cheminformatics ‘gotchas’ (tautomers and charge states; unusual aromatic systems; highly symmetric molecules; capturing and coding environments accurately); structure and data integrity; assay ontologies; database schema optimized for cluster I/O • Speed at scale is essential
  13. Rule selection: identify and group matching SMIRKS, then calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down). Is n ≥ 6? If no, there is not enough data: ignore the transformation. If yes: is |median| ≤ 0.05 and the intercentile range (10–90%) ≤ 0.3? If yes, the transformation is classified as ‘neutral’. If no, perform a two-tailed binomial test on the up/down frequency to determine its significance: on a fail, the transformation is classified as ‘NED’ (No Effect Determined); on a pass, it is classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing. [Figure: median data difference (-ve, 0, +ve) with regions labelled Neutral, Increase, Decrease and NED.] • No assumption of normal distribution • Manages ‘censored’ (qualified / out-of-range) data
  14. Merging knowledge • Use the transformations that are robust in both companies to calibrate assays • Once the assays are calibrated against each other, the transformation data can be combined to build support for poorly exemplified transformations • Methodology precedented in other fields. [Figure: Pharma 1 vs Pharma 2 transformations; robust in both: calibrate; weak in both: discover novel.]
  15. Merging Assays. [Figure: compounds A to D linked by Transformation 1 and Transformation 2, measured (pIC50, log(Clint), pSol etc.) in Assay 1 and Assay 2; the Assay 2 changes (ΔT1′, ΔT2′) are plotted against the Assay 1 changes (ΔT1, ΔT2), with lines labelled “Assay 2 more sensitive than Assay 1” and “Assay 2 less sensitive than Assay 1”.] • Sets of transformations can be calibrated against each other because we are comparing Δ values between assays, not absolute values.
  16. Merging Details • Datasets are standardized by comparing the transformations shared by contributing companies • Transformations are examined at the “pair example” level • A minimum of 6 transformations, each with a minimum of 6 pairs (42 compounds bare minimum), is required to standardize • “Calibration factors” are extracted to standardize the datasets to a common value: mean of calibration factors 0.94, typical range 0.8–1.2 • Datasets with too few common transformations have standard compound measurements shared for calibration.
  17. Merging knowledge: GRDv1. Pharma 1: 100k rules; Pharma 2: 92k rules; Pharma 3: 37k rules; 5.8k rules in common pre-merge (~2%). Merging yields 88k new rules (~26% of the total): combining data yields brand new rules. Gains: 300–900%.
  18. Exploiting Knowledge for Compound Optimization. [Diagram: measured data → rule finder → exploitable knowledge; problem molecule → MCExpert system (rule finder) → new molecule suggestions.] MCPairs: “…it’s like asking 150 of your peers for ideas in just a few seconds” (AZ Principal Scientist)
  19. Build interfaces to many tools. [Diagram: pair & rule database and compounds from rules, served through an API server with a RESTful API to chemistry, shape and electrostatics tools, MCPairs, MCRules, and corporate structures and measurements.]
  20. Knowledge Extracted: numbers of statistically valid transformations, Grand Rule Database v3
      Grouped dataset: number of rules
      • logD7.4: 153,449
      • Merged solubility: 46,655
      • In vitro microsomal clearance (human, rat, mouse, cyno, dog): 88,423
      • In vitro hepatocyte clearance (human, rat, mouse, cyno, dog): 26,627
      • MDCK permeability A–B / B–A efflux: 1,852
      • Cytochrome P450 inhibition (2C9, 2D6, 3A4, 2C19, 1A2): 40,605
      • Cardiac ion channels (NaV1.5, hERG ion channel inhibition): 15,636
      • Glutathione stability: 116
      • Plasma protein or albumin binding (human, rat, mouse, cyno, dog): 64,622
  21. Single company vs merged: comparison between Roche-only and GRD rules for human microsomal clearance. Overall R² is 0.76 and RMSE 0.11.
  22. There is no “logD receptor”… • We often use lipophilicity as a design surrogate • It provides a context for changes • Key multi-objective design issues are centered on conflicting logD correlations: solubility & metabolic stability vs potency & permeability • It is particularly useful to look at chemical transformations that ‘break the dogma’ of logD correlation
  23. Solubility vs logD: trends & exceptions. ≥20 examples per rule, n = 13,453; R² = 0.66, slope = -0.57, intercept = 0. Magenta line: slope -1, intercept 0; dark blue line: linear best fit; the pale blue density ellipse contains 99% and the mid-blue ellipse 50% of the transformations.
  24. Exceptional solubility transformations (transformation structures were shown as images)
      Columns: median ΔlogD ± std (nPairs) | median ΔlogSol ± std (nPairs) | comment
      • 0.00 ± 0.67 (91) | 0.73 ± 0.72 (87) | ΔlogD unchanged, solubility ↑
      • -0.10 ± 0.83 (83) | 0.65 ± 0.96 (69)
      • 0.07 ± 0.50 (108) | 0.52 ± 0.77 (80)
      • -0.10 ± 0.54 (208) | 0.40 ± 0.78 (115)
      • -0.59 ± 0.49 (82) | 0.03 ± 0.72 (98) | ΔlogD ↓, solubility unchanged
  25. Clearance vs logD: trends & exceptions. ≥20 examples per rule, n = 11,572; R² = 0.40, slope = 0.23, intercept = 0. Magenta line: slope 1, intercept 0; dark blue line: linear best fit; the pale blue density ellipse contains 99% and the mid-blue ellipse 50% of the transformations.
  26. Exceptional HLM transformations (transformation structures were shown as images)
      Columns: median ΔlogD ± std (nPairs) | HLM median Δlog(Clint) ± std (nPairs) | comment
      • 0.35 ± 0.45 (15) | -0.34 ± 0.71 (13) | ΔlogD ↑, Clint ↓
      • 0.70 ± 0.74 (117) | -0.32 ± 0.51 (53)
      • 0.73 ± 0.61 (26) | -0.23 ± 0.36 (18)
      • 0.00 ± 0.11 (19) | -0.59 ± 0.38 (14) | ΔlogD unchanged, Clint ↓
      • -0.69 ± 0.42 (8) | 0.76 ± 0.59 (7) | ΔlogD ↓, Clint ↑
  27. Pillars of Knowledge Mining: Data, Cheminformatics, Statistics, Engineering, Interface Design → Better Decisions?
  28. Influencing Chemists. “In the choice between changing one’s mind and proving there’s no need to do so, most people get busy on the proof.” (John Kenneth Galbraith) “For the great enemy of truth is very often not the lie--deliberate, contrived and dishonest--but the myth--persistent, persuasive, and unrealistic. Too often we hold fast to the clichés of our forebears. We subject all facts to a prefabricated set of interpretations. We enjoy the comfort of opinion without the discomfort of thought.” (Address by President John F. Kennedy, Yale University Commencement)
  29. Better Human-Machine Interactions. All software is mediated through people • We want to augment medicinal chemists’ skills and experience • Chemists need to discover knowledge themselves • Intuitive (= fast & familiar) • Summary data + option to drill into the detail • What are the two interfaces chemists feel most comfortable with? Web browsers and Excel
  30. https://www.youtube.com/playlist?list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL
  31. Pillars of Knowledge Mining: Data, Cheminformatics, Statistics, Engineering, Interface Design → Better Decisions?
  32. More examples of success: Thompson, M. J. et al. J. Med. Chem. 2015, 58 (23), pp. 9309–9333. DOI: 10.1021/acs.jmedchem.5b01312
  33. “Me-Betters” on a massive scale: an enumerator system applied Grand Rule Database v3 to 1162 marketed drugs, giving ~425 improvement suggestions per drug and a wealth of follow-on opportunities. Improving solubility and metabolism means a lower dose and once-daily (uid) instead of bid/tid dosing: safer, with better compliance.
  34. To the Future? • Exploiting MMPs: matched molecular series; MMP-based clustering; QSAR from MMPA • Interface design is key
  35. Conclusions • We have to accelerate projects: exploiting existing data is highly efficient • High-quality medchem knowledge can be mined and exchanged on a large scale: there is a huge amount of medicinal chemistry knowledge, and mining it needs the right science, statistics and engineering • Human-machine interfaces are critical
  36. Collaborators and Users: experience
  37. About Us: passionate about generating better decisions from data • Dr Andrew G. Leach, Technical Director: Liverpool John Moores; 12 years’ experience in large Pharma; applied computational and medicinal chemistry • Dr Ed Griffen, Technical Director: 21 years’ experience in large Pharma and biotech; medicinal chemistry and large-scale statistical analysis methods • Dr Al Dossetter, Managing Director: 17 years of medicinal chemistry in large Pharma and extensive cloud computing experience • Dr Ali Griffen, Business Analyst: 21 years’ experience as team leader and bioscientist; biological data curation in large Pharma • Dr Shane Montague, Lead Data Scientist: PhD in Computer Science; 13 years’ experience at Microsoft and the University of Salford; data science and information security
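
Slides 10 and 11 make the point that one transformation can have very different effects in different environments. A minimal Python sketch of that bookkeeping, assuming each matched pair has already been labelled with its transformation and an environment key (the labels `env_A` and `env_B` and all values below are hypothetical): group the pair Δ values by transformation and environment, take the median per environment, and report the spread of those medians.

```python
from collections import defaultdict
from statistics import median

def medians_by_environment(pairs):
    """pairs: iterable of (transformation, environment, delta) tuples.
    Returns {transformation: {environment: median delta}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for transformation, environment, delta in pairs:
        grouped[transformation][environment].append(delta)
    return {
        t: {env: median(deltas) for env, deltas in envs.items()}
        for t, envs in grouped.items()
    }

def spread_of_medians(env_medians):
    """Range (max minus min) of the per-environment medians, in log units."""
    values = list(env_medians.values())
    return max(values) - min(values)

# Hypothetical pair deltas for one transformation in two environments.
pairs = [
    ("H>>Me", "env_A", -0.6),
    ("H>>Me", "env_A", -0.4),
    ("H>>Me", "env_A", -0.5),
    ("H>>Me", "env_B", 0.8),
    ("H>>Me", "env_B", 1.0),
]
result = medians_by_environment(pairs)
print(result["H>>Me"])
print(round(spread_of_medians(result["H>>Me"]), 2))
```

A single context-free median for H>>Me would average away exactly the difference between environments that these slides show.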
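
Slide 12 notes that MMPA “always becomes an n² problem”. A common way to contain it is the fragment-and-index approach described by Hussain and Rea. It is swapped in here for illustration, because the talk does not describe how MCPairs itself does this. Each molecule is cut into a constant part and a variable part, records are indexed by constant part, and pairs are formed only within each bucket. This sketch assumes the fragmentation has already been done (a real pipeline would cut bonds and canonicalize fragments with a toolkit such as RDKit). The compound IDs, fragment labels and values are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from statistics import median

def find_pairs(fragments):
    """fragments: iterable of (compound_id, constant_part, variable_part, value).
    A molecule may contribute one record per cut. Returns a list of
    (transformation 'A>>B', id_a, id_b, delta = value_b - value_a)."""
    index = defaultdict(list)
    for compound_id, constant, variable, value in fragments:
        index[constant].append((compound_id, variable, value))

    pairs = []
    for members in index.values():
        # Only molecules sharing a constant part can pair, so the quadratic
        # step is confined to each (usually small) bucket.
        for (id_a, var_a, val_a), (id_b, var_b, val_b) in permutations(members, 2):
            if var_a != var_b:
                pairs.append((f"{var_a}>>{var_b}", id_a, id_b, val_b - val_a))
    return pairs

# Hypothetical pre-fragmented records: (id, constant part, variable part, value).
fragments = [
    ("C1", "core_X", "[*]H", 2.0),
    ("C2", "core_X", "[*]C", 1.5),
    ("C3", "core_X", "[*]F", 1.8),
    ("C4", "core_Y", "[*]H", 3.0),
    ("C5", "core_Y", "[*]C", 2.2),
]
pairs = find_pairs(fragments)

# Group the pair deltas by transformation, as a rule-extraction step would.
by_transformation = defaultdict(list)
for transformation, _, _, delta in pairs:
    by_transformation[transformation].append(delta)
for transformation, deltas in sorted(by_transformation.items()):
    print(transformation, len(deltas), round(median(deltas), 2))
```

The quadratic step survives inside each bucket, but buckets are typically far smaller than the whole collection. Environment capture, multiple cuts and symmetry handling (the “gotchas” on slide 12) are deliberately left out.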
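
The rule-selection flowchart on slide 13 can be sketched in Python as below. Several assumptions are not stated on the slide. The order of the two tests is read from the flowchart, with the neutral check first and the binomial test second. The significance level is taken as 0.05, and pairs with zero change are excluded from the up/down counts. The 10th and 90th percentiles use the “inclusive” interpolation of Python’s `statistics.quantiles`. Censored (qualified) values are not handled. The function names are hypothetical.

```python
import math
import statistics

def binomial_two_tailed_p(k, n):
    """Exact two-tailed binomial test of k 'up' results in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    cdf = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * cdf)

def classify_transformation(deltas, min_pairs=6, median_cut=0.05,
                            range_cut=0.3, alpha=0.05):
    """Classify one transformation from its pair deltas (property change B minus A)."""
    n = len(deltas)
    if n < min_pairs:
        return "ignore"  # not enough data
    med = statistics.median(deltas)
    deciles = statistics.quantiles(deltas, n=10, method="inclusive")
    intercentile_range = deciles[-1] - deciles[0]  # 90th minus 10th percentile
    if abs(med) <= median_cut and intercentile_range <= range_cut:
        return "neutral"
    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)  # zero changes ignored (assumption)
    if binomial_two_tailed_p(n_up, n_up + n_down) >= alpha:
        return "NED"  # no effect determined
    return "increase" if n_up > n_down else "decrease"

print(classify_transformation([0.5] * 10))                             # increase
print(classify_transformation([-0.4] * 8))                             # decrease
print(classify_transformation([0.01, -0.01, 0.02, 0.0, -0.02, 0.01]))  # neutral
print(classify_transformation([1, -1, 1, -1, 1, -1]))                  # NED
print(classify_transformation([0.5] * 5))                              # ignore
```

Only the median, the percentiles and the up/down counts are used. This is consistent with the slide’s point that no normal distribution is assumed.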
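
Slides 14 to 16 describe calibrating one company’s assay against another’s using the transformations they share. The talk does not give the formula for a calibration factor. This sketch assumes a least-squares slope through the origin between the two assays’ median Δ values for the shared transformations. It enforces the stated minimum of 6 shared transformations, each with at least 6 pairs. The function names and data are hypothetical.

```python
from statistics import median

MIN_TRANSFORMS = 6  # minimum shared transformations (slide 16)
MIN_PAIRS = 6       # minimum pairs per transformation in each assay (slide 16)

def calibration_factor(assay1, assay2):
    """assay1, assay2: {transformation: [pair deltas]} for the same endpoint.
    Returns k with delta_assay2 approximately k * delta_assay1, or None when
    there are too few well-exemplified shared transformations."""
    shared = [
        t for t in assay1.keys() & assay2.keys()
        if len(assay1[t]) >= MIN_PAIRS and len(assay2[t]) >= MIN_PAIRS
    ]
    if len(shared) < MIN_TRANSFORMS:
        return None  # slide 16: fall back to shared standard compounds
    x = [median(assay1[t]) for t in shared]
    y = [median(assay2[t]) for t in shared]
    denominator = sum(a * a for a in x)
    if denominator == 0:
        return None
    # Least-squares slope through the origin (assumed method).
    return sum(a * b for a, b in zip(x, y)) / denominator

def to_assay1_scale(deltas, k):
    """Rescale assay-2 deltas onto the assay-1 scale before merging."""
    return [d / k for d in deltas]

# Hypothetical data: assay 2 reports every change twice as large as assay 1.
assay1 = {f"T{i}": [0.1 * (i + 1)] * 6 for i in range(6)}
assay2 = {f"T{i}": [0.2 * (i + 1)] * 6 for i in range(6)}
k = calibration_factor(assay1, assay2)
print(round(k, 3))
print(to_assay1_scale([0.4, -0.2], k))
```

A factor above 1 would correspond to “Assay 2 more sensitive than Assay 1” on slide 15. Fitting through the origin reflects the point that only Δ values, not absolute values, are being compared.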
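
For the comparison on slide 21, R² and RMSE are computed over paired rule values, that is, the same transformation’s median Δlog(Clint) in the Roche-only rule set and in the merged GRD. The slide does not say which R² definition was used. This sketch assumes the squared Pearson correlation, and its input values are hypothetical.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences (assumed R² definition)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def rmse(x, y):
    """Root-mean-square difference between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical median delta log(Clint) per transformation in the two rule sets.
single_company = [-0.30, -0.10, 0.05, 0.20, 0.40]
merged = [-0.25, -0.15, 0.00, 0.25, 0.35]
print(round(r_squared(single_company, merged), 2), round(rmse(single_company, merged), 2))
```

R² measures how well the two rule sets rank and track each other. RMSE measures how far apart the paired medians are in log units. Both are needed to judge whether merging shifted the rules.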
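
One way to read the “exceptions” tables on slides 24 and 26 is as residuals from the global trend. Compare each transformation’s median ΔlogSol with the value predicted from its median ΔlogD. This sketch uses the slope (-0.57) and zero intercept reported on slide 23 and the five rows tabulated on slide 24. The 0.3 log-unit residual cut-off is a hypothetical choice, not a criterion stated in the talk.

```python
SLOPE = -0.57      # solubility vs logD slope reported on slide 23
INTERCEPT = 0.0    # intercept reported on slide 23
CUTOFF = 0.3       # hypothetical residual threshold (log units)

# (median ΔlogD, median ΔlogSol) rows from the slide 24 table
rows = [(0.00, 0.73), (-0.10, 0.65), (0.07, 0.52), (-0.10, 0.40), (-0.59, 0.03)]

def residual(d_logd, d_logsol, slope=SLOPE, intercept=INTERCEPT):
    """Observed minus trend-predicted change in log solubility."""
    return d_logsol - (slope * d_logd + intercept)

for d_logd, d_logsol in rows:
    r = residual(d_logd, d_logsol)
    flag = "exception" if abs(r) > CUTOFF else "on trend"
    print(f"dlogD={d_logd:+.2f} dlogSol={d_logsol:+.2f} residual={r:+.2f} {flag}")
```

The same function applies to the human microsomal clearance rows on slide 26, using the slope of 0.23 reported on slide 25.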
