
2020.04.07 Automated Molecular Design and the BRADSHAW Platform webinar

This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what “AI” approaches might reasonably be expected to achieve, and what is best left to a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from Bradshaw, GSK’s experimental automated design environment.

Published in: Healthcare

  1. 7 April, 2020. Automated Molecular Design and the BRADSHAW platform. Speaker: Dr. Darren Green, Director of Molecular Design & Senior Fellow at GlaxoSmithKline. Moderator: Vladimir Makarov.
  2. This webinar is being recorded.
  3. ©PistoiaAlliance. Introduction to Today’s Speaker: Dr. Darren Green, Director of Molecular Design & Senior Fellow at GlaxoSmithKline.
  4. Automated Molecular Design & the BRADSHAW platform. Darren Green, Data & Computational Sciences.
  5. The Discovery Cycle: the opportunity.
  6. The nature of lead optimisation project data
     • Sparse & Imbalanced
     • Discontinuous
     • Competing & Multi-Objective
     • Small: start with 1, maybe 10 data points
     • Slow & Expensive to grow
     • Large search space
     [Figure: sparsely populated data matrix of molecules mol1 to mol50 against the assays TargetA, TargetB, Solubility, hERG, 3A4, PPB, intCl and %F]
  7. Current Model. [Diagram labels: Project Data; Ideas; Synthesis; Desired CP; Knowledge; Memory; Expert tools; Available reagents; In silico tools; Medicinal Chemist(s)]
  8. Maximising the impact of computational methods
     • Application relies on “intuition”
     • Patchy utilisation
     • Non-experts in the tools and algorithms
     • Evaluating rather than generating ideas
     “He thinks his judgments are complex and subtle but a simple combination of scores could probably do better”
     “If the environment is sufficiently regular and if the judge has had a chance to learn its regularities, the associative machinery will recognize situations and generate quick and accurate predictions and decisions. You can trust someone’s intuitions if these conditions are met.”
     “Whenever we can replace human judgment by a formula, we should at least consider it”
  9. Maximising the impact of computational methods: What is required?
     • Data-driven generation rather than evaluating ideas
     • Systematic application
     • Integration of Disruptive methods
       – Molecule Generators
       – Route prediction
       – Generative models aka “inverse QSAR”
     • Evidence that the methods work
     • From 1,000s of molecules to 100s
     • From many iterations to a few
  10. What if? We put systematic ideation and modelling at the centre of the process? [Diagram labels: Project Data; Ideas; Synthesis; Desired CP; Knowledge; Memory; Expert tools; Available reagents; In silico tools; Medicinal Chemist(s); Models]
  11. Solving the conundrum: An optimal solution may require elements of all of these. [Diagram labels around “Improved Design”: “AI” (DNN, RNN, GAN, Latent spaces, AL, RL); “The Human” (Creativity, Mechanistic thinking); Molecule Quality & Best Practice; Change Management; Training; Access; Scalability; “Physics” (FEP, Simulations); “CIX” (MMPs, Exp Design, QSAR)]
  12. Automated Cheminformatics @ GSK
      • QSAR modelling
      • HTS analysis/progression
      • Substructure searching
      • Similarity searching
      • Compound acquisition
      • De novo design
  13. Some Learnings
      • It is possible to automate a lot of tasks that are involved in Medicinal Chemistry design/analysis
      • This was possible before “AI”, Deep Learning and GPUs
      • Automated methods require human supervision
      • It is difficult to supplant manual/familiar ways of working
  15. Maximising the impact of computational methods: What can we do differently? – Being a solution to a problem
      • Data-driven generation rather than evaluating ideas
      • Systematic application
      • Integration of Disruptive methods
        – Molecule Generators
        – Route prediction
        – Generative models aka “inverse QSAR”
      • Evidence that the methods work
      • From 1,000s of molecules to 100s
      • From many iterations to a few
  16. Building BRADSHAW: GSK’s automated molecular design platform
      Biological Response Analysis and Design System using an Heterogenous, Automated Workflow
      Design Cycle Management: Molecule Generation; Predict and Score; Select; Make; Test; Data; Compound Profile
      Predict and Score uses ML & physics based models, running from high throughput to low:
      • Reactivity Filters
      • Physchem (PFI, fsp3, solubility)
      • Desirability (drug like, lead like)
      • Off target (vEXP etc)
      • Safety (DEREK, eHomo)
      • DMPK
      • Project Specific QSAR
      • Project Specific 3D/Free Energy
      • Synthetic tractability/developability
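The funnel on this slide, in which many generated molecules pass through successively more selective prediction and scoring stages before a small set is selected, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the property keys (`pfi`, `solubility_um`, `qsar_pic50`), the stage names, the example molecules and every threshold are invented for the example and are not BRADSHAW's actual models or cut-offs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    """A generated molecule with precomputed (hypothetical) predictions."""
    smiles: str
    props: Dict[str, float]


# A stage is a name plus a predicate deciding whether a candidate survives.
Stage = Tuple[str, Callable[[Candidate], bool]]


def run_funnel(
    candidates: List[Candidate], stages: List[Stage]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply the stages in order and record how many candidates survive each."""
    survivors = list(candidates)
    counts = []
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Hypothetical stages, ordered from cheap/high-throughput to project-specific.
STAGES: List[Stage] = [
    ("physchem", lambda c: c.props["pfi"] < 7.0),
    ("solubility", lambda c: c.props["solubility_um"] >= 100.0),
    ("project_qsar", lambda c: c.props["qsar_pic50"] >= 6.5),
]

# Hypothetical candidates with made-up predicted values.
CANDIDATES = [
    Candidate("CCO", {"pfi": 3.0, "solubility_um": 500.0, "qsar_pic50": 5.0}),
    Candidate("c1ccccc1O", {"pfi": 5.5, "solubility_um": 250.0, "qsar_pic50": 7.1}),
    Candidate("CCCCCCCCCCCC", {"pfi": 9.2, "solubility_um": 1.0, "qsar_pic50": 6.9}),
    Candidate("CC(=O)Nc1ccccc1", {"pfi": 4.8, "solubility_um": 40.0, "qsar_pic50": 7.4}),
]

selected, counts = run_funnel(CANDIDATES, STAGES)
for stage_name, remaining in counts:
    print(f"{stage_name}: {remaining} candidates remain")
```

Ordering the stages from cheapest to most expensive means the costly project-specific models only ever see molecules that have already passed the generic checks.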
  17. Inspiration and thanks: Ed Maliski and John Bradshaw
      “A computer language, ALEMBIC, is used to collate the ideas of the scientists. The resulting list of potential molecules is then parametrised using whole molecule descriptors. Based on these descriptors, appropriate statistical techniques are used to generate sets of molecules retaining the maximum amount of the information inherent in all possible combinations of the scientists ideas”
      Maliski EG, Latour K, Bradshaw J (1992) The whole molecule design approach to drug discovery. Drug Des Discov 9:1–9
  18. The Challenge
      • Build a system which combines methods from different disciplines
        • ML, Cheminformatics, Chemometrics, Optimisation
      • Robust enough for use by multiple people across a portfolio of projects
      • Scales to very large numbers of compounds
      • Delivers small numbers of compounds that can be ingested by a human
      • Simple to add/modify/remove methods without redeveloping the interface
      • Add/modify/remove methods without the need to retrain users
  19. The Opportunity
      – Build in best practice
        – Safety alerts, physicochemical properties, institutional memory, multi parameter optimisation, synthetic tractability
      – Automate the expert
      – Reduce time/money spent on end user software
      – Address cycle times through different ways of working
  20. BRADSHAW high level architecture. [Diagram labels: BRADSHAW Client; Webservices; Algorithms, models; GSK Infrastructure; HPC; DB]
  21. System architecture
  22. Tasks
      • BRADSHAW orchestrates the running of workflows (“Tasks”) on compound sets and chains the inputs/outputs of these Tasks to form designs.
        – A Task is the term used to identify a particular scientific process, e.g. Molecule Generator, Molecule Filter, Active Learning.
        – A Task is described by an interface to a web service with a set of parameters; the expected columns in its input and output files are also defined.
        – An administrator can easily create new Tasks without redeployment of the system.
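One way to picture a Task as described above (a web-service interface, a parameter set, and declared input and output columns) is the minimal Python sketch below, which also checks that a chain of Tasks is consistent before it is run. Everything named here is an assumption made for illustration: the `Task` class, the `validate_design` helper, the `example.invalid` endpoints, the column names and the parameters are hypothetical and do not describe BRADSHAW's real schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Task:
    """Hypothetical description of one scientific process behind a web service."""
    name: str
    endpoint: str                      # placeholder URL, not a real service
    input_columns: Tuple[str, ...]     # columns the Task expects in its input file
    output_columns: Tuple[str, ...]    # columns the Task writes to its output file
    parameters: Dict[str, object] = field(default_factory=dict)


def validate_design(tasks: List[Task], initial_columns: Iterable[str]) -> Set[str]:
    """Check that each Task's inputs are produced upstream; return all columns."""
    available = set(initial_columns)
    for task in tasks:
        missing = set(task.input_columns) - available
        if missing:
            raise ValueError(
                f"{task.name} is missing input columns: {sorted(missing)}"
            )
        available |= set(task.output_columns)
    return available


generator = Task(
    "Molecule Generator",
    "https://example.invalid/generate",
    ("smiles",),
    ("smiles", "parent_smiles"),
    {"max_products": 1000},
)
mol_filter = Task(
    "Molecule Filter",
    "https://example.invalid/filter",
    ("smiles",),
    ("smiles", "filter_flag"),
)
active_learning = Task(
    "Active Learning",
    "https://example.invalid/select",
    ("smiles", "filter_flag"),
    ("smiles", "selection_rank"),
    {"batch_size": 20},
)

design = [generator, mol_filter, active_learning]
final_columns = validate_design(design, ["smiles"])
print(sorted(final_columns))
```

Because each Task in this sketch is plain data rather than code, a new one could be registered by adding a record, which is in the spirit of creating Tasks without redeploying the system.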
  23. GSK Molecule Generators
      • de novo design
        • Given a set of constraints, generate molecular structures which satisfy those constraints
      • Classic problems with de novo design algorithms
        • Nonsense structures
        • Structures with intrinsic liabilities
        • Structures that cannot be made
      • BRADSHAW takes a dual approach
        • Cheminformatics methods to generate plausible structures based on what has been done before
        • Deep Learning algorithms trained on relevant GSK chemistry space, including novel methods
      Generators (* GSK specific methods and/or implementations):
      • Deep Learning: RNN, JTVAE, *RG2SMI
      • Knowledge based: *BioDig, *Fit&Predict, MATSY
      • Reaction based: *BRICS
      References: Degen, et al. ChemMedChem 3, 1503-1507 (2008). Hussain & Rea JCIM 50, 339-348 (2010). Free & Wilson. J. Med. Chem. 7, 395-399 (1964). Pogany, Pickett et al. JCIM 59, 1136-1146 (2019).
  24. GSK BRICS: Building on what chemists have made
      • GSK algorithm to do fragment replacement
      • Replace fragments with equivalent attachments from GSK chemistry space
      References: RECAP: Lewell et al. J Chem Inf Comput Sci. 38, 511-22 (1998). BRICS (rdkit): Degen et al. ChemMedChem 3:1503–7 (2008).
  25. BioDig – Automated SAR extraction
      Matched Molecular Pairs: Hussain, Rea, J. Chem. Inf. Model. 50, 339-348 (2010)
      Example transform rule: pClearance = -0.401 to pClearance = 0.192, ΔpClear = 0.593
      Property               Number of compounds   Number of MMPs
      Clearance (In vitro)   48K                   9.1 Million
      Clearance (Rat)        62K                   15.1 Million
      Clearance (Mouse)      17K                   2.2 Million
      ChromlogD              435K                  707 Million
      Cytotox                105K                  63 Million
      hERG                   21K                   4.2 Million
      P450 2C19              247K                  251 Million
      P450 2D6               249K                  259 Million
      P450 3A4               268K                  288 Million
      Permeability           155K                  137 Million
      PGP efflux             8K                    0.6 Million
      PP Binding             349K                  386 Million
      Solubility             374K                  591 Million
      HTS collection         2.3M                  13.4 Billion
      [Figure labels: No context: Level 0; Neighbour context: Level 2; Wide distribution; Positive distribution]
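The arithmetic behind a transform rule is simply the property change across a matched molecular pair, and the statistics of a rule come from pooling that change over every pair sharing the same transform. The Python sketch below reproduces the slide's ΔpClear value and then summarises a handful of pairs. Apart from the two pClearance values taken from the slide, all of it is assumed for illustration: the transform labels (`H>>F`, `Cl>>OMe`), the pair values and the `summarise` helper are hypothetical and say nothing about how BioDig is implemented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def delta(before: float, after: float) -> float:
    """Property change across one matched molecular pair."""
    return after - before


# Values from the slide: pClearance goes from -0.401 to 0.192.
slide_delta = delta(-0.401, 0.192)
print(f"delta pClear = {slide_delta:.3f}")

# Hypothetical matched pairs: (transform label, value before, value after).
PAIRS: List[Tuple[str, float, float]] = [
    ("H>>F", 0.10, 0.50),
    ("H>>F", -0.20, 0.30),
    ("H>>F", 0.00, 0.60),
    ("Cl>>OMe", 0.80, 0.35),
]


def summarise(pairs: List[Tuple[str, float, float]]) -> Dict[str, Dict[str, float]]:
    """Group pairs by transform and report pair count and mean change."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for transform, before, after in pairs:
        grouped[transform].append(delta(before, after))
    return {
        transform: {"n_pairs": len(deltas), "mean_delta": round(mean(deltas), 3)}
        for transform, deltas in grouped.items()
    }


summary = summarise(PAIRS)
for transform, stats in summary.items():
    print(transform, stats)
```

With hundreds of millions of pairs per property, as in the table above, the same grouping would be done in a database or a distributed job rather than in memory.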
  26. Reduced Graph to SMILES. Deep Learning for Molecule Generation: From one hit?
      – Problem: RNNs, GANs and Autoencoders rely on
        – large numbers of known compounds,
        – imperfect models (transfer or reinforcement learning),
        – post-hoc filtering to target particular regions of chemical space.
      – Hypothesis: The Reduced Graph represents chemical space at a higher level that could avoid this complication, but it is a one-way encoding.
      – Solution: Use the latest deep learning algorithms from language translation to translate Reduced Graph to SMILES. Implements several novel features: bi-directional LSTMs and an attention mechanism.
      Example: SMILES Oc1cc(N)c(Cl)cc1C(=O)NC1CCNCC1; Reduced Graph [Cr][Cu][Y]
  27. Reduced Graph to SMILES. Deep Learning for Molecule Generation: From one hit?
      RG input [Cr][Cu][Y]; multiple molecules output, all with the same RG.
      Pogány, Arad, Genway, Pickett, De Novo Molecule Design by Translating from Reduced Graphs to SMILES. Journal of Chemical Information and Modeling 2019 59 (3), 1136-1146
  28. Validating the methods
      • We can generate molecules
      • But do we generate the “right” ones?
      • And what is the context for “right”?
      Systematic Ideation: Deep Learning (RNN, RG2SMI); Knowledge based (BioDig, Fit&Predict); Reaction based (BRICS)
  29. Our experiment: A “hit 2 lead” use case
      – Single hit from a screen
      – Ask a number of medicinal chemists to describe the 20 molecules they would make
      – Compare these to what our molecule generators produce
  30. Chemist-chemist correlation: how much variance is there in our panel? [Figure: heat maps of overlapping ideas per chemist, per hit, for Hit 1, Hit 2, Hit 3 and Hit 4; green = high, red = low]
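A chemist-by-chemist overlap table of this kind can be computed by treating each chemist's proposals as a set and counting the shared members for every pair; the same set logic gives the fraction of each chemist's ideas that a generator also produced. The Python sketch below is a toy illustration only: the chemist names, the placeholder identifiers (`m1`, `m2`, ...), which stand in for canonicalised structures, and the helper functions are all hypothetical, and the sets are far smaller than the 20 molecules per chemist used in the experiment.

```python
from itertools import combinations
from typing import Dict, Set


def overlap_matrix(ideas: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count shared ideas for every pair of chemists (diagonal = own total)."""
    names = sorted(ideas)
    matrix = {a: {b: 0 for b in names} for a in names}
    for name in names:
        matrix[name][name] = len(ideas[name])
    for a, b in combinations(names, 2):
        shared = len(ideas[a] & ideas[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix


def generator_recall(generated: Set[str], ideas: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each chemist's ideas that also appear in the generated set."""
    return {name: len(generated & proposed) / len(proposed)
            for name, proposed in ideas.items()}


# Hypothetical proposals for one hit; identifiers stand in for canonical structures.
IDEAS = {
    "chemist_A": {"m1", "m2", "m3", "m4"},
    "chemist_B": {"m3", "m4", "m5"},
    "chemist_C": {"m1", "m6"},
}
GENERATED = {"m1", "m3", "m5", "m9"}  # hypothetical generator output

matrix = overlap_matrix(IDEAS)
recall = generator_recall(GENERATED, IDEAS)
for name in sorted(IDEAS):
    print(name, matrix[name], f"recall={recall[name]:.2f}")
```

In practice the structures would need to be canonicalised before comparison so that the same molecule drawn two ways counts as one idea.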
  31. How do our Molecule Generators perform?
  32. Publications matching “de novo molecular generator” (Google Scholar). [Figure: bar chart of publications per year, 2005 to 2019; vertical axis 0 to 3000]
  33. Configuring a new Task
  34. Starting a new Design
  35. Adding allowed Tasks
  36. Configuring a Task
  37. Configuring a Task
  38. A whole Workflow
  39. Integrating the disruptive with the pragmatic
      • At some stage humans need to make a decision to make & test molecules
      • Annotation of ideas, their provenance and quality is important
      • Framing ideas for easy digestion:
        – Clustering
        – SMARTS matching, e.g. “LHS”, “RHS”, “Core”
  40. Summary
      – BRADSHAW is a fully automated “predict first” design system
      – High level scientific workflows implement best practice
      – Customisation is possible via XML configuration
      – Adding new Tasks is simple and requires no software development
  41. Acknowledgements: Stefan Senger, Stephen Pickett, Chris Luscombe, Sandeep Pal, Ian Wall, Jamel Meslamani, Jennifer Elward, Peter Pogany, David Marcus, Baptiste Canault, Richard Lonsdale, Jacob Bush, Silvia Amabilino, Eric Manas; David Brett, Adam Powell, Jonathan Masson (Tessella Ltd)
  42. Poll Question 1: What are the areas of Research where the utilization of AI seems the most promising? Choose one or more.
      A. Disease biology understanding
      B. Identification of new targets
      C. Identification of new biomarkers
      D. Patient stratification
      E. Predictive toxicology
  43. Poll Question 2: What factors limit the use of AI for research in your organization the most? Choose one or more.
      A. Interpretability of results
      B. Data availability
      C. Reproducibility of results
      D. Regulatory restrictions
  44. Audience Q&A. Please use the Question function in GoToWebinar.
  45. Upcoming Webinars
      1. May/June 2020 (exact date and title TBD): Dr. Djork-Arné Clevert, Head of Machine Learning Research, Bayer AG
      2. May/June 2020 (exact date TBD): Radiomics Biomarkers Panel: Laure Fournier, MD, PhD, Hospital Georges Pompidou; Thierry Colin, PhD, Sophia Genetics; Karine Seymour, MASc, eMBA, President, Medexprim
      Please suggest other topics and speakers.
  46. Thank You. info@pistoiaalliance.org, @pistoiaalliance, www.pistoiaalliance.org
