This presentation described how data-driven chemoinformatics methods may automate much of what has historically been done by a medicinal chemist. It explored what “AI” approaches can reasonably be expected to achieve, and what is best left to a human expert. The implications of automation for the human-machine interface were explored and illustrated with examples from BRADSHAW, GSK’s experimental automated design environment.
2020.04.07 automated molecular design and the bradshaw platform webinar
1. 7 April, 2020
Automated Molecular Design and
the BRADSHAW platform
Speaker: Dr. Darren Green, Director of Molecular Design & Senior Fellow at GlaxoSmithKline
Moderator: Vladimir Makarov
8. Maximising the impact of computational methods
• Application relies on “intuition”
• Patchy utilisation
• Non-experts in the tools and algorithms
• Evaluating rather than generating ideas
“He thinks his judgments are complex and subtle but a simple combination of scores could probably do better”
“If the environment is sufficiently regular and if the judge has had a chance to learn its regularities, the associative machinery will recognize situations and generate quick and accurate predictions and decisions. You can trust someone’s intuitions if these conditions are met.”
“Whenever we can replace human judgment by a formula, we should at least consider it”
9. Maximising the impact of computational methods
What is required?
• Data-driven generation rather than evaluation of ideas
• Systematic application
• Integration of disruptive methods
– Molecule generators
– Route prediction
– Generative models, aka “inverse QSAR”
• Evidence that the methods work
• From 1,000s of molecules to 100s
• From many iterations to a few
11. Solving the conundrum
Goal: improved design. An optimal solution may require elements of all of these:
• “AI”: DNN, RNN, GAN; latent spaces; AL, RL
• “The Human”: creativity, mechanistic thinking
• “Physics”: FEP, simulations
• “CIX”: MMPs, experimental design, QSAR
• Molecule quality & best practice
• Change management
• Training
• Access
• Scalability
13. Some Learnings
• Many of the tasks involved in medicinal chemistry design and analysis can be automated
• This was possible before “AI”, Deep Learning and GPUs
• Automated methods require human supervision
• It is difficult to supplant manual/familiar ways of working
15. Maximising the impact of computational methods
What can we do differently?
– Being a solution to a problem
• Data-driven generation rather than evaluation of ideas
• Systematic application
• Integration of disruptive methods
– Molecule generators
– Route prediction
– Generative models, aka “inverse QSAR”
• Evidence that the methods work
• From 1,000s of molecules to 100s
• From many iterations to a few
16. Building BRADSHAW: GSK’s automated molecular design platform
Biological Response Analysis and Design System using an Heterogenous, Automated Workflow
Design cycle (coordinated by Design Cycle Management): Molecule Generation → Predict and Score → Select → Make → Test → Data
Compound profile, ML & physics-based models ordered from high throughput to low throughput:
• Reactivity filters
• Physchem (PFI, Fsp3, solubility)
• Desirability (drug-like, lead-like)
• Off-target (vEXP etc.)
• Safety (DEREK, eHomo)
• DMPK
• Project-specific QSAR
• Project-specific 3D/free energy
• Synthetic tractability/developability
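The compound profile reads as a cascade in which cheap, high-throughput checks run before expensive, low-throughput models, so the costly models only see survivors. A minimal sketch of that ordering, assuming each stage is a pass/fail function with a known relative cost; the stage names, costs, thresholds and compound values are all hypothetical placeholders, not BRADSHAW's actual filters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A compound is a hypothetical dict of precomputed properties keyed by name.
Compound = Dict[str, object]


@dataclass
class Stage:
    name: str
    cost: float  # assumed relative compute cost per compound
    passes: Callable[[Compound], bool]


def run_cascade(compounds: List[Compound], stages: List[Stage]) -> List[Compound]:
    """Apply stages cheapest-first so expensive models only score survivors."""
    survivors = list(compounds)
    for stage in sorted(stages, key=lambda s: s.cost):
        survivors = [c for c in survivors if stage.passes(c)]
        print(f"{stage.name}: {len(survivors)} remaining")
    return survivors


# Placeholder stages, deliberately listed out of cost order.
stages = [
    Stage("free-energy model (placeholder)", 1000.0, lambda c: c["dG"] <= -8.0),
    Stage("reactivity filter", 0.01, lambda c: c["alerts"] == 0),
    Stage("physchem (PFI)", 0.1, lambda c: c["pfi"] < 6.0),
]

# Toy compounds with made-up property values.
compounds = [
    {"id": "A", "alerts": 0, "pfi": 4.5, "dG": -9.1},
    {"id": "B", "alerts": 1, "pfi": 3.0, "dG": -10.0},
    {"id": "C", "alerts": 0, "pfi": 7.2, "dG": -9.5},
    {"id": "D", "alerts": 0, "pfi": 5.1, "dG": -6.0},
]
selected = run_cascade(compounds, stages)
print([c["id"] for c in selected])  # ['A']
```

Sorting by cost rather than by list position is the design choice here: the order in which stages are configured does not matter, only their relative expense.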
17. Inspiration and thanks
Ed Maliski and John Bradshaw
“A computer language, ALEMBIC, is used to collate the ideas of the scientists. The resulting list of potential molecules is then parametrised using whole molecule descriptors. Based on these descriptors, appropriate statistical techniques are used to generate sets of molecules retaining the maximum amount of the information inherent in all possible combinations of the scientists ideas”
Maliski EG, Latour K, Bradshaw J (1992) The whole molecule design approach to drug discovery. Drug Des Discov 9:1–9
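The quoted whole-molecule approach has three steps: enumerate every combination of the scientists' ideas, describe each product with whole-molecule descriptors, then pick a subset that retains as much of the information as possible. The sketch below illustrates those steps under explicit assumptions. The ideas are hypothetical substituents at two positions with made-up, additive two-dimensional descriptor contributions. A greedy MaxMin (farthest-point) selection stands in for the unspecified “appropriate statistical techniques”. This is not ALEMBIC.

```python
import itertools
import math

# Hypothetical ideas at two positions; each maps to a made-up 2D descriptor
# contribution, assumed additive and already on comparable scales.
R1_IDEAS = {"H": (0.0, 0.0), "Me": (0.5, 0.1), "Cl": (0.7, 0.0), "OMe": (0.0, 0.9)}
R2_IDEAS = {"Ph": (2.0, 0.0), "Py": (0.6, 1.2), "cPr": (1.2, 0.0)}


def enumerate_library():
    """Every combination of ideas, with a summed whole-molecule descriptor vector."""
    library = {}
    for (n1, d1), (n2, d2) in itertools.product(R1_IDEAS.items(), R2_IDEAS.items()):
        library[f"{n1}/{n2}"] = tuple(a + b for a, b in zip(d1, d2))
    return library


def maxmin_select(library, k):
    """Greedy MaxMin: repeatedly add the member farthest from everything chosen."""
    names = sorted(library)
    chosen = [names[0]]  # deterministic seed
    while len(chosen) < k:
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(math.dist(library[n], library[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


library = enumerate_library()  # 4 x 3 = 12 combinations
picks = maxmin_select(library, 4)
print(picks)
```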
18. The Challenge
• Build a system which combines methods from different disciplines
• ML, Cheminformatics, Chemometrics, Optimisation
• Robust enough for use by multiple people across a portfolio of projects
• Scales to very large numbers of compounds
• Delivers small numbers of compounds that a human can digest
• Simple to add/modify/remove methods without redeveloping the interface
• Add/modify/remove methods without the need to retrain users
19. The Opportunity
– Build in best practice
– Safety alerts, physicochemical properties, institutional memory, multi-parameter optimisation, synthetic tractability
– Automate the expert
– Reduce time/money spent on end user software
– Address cycle times through different ways of working
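Multi-parameter optimisation is often implemented with desirability functions. Each property is mapped onto a 0 to 1 scale, and the scores are combined, commonly with a geometric mean, so that one unacceptable property vetoes the compound. The sketch assumes simple piecewise-linear desirability ramps. The properties and thresholds are illustrative placeholders, not GSK's criteria.

```python
import math


def ramp(value, bad, good):
    """Piecewise-linear desirability: 0 at `bad`, 1 at `good`, linear in between.
    Handles both higher-is-better (good > bad) and lower-is-better (good < bad)."""
    t = (value - bad) / (good - bad)
    return min(1.0, max(0.0, t))


# Hypothetical profile: property -> (bad, good) thresholds.
PROFILE = {
    "pIC50": (5.0, 8.0),             # higher is better
    "pfi": (8.0, 5.0),               # lower is better
    "solubility_uM": (10.0, 100.0),  # higher is better
}


def mpo_score(props):
    """Geometric mean of per-property desirabilities; 0 if any property scores 0."""
    ds = [ramp(props[p], bad, good) for p, (bad, good) in PROFILE.items()]
    if min(ds) == 0.0:
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


print(mpo_score({"pIC50": 8.5, "pfi": 4.0, "solubility_uM": 200.0}))  # 1.0
print(mpo_score({"pIC50": 6.5, "pfi": 6.5, "solubility_uM": 55.0}))   # approximately 0.5
print(mpo_score({"pIC50": 9.0, "pfi": 9.0, "solubility_uM": 500.0}))  # 0.0 (PFI too high)
```

The geometric mean is a deliberate choice over an arithmetic mean: excellent potency cannot compensate for a property that fails outright.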
20. BRADSHAW high-level architecture
• BRADSHAW Client
• Webservices
• Algorithms, models
• GSK Infrastructure: HPC, DB
22. Tasks
• BRADSHAW orchestrates the running of workflows (“Tasks”) on compound sets, chaining the inputs and outputs of these Tasks to form designs.
– A Task is the term used to identify a particular scientific process, e.g. Molecule Generator, Molecule Filter, Active Learning
– Tasks are described by an interface to a web service with a set of parameters; the expected columns in the input and output files are also defined.
– An administrator can easily create new Tasks without redeploying the system.
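Because each Task declares the columns it expects on input and produces on output, a chain of Tasks can be checked before anything runs. A minimal sketch of that check, assuming Tasks are plain records and that columns accumulate along the chain. The Task names mirror the slide. The service paths, column names and the `validate_chain` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    name: str
    service: str          # hypothetical web-service path
    requires: Set[str]    # columns expected in the input file
    produces: Set[str]    # columns present in the output file
    params: dict = field(default_factory=dict)


def validate_chain(tasks: List[Task], initial_columns: Set[str]) -> Set[str]:
    """Walk the chain, failing early if a Task's input columns are unavailable.
    Assumes each Task's output columns are added to those already available."""
    available = set(initial_columns)
    for task in tasks:
        missing = task.requires - available
        if missing:
            raise ValueError(f"{task.name} is missing input columns: {sorted(missing)}")
        available |= task.produces
    return available


generator = Task("Molecule Generator", "/tasks/generate", {"smiles"}, {"smiles", "parent_id"})
mol_filter = Task("Molecule Filter", "/tasks/filter", {"smiles"}, {"pfi", "alerts"}, {"max_pfi": 6.0})
learner = Task("Active Learning", "/tasks/active-learning", {"smiles", "pIC50_pred"}, {"acquisition_score"})

print(sorted(validate_chain([generator, mol_filter], {"smiles"})))
# ['alerts', 'parent_id', 'pfi', 'smiles']
try:
    validate_chain([generator, mol_filter, learner], {"smiles"})
except ValueError as err:
    print(err)  # Active Learning is missing input columns: ['pIC50_pred']
```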
23. • de novo design
– Given a set of constraints, generate molecular structures which satisfy those constraints
• Classic problems with de novo design algorithms
– Nonsense structures
– Structures with intrinsic liabilities
– Structures that cannot be made
• BRADSHAW takes a dual approach
– Cheminformatics methods to generate plausible structures based on what has been done before
– Deep Learning algorithms trained on relevant GSK chemistry space, including novel methods
GSK Molecule Generators
• Deep Learning: RNN, JTVAE, *RG2SMI
• Knowledge based: *BioDig, *Fit&Predict, MATSY
• Reaction based: *BRICS
* GSK specific methods and/or implementations
Degen et al. ChemMedChem 3, 1503–1507 (2008).
Hussain & Rea. JCIM 50, 339–348 (2010).
Free & Wilson. J. Med. Chem. 7, 395–399 (1964).
Pogány, Pickett et al. JCIM 59, 1136–1146 (2019).
24. GSK BRICS: Building on what chemists have made
GSK algorithm for fragment replacement
Replace fragments with equivalent attachments from GSK chemistry space
RECAP: Lewell et al. J. Chem. Inf. Comput. Sci. 38, 511–522 (1998).
BRICS (RDKit): Degen et al. ChemMedChem 3, 1503–1507 (2008).
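Fragment replacement of this kind can be pictured in two steps. First, a molecule is cut into fragments with labelled attachment points. Then one fragment is swapped for a library fragment carrying the same attachment labels. The sketch below works on pre-fragmented, hypothetical SMILES-like strings with BRICS-style `[n*]` dummy-atom labels. The labels and fragments are illustrative, and the code does no real chemistry. In practice, a toolkit such as RDKit would handle fragmentation and reassembly.

```python
import re
from collections import defaultdict

LABEL = re.compile(r"\[(\d+)\*\]")


def signature(fragment: str) -> tuple:
    """Sorted attachment-point labels, e.g. '[1*]C(=O)N[2*]' -> (1, 2)."""
    return tuple(sorted(int(m) for m in LABEL.findall(fragment)))


def index_library(fragments):
    """Group library fragments (e.g. mined from past chemistry) by signature."""
    index = defaultdict(set)
    for frag in fragments:
        index[signature(frag)].add(frag)
    return index


def replacements(molecule_fragments, library_index):
    """Yield fragment lists in which exactly one fragment is swapped for an
    equivalent (same-signature) library fragment."""
    for i, frag in enumerate(molecule_fragments):
        for alt in sorted(library_index.get(signature(frag), ())):
            if alt != frag:
                yield molecule_fragments[:i] + [alt] + molecule_fragments[i + 1:]


# Hypothetical pre-fragmented molecule and fragment library.
molecule = ["[1*]c1ccccc1", "[1*]C(=O)N[2*]", "[2*]C1CCNCC1"]
library = index_library([
    "[1*]c1ccncc1", "[1*]c1ccc(F)cc1", "[2*]C1CCOCC1", "[1*]C(=O)N[2*]", "[3*]C",
])
for new in replacements(molecule, library):
    print(".".join(new))
```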
25. BioDig – Automated SAR extraction
Matched Molecular Pairs: Hussain, Rea, J. Chem. Inf. Model. 50, 339-348 (2010)
pClearance = -0.401 → pClearance = 0.192
Transform rule: ΔpClear = 0.593
Property               Number of compounds   Number of MMPs
Clearance (in vitro)   48K                   9.1 million
Clearance (rat)        62K                   15.1 million
Clearance (mouse)      17K                   2.2 million
ChromlogD              435K                  707 million
Cytotox                105K                  63 million
hERG                   21K                   4.2 million
P450 2C19              247K                  251 million
P450 2D6               249K                  259 million
P450 3A4               268K                  288 million
Permeability           155K                  137 million
PGP efflux             8K                    0.6 million
PP binding             349K                  386 million
Solubility             374K                  591 million
HTS collection         2.3M                  13.4 billion
Figure: transform-rule distributions. No context (Level 0): wide distribution. Neighbour context (Level 2): positive distribution.
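The transform-rule example is simple arithmetic: the rule's effect is the product's property minus the starting compound's, 0.192 - (-0.401) = 0.593 pClearance units. Aggregating such deltas over every pair that shares a transform, optionally keyed on the local environment, gives distributions like those contrasted at Level 0 and Level 2. A minimal sketch, assuming an MMP algorithm has already found the pairs. The pair records, transform string and context keys are hypothetical toy data, chosen so that one context is consistently positive.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical matched pairs: (transform, neighbour-context key, value A, value B).
PAIRS = [
    ("[*]H>>[*]F", "ctx1", -0.401, 0.192),
    ("[*]H>>[*]F", "ctx1", -0.10, 0.45),
    ("[*]H>>[*]F", "ctx1", 0.00, 0.60),
    ("[*]H>>[*]F", "ctx2", 0.30, -0.20),
    ("[*]H>>[*]F", "ctx2", 0.50, 0.05),
]


def transform_rules(pairs, use_context):
    """Group deltas (B - A) by transform, or by (transform, context) if requested."""
    groups = defaultdict(list)
    for transform, context, value_a, value_b in pairs:
        key = (transform, context) if use_context else (transform,)
        groups[key].append(value_b - value_a)
    return {
        key: {"n": len(d), "mean": mean(d), "sd": pstdev(d), "min": min(d), "max": max(d)}
        for key, d in groups.items()
    }


for use_context in (False, True):  # Level 0 style, then context-aware
    for key, stats in transform_rules(PAIRS, use_context).items():
        print(key, stats["n"], f"mean={stats['mean']:.3f}",
              f"range=[{stats['min']:.2f}, {stats['max']:.2f}]")
```

Pooled, the toy deltas span both signs. Split by context, one group is uniformly positive, which is the kind of sharpening that adding neighbour context is meant to give.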
26. Reduced Graph to SMILES
Deep Learning for Molecule Generation: From one hit?
– Problem
– RNNs, GANs and autoencoders rely on large numbers of known compounds, imperfect models (transfer or reinforcement learning) and post-hoc filtering to target particular regions of chemical space.
– Example: Oc1cc(N)c(Cl)cc1C(=O)NC1CCNCC1 → reduced graph [Cr][Cu][Y] (the reverse direction is marked X)
– Hypothesis
– The Reduced Graph represents chemical space at a higher level, which could avoid this complication, but it is a one-way encoding.
– Solution
– Use the latest deep learning algorithms from language translation to translate Reduced Graphs to SMILES.
– Implements several novel features, including bi-directional LSTMs and an attention mechanism.
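A reduced graph is many-to-one. Atoms are grouped into feature nodes (for example ring systems and linkers), so different molecules can share one reduced graph and the original structure cannot be read back. This is why the slide frames RG2SMI as a translation problem. The sketch below shows only that contraction step, on hand-labelled toy graphs. The node-type labels are hypothetical. They are not the GSK reduced-graph scheme or its [Cr]/[Cu]/[Y] pseudo-atom encoding.

```python
from typing import Dict, List, Tuple


def reduce_graph(atom_types: Dict[int, str], bonds: List[Tuple[int, int]]):
    """Contract bonded atoms sharing a node type into one reduced node (union-find).
    Returns sorted node labels and sorted label pairs for edges between nodes,
    a crude canonical form that is adequate for these toy comparisons."""
    parent = {a: a for a in atom_types}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in bonds:
        if atom_types[a] == atom_types[b]:
            parent[find(a)] = find(b)

    nodes = {find(a): atom_types[a] for a in atom_types}
    edges = {tuple(sorted((nodes[find(a)], nodes[find(b)])))
             for a, b in bonds if find(a) != find(b)}
    return sorted(nodes.values()), sorted(edges)


def ring(start, size, label):
    atoms = {start + i: label for i in range(size)}
    return atoms, [(start + i, start + (i + 1) % size) for i in range(size)]


def chain(start, size, label):
    atoms = {start + i: label for i in range(size)}
    return atoms, [(start + i, start + i + 1) for i in range(size - 1)]


def assemble(*parts, links):
    atoms, bonds = {}, list(links)
    for part_atoms, part_bonds in parts:
        atoms.update(part_atoms)
        bonds.extend(part_bonds)
    return atoms, bonds


# Two different toy molecules: ring sizes and linker lengths differ.
mol1 = assemble(ring(0, 6, "Ar"), chain(6, 3, "Lk"), ring(9, 6, "RingDon"), links=[(0, 6), (8, 9)])
mol2 = assemble(ring(0, 5, "Ar"), chain(5, 2, "Lk"), ring(7, 6, "RingDon"), links=[(0, 5), (6, 7)])
print(reduce_graph(*mol1))  # (['Ar', 'Lk', 'RingDon'], [('Ar', 'Lk'), ('Lk', 'RingDon')])
print(reduce_graph(*mol1) == reduce_graph(*mol2))  # True: same reduced graph
```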
27. Reduced Graph to SMILES
Deep Learning for Molecule Generation: From one hit?
RG input: [Cr][Cu][Y]
Output: multiple molecules, all with the same RG
Pogány, Arad, Genway, Pickett. De Novo Molecule Design by Translating from Reduced Graphs to SMILES. Journal of Chemical Information and Modeling 2019, 59 (3), 1136–1146
28. Validating the methods
• We can generate molecules
• But do we generate the “right” ones?
• And what is the context for “right”?
Systematic Ideation
• Deep Learning: RNN, RG2SMI
• Knowledge based: BioDig, Fit&Predict
• Reaction based: BRICS
29. Our experiment
A “hit to lead” use case
– Single hit from a screen
– Ask a number of medicinal chemists to describe the 20 molecules they would make
– Compare these to what our molecule generators produce
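One simple way to score this comparison is to measure what fraction of each chemist's molecules a generator reproduces, and how much the chemists agree with each other. The sketch assumes every idea is already a canonical SMILES string, so that exact string equality means identical structures. The sets below are placeholders, not results from the experiment.

```python
from itertools import combinations


def recall(reference: set, generated: set) -> float:
    """Fraction of the reference ideas that also appear in the generated set."""
    return len(reference & generated) / len(reference) if reference else 0.0


def jaccard(a: set, b: set) -> float:
    """Overlap between two idea sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder idea sets (strings assumed canonical; not real data).
chemists = {
    "chemist_1": {"c1ccccc1O", "c1ccccc1N", "c1ccccc1F"},
    "chemist_2": {"c1ccccc1O", "c1ccccc1Cl"},
}
generator = {"c1ccccc1O", "c1ccccc1F", "c1ccccc1Br"}

for name, ideas in chemists.items():
    print(name, round(recall(ideas, generator), 2))
for (n1, a), (n2, b) in combinations(chemists.items(), 2):
    print(n1, n2, jaccard(a, b))  # chemist_1 chemist_2 0.25
```

Reporting chemist-to-chemist agreement alongside generator recall gives a baseline: a generator need not match any single chemist closely if the chemists themselves overlap little.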
39. Integrating the disruptive with the pragmatic
• At some stage humans need to make a decision to make & test molecules
• Annotation of ideas, their provenance and quality is important
• Framing ideas for easy digestion:
- Clustering
- SMARTS matching, e.g. “LHS”, “RHS”, “Core”
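Clustering is one way to frame a long list of ideas for human review: group similar molecules and show one representative per group. A minimal sketch of leader (sphere-exclusion) clustering with Tanimoto similarity, assuming each molecule has already been reduced to a set of fingerprint feature IDs. The feature sets and the 0.55 threshold are hypothetical, and this is not necessarily the clustering BRADSHAW uses.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two feature sets: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fingerprints: dict, threshold: float) -> dict:
    """Put each molecule in the first cluster whose leader is within `threshold`,
    otherwise start a new cluster. Input order matters: passing ideas
    best-scoring first makes the top ideas the cluster leaders."""
    clusters = {}
    for name, fp in fingerprints.items():
        for leader in clusters:
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical fingerprints, already ordered best-scoring first.
ideas = {
    "idea_1": frozenset({1, 2, 3, 4}),
    "idea_2": frozenset({1, 2, 3, 5}),
    "idea_3": frozenset({7, 8, 9}),
    "idea_4": frozenset({1, 2, 3, 4, 6}),
    "idea_5": frozenset({7, 8, 10}),
}
clusters = leader_cluster(ideas, threshold=0.55)
print(clusters)
# {'idea_1': ['idea_1', 'idea_2', 'idea_4'], 'idea_3': ['idea_3'], 'idea_5': ['idea_5']}
```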
40. Summary
– BRADSHAW is a fully automated “predict first” design system
– High-level scientific workflows implement best practice
– Customisation is possible via XML configuration
– Adding new Tasks is simple and requires no software development
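Because Tasks are described declaratively, registering a new one can be a configuration change rather than a code change. The sketch below shows what that might look like under an explicit assumption: the XML schema, element and attribute names, and the URL are invented for illustration, since the slides do not show BRADSHAW's actual configuration format. Parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical Task definition; schema, names and URL are illustrative only.
CONFIG = """
<task name="PFI Filter" service="https://example.invalid/tasks/pfi-filter">
  <parameter name="max_pfi" type="float" default="6.0"/>
  <input><column name="smiles"/></input>
  <output><column name="smiles"/><column name="pfi"/></output>
</task>
"""

CASTS = {"float": float, "int": int, "str": str}


def load_task(xml_text: str) -> dict:
    """Turn one <task> element into a plain dict a workflow engine could register."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "service": root.get("service"),
        "params": {
            p.get("name"): CASTS[p.get("type")](p.get("default"))
            for p in root.findall("parameter")
        },
        "requires": {c.get("name") for c in root.findall("input/column")},
        "produces": {c.get("name") for c in root.findall("output/column")},
    }


task = load_task(CONFIG)
print(task["name"], task["params"], sorted(task["produces"]))
# PFI Filter {'max_pfi': 6.0} ['pfi', 'smiles']
```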
41. Acknowledgements
Stefan Senger
Stephen Pickett
Chris Luscombe
Sandeep Pal
Ian Wall
Jamel Meslamani
Jennifer Elward
Peter Pogany
David Marcus
Baptiste Canault
Richard Lonsdale
Jacob Bush
Silvia Amabilino
Eric Manas
David Brett, Adam Powell, Jonathan Masson (Tessella Ltd)
42. Poll Question 1:
What are the areas of research where the use of AI seems most promising? Choose one or more.
A. Disease biology understanding
B. Identification of new targets
C. Identification of new biomarkers
D. Patient stratification
E. Predictive toxicology
43. Poll Question 2:
What factors most limit the use of AI for research in your organization? Choose one or more.
A. Interpretability of results
B. Data availability
C. Reproducibility of results
D. Regulatory restrictions