Emerging Challenges for Artificial Intelligence in Medicinal Chemistry

October 2019
Exploiting medicinal chemistry knowledge to accelerate projects
Dr Ed Griffen
IBSA Lugano October 2019

Exploiting medicinal chemistry knowledge to accelerate projects
• Founded in 2012 by experienced large Pharma
medicinal/computational chemists to accelerate drug
hunting by exploiting data driven knowledge
• Domain leaders in SAR knowledge extraction and
knowledge based design
• > 10 years' experience of building AI systems that suggest
actions to chemists (7 years as MedChemica)
• Creators of the largest ever documented database of
medicinal chemistry ADMET knowledge

…7 Years of working with pharma companies
“Our median number of compounds per LO project is 3000 - this is unsustainable… [it should be] 300”
– Director of Chemistry (large pharma)
“Can we define the textbook of medicinal chemistry?”
– Director of Comp Chem (large pharma)
“We are aiming at 300 compounds per project – currently we are about 400, we will get better”
– Exscientia scientist at SCI ‘What can BigData do for chemistry’, London, Oct 2017
MedChemica uses knowledge extraction techniques to build “expert
systems” to suggest actions to chemists and reduce the time and cost
to critical compounds and candidate drugs.

Explainable AI
The future of AI lies in enabling people to collaborate with machines to solve complex
problems.
Like any efficient collaboration, this requires good communication, trust, clarity and
understanding.
- Freddy Lecue, Explainable AI Research Lead, Accenture Labs
https://www.accenture.com/gb-en/insights/technology/explainable-ai-human-machine
Black box machine learning models are currently being used for high-stakes decision
making throughout society, causing problems in healthcare, criminal justice and other
domains. Some people hope that creating methods for explaining these black box
models will alleviate some of the problems, but trying to explain black box models,
rather than creating models that are interpretable in the first place, is likely to
perpetuate bad practice and can potentially cause great harm to society. The way
forward is to design models that are inherently interpretable.
- Cynthia Rudin, Nature Machine Intelligence 2019, 1, 206–215.

Use the right Machine Learning tool for the right problem
Where is Medicinal Chemistry?
Interpretable:
• Failure cost high
• Immature science
• Highly skilled, critical users
• Business-2-Business
• Transparent and auditable
Black Box:
• Failure cost is low
• Real time response critical
• Interactive = self correcting
• Business-2-consumer
• User agnostic of process

Help the HiPPOs – or they’ll crush you
1. McAfee & Brynjolfsson “Big Data: The Management Revolution”,
Harvard Business Review October 2012
“Companies often make most of their
important decisions by relying on
“HiPPO”—the highest-paid person’s
opinion.”1
Chemistry HiPPs:
• experts in pattern recognition
• judged on their ability to make the best decisions with partial data
• highly trained
• time poor
• delivery focused
• gatekeepers to the adoption of new approaches

Explainable AI for Medicinal Chemistry Design
[Diagram of the MCPairs platform. Labels: Data Warehouse; Automated loader; Clean Structures & Data; MMPA; rule finder; Exploitable Knowledge; Explainable QSAR; Molecule problem solving; Property Prediction; Idea ranking; Instant SAR analysis; REST API & GUI]

Molecule Problem Solving
Compounds from Rules
• Exploitable Knowledge is a rule database derived from MMPA
• User puts in a problem molecule with a property they wish to
improve, e.g. solubility, metabolism, hERG…
• System generates potential improved molecules based on data
[Diagram: Problem molecule + property to improve → MC Expert Enumerator System, drawing on Exploitable Knowledge → Solution molecules]
https://www.youtube.com/watch?v=lITAT6_-i1E&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL&index=3

https://youtu.be/nQxXddJDTfc
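As a rough illustration of how a rule database could be queried for a problem property, the sketch below filters and ranks rule records in plain Python. It is not the MedChemica implementation: the record fields, the transformation strings, the numbers and the `min_pairs` cut-off are all hypothetical, and a real system would also match each rule against the problem molecule and enumerate the product structures with a cheminformatics toolkit, which is omitted here.

```python
# Hypothetical rule records: field names and numbers are illustrative only.
RULES = [
    {"transformation": "[*:1]Cl>>[*:1]F", "property": "solubility",
     "direction": "increase", "median_change": 0.3, "n": 25},
    {"transformation": "[*:1]C>>[*:1]CO", "property": "solubility",
     "direction": "increase", "median_change": 0.6, "n": 40},
    {"transformation": "[*:1]F>>[*:1]Cl", "property": "solubility",
     "direction": "decrease", "median_change": -0.3, "n": 25},
    {"transformation": "[*:1]Br>>[*:1]C#N", "property": "solubility",
     "direction": "increase", "median_change": 0.9, "n": 4},
]


def suggest_rules(rules, prop, wanted_direction, min_pairs=6):
    """Return rules for one property and direction, largest median change first."""
    matches = [
        rule for rule in rules
        if rule["property"] == prop
        and rule["direction"] == wanted_direction
        and rule["n"] >= min_pairs  # drop rules supported by too few pairs
    ]
    return sorted(matches, key=lambda rule: abs(rule["median_change"]), reverse=True)


suggestions = suggest_rules(RULES, "solubility", "increase")
```

Ranking by the size of the median change is only one possible choice; ranking by the number of supporting pairs or by the probability of improvement would be equally easy to swap in.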

MMPA Enables knowledge sharing
Our MMPA technology enabled knowledge sharing between multiple
organisations (AstraZeneca, Hoffmann-La Roche and Genentech)
[Diagram: Multiple Pharma ADMET data → MMPA at each organisation (three shown) → Combine and Extract Rules → >437000 rules → Better project decisions; Increased medicinal chemistry learning]
Kramer, Robb, Ting, Zheng, Griffen, et al. J. Med. Chem. 2018, 61(8), 3277-3292
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935
Fully Automated Matched Molecular Pair Analysis (MMPA)
Knowledge Extraction that’s understandable by chemists
Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp.7739-7750.
[Figure: molecules A and B forming matched pairs, with the Δ data recorded for each A-B pair]
• Matched Molecular Pairs – Molecules that differ only by a
particular, well-defined structural transformation
• Capture the change and environment – MMPs can be
recorded as transformations from A → B
• Statistical analysis to define “medicinal chemistry rules”:
defined transformations with high probability of improving
properties of molecules
• Store in a high performance database and provide an
intuitive user interface
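The first three bullets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published algorithm: the pair records, the SMIRKS-like transformation strings and the property values are made up, the change is taken as the value of B minus the value of A, and a real MMPA run would find the pairs by fragmenting structures rather than reading them from a list.

```python
from collections import defaultdict
from statistics import median

# Hypothetical matched pairs: (transformation A>>B, property of A, property of B).
# The transformation strings and values are illustrative only.
pairs = [
    ("[*:1]C(C)(C)C>>[*:1]C(C)C", 2.1, 1.6),
    ("[*:1]C(C)(C)C>>[*:1]C(C)C", 1.8, 1.5),
    ("[*:1]Cl>>[*:1]F", 3.0, 2.8),
]


def group_deltas(pairs):
    """Group pairs by transformation and record the change B - A for each pair."""
    deltas = defaultdict(list)
    for transformation, value_a, value_b in pairs:
        deltas[transformation].append(value_b - value_a)
    return dict(deltas)


def summarise(deltas):
    """Return the pair count and the median change for each transformation."""
    return {t: {"n": len(d), "median": median(d)} for t, d in deltas.items()}


summary = summarise(group_deltas(pairs))
```

Each entry of `summary` is the raw material for a candidate rule; whether it becomes a rule depends on the statistical checks applied afterwards.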

Rule selection
• Identify and group matching SMIRKS
• Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)
• Is n ≥ 6?
– no: not enough data, ignore transformation
– yes: continue
• Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3?
– yes: transformation is classified as ‘neutral’
– no: continue
• Perform two-tailed binomial test on the transformation to determine the significance of the up/down frequency
– pass: transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing
– fail: transformation classified as ‘NED’ (No Effect Determined)
[Figure: median data difference on an axis from -ve through 0 to +ve, with regions labelled Decrease, Neutral, Increase and NED]
• No assumption of normal distribution
• Manages ‘censored’ = qualified / out-of-range data
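The decision flow above can be sketched in plain Python as follows. The thresholds n ≥ 6, |median| ≤ 0.05 and intercentile range ≤ 0.3 come from the slide; the significance level `alpha=0.05`, the linear-interpolation percentile, and using the majority direction to choose between ‘increase’ and ‘decrease’ are assumptions made for this sketch. Handling of censored data is not shown.

```python
from math import comb
from statistics import median


def binomial_two_tailed_p(n_up, n_down):
    """Two-tailed exact binomial p-value against a 50:50 up/down split."""
    n = n_up + n_down
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of an already sorted list."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def classify(deltas, alpha=0.05):
    """Classify one transformation from its list of property changes."""
    if len(deltas) < 6:
        return "ignore"  # not enough data
    ordered = sorted(deltas)
    spread = percentile(ordered, 0.9) - percentile(ordered, 0.1)
    if abs(median(ordered)) <= 0.05 and spread <= 0.3:
        return "neutral"
    n_up = sum(1 for d in deltas if d > 0)
    n_down = sum(1 for d in deltas if d < 0)
    if n_up + n_down > 0 and binomial_two_tailed_p(n_up, n_down) < alpha:
        return "increase" if n_up > n_down else "decrease"
    return "NED"  # No Effect Determined
```

Under the assumed 0.05 level, six pairs all moving the same way is the smallest set that can pass the test (p = 2/64 ≈ 0.031, against 2/32 ≈ 0.063 for five pairs), which is consistent with the n ≥ 6 cut-off, although the slide does not state this reasoning.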

Base of Success Story from Genentech
Objective: improve metabolic stability
[Workflow: Enumeration (193 compounds enumerated) → Calculated Property → Docking → 8 compounds synthesized]
100 cmpds x ($2K make + $1K test) = $300 000
8 cmpds x ($2K make + $1K test) = $24 000
It is not just money, it is actually time
100 cmpds make & test ~ 15 – 25 weeks
8 cmpds make & test ~ 2 – 4 weeks
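The cost comparison is simple enough to check directly. The sketch below reproduces the two totals from the slide using its flat per-compound figures ($2K to make, $1K to test); the function name and the derived saving are additions for illustration.

```python
def campaign_cost(n_compounds, make_cost=2000, test_cost=1000):
    """Cost in dollars of making and testing n compounds at a flat per-compound rate."""
    return n_compounds * (make_cost + test_cost)


baseline = campaign_cost(100)        # 100 compounds
focused = campaign_cost(8)           # 8 compounds
saving = baseline - focused
saving_fraction = saving / baseline  # share of the baseline spend avoided
```

With these inputs the focused set costs $24 000 against $300 000, a saving of $276 000 or 92% of the baseline spend.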

tBu metabolism issue
Benchmark compound
Predicted to offer most improvement in microsomal stability (in at least 1 species / assay)
[Figure: grid of analogues varying R1 and R2 across tBu, Me, Et and iPr, each annotated with Clint values]
Clint values (reading order): 99, 392, 16, 64, 78, 410, 53, 550, 99, 288, 78, 515, 41, 35, 98, 327, 92, 372, 24, 247, 35, 128, 24, 62, 60, 395, 39, 445, 3, 21, 20, 27, 57, 89, 54, 89
• Data shown are Clint for HLM and MLM (top and bottom, respectively)
Roger Butlin
Rebecca Newton
Allan Jordan

Tubulin Polymerization Inhibitors

Indole-3-glyoxylamide Based Series of Tubulin Polymerization Inhibitors
– Increase potency, solubility and reduce metabolism
– Enable in-vivo xenograft studies
Thompson, M. et al J. Med. Chem., 2015, 58 (23),
pp 9309–9333
MMPA solubility & QSAR calcs
Indibulin D-24851
LC50 0.032
XlogP 3.35
~ potent
In-vivo activity
poor solubility (~1 µM)
LC50 0.027
XlogP 2.02
LC50 0.055
XlogP 2.91
solubility (~10-80 µM)
LC50 0.031
XlogP 2.57
solubility (~10-80 µM)

Idea Ranking
SpotDesign
• Use the knowledge database to estimate how good an idea is
compared to a benchmark molecule
• System generates assessment based on data
[Diagram: Idea molecule + benchmark molecule + property → SpotDesign, drawing on Exploitable Knowledge → Assessment of idea molecule compared to benchmark]
https://www.youtube.com/watch?v=JMhQvNdBOFs&index=2&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL

https://youtu.be/fDpFo53IdOE

Property Prediction
Automated Explainable QSAR
• Chemists get predictions with the substructures highlighted that are
driving prediction and the molecules used to support that part of the
model – transparent / explainable AI.
[Diagram: Molecule Structure + property to predict → Explainable QSAR, built on Clean Structures & Data → Property Prediction: prediction + clear drivers of prediction]

MedChemica Advanced Pharmacophore Pairs
Gobbi, A.; Poppinger, D. Biotechnology and Bioengineering 1998, 61 (1), 47–54.
Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Mol. Inf. 2013, 32 (2), 133–138.
Feature: Definition
Basic Group: Atom or group most likely protonated at pH 7.4
Acidic Group: Atom or group most likely deprotonated at pH 7.4, includes N and C acids
Acceptor: Definitions derived from Taylor & Cosgrove
Donor: Definitions derived from Taylor & Cosgrove
Hydrophobic: C4 or greater cyclic or acyclic alkyl group
Aromatic Attachment: Connection of any group to an aromatic atom, excluding connections within rings
Aliphatic Attachment: Connection of any atom to an aliphatic group not in a ring
Halo: F, Cl, Br, I
Reference for donor/acceptor feature definitions:
Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J Comput Aided Mol Des 2012, 26 (4), 451–472.
Acid & base definitions are SMARTS (MedChemica definitions): acids include C, N and heteroaromatic acids; bases exclude weak aniline bases and include amidines and guanidines.

Pay attention to your descriptors
• Chemistry must make sense
[Figure: simple versus precise assignment of H bond acceptor, base and acid features, illustrated on Diclofenac (1973), Sulfadiazine (1941) and DMAP]

Regression Forest & Pharmacophore understanding
• hERG – auditable models
• Identify important chemical features driving potency
• Predict hERG potency from RF model [10 fold CV]
Pharmacophore fp length: 280
Cross-validation: 10 fold
Compounds in training: 5968
RMSE: 0.16
Pearson R2: 0.27
Regression Forest & Pharmacophore understanding
• Predict hERG potency from RF model [10 fold CV]
• Example CHEMBL12713 sertindole
• Colour structure by feature importance weighted sum of pharmacophore pair
fingerprints – show the chemists where the hotspots are.
• Drill deeper to show the most important positive and negative features.
RF prediction pIC50 7.7
Positive feature: median_with: 5.1, median_without: 4.7, median_diff: 0.4, n_examples_with: 4585, n_examples_without: 1383
Negative feature: median_with: 5.1, median_without: 5.3, median_diff: -0.2, n_examples_with: 3106, n_examples_without: 2862
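The per-feature statistics shown on this slide (median_with, median_without, median_diff and the two example counts) can be computed from a training set with a few lines of Python. The sketch below assumes binary fingerprints stored as lists of 0/1 and a parallel list of pIC50 values; the function name and data layout are hypothetical, and only the output keys follow the slide.

```python
from statistics import median


def feature_summary(fingerprints, pic50s, feature_index):
    """Compare pIC50 values of compounds with and without one fingerprint feature."""
    with_feature = [y for fp, y in zip(fingerprints, pic50s) if fp[feature_index]]
    without_feature = [y for fp, y in zip(fingerprints, pic50s) if not fp[feature_index]]
    if not with_feature or not without_feature:
        return None  # the feature does not split the training set
    median_with = median(with_feature)
    median_without = median(without_feature)
    return {
        "median_with": median_with,
        "median_without": median_without,
        "median_diff": median_with - median_without,
        "n_examples_with": len(with_feature),
        "n_examples_without": len(without_feature),
    }


# Toy data: four compounds, two features (illustrative values only).
toy_fps = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_pic50 = [6.0, 5.0, 4.0, 5.0]
toy_summary = feature_summary(toy_fps, toy_pic50, 0)
```

The two counts always sum to the training set size, which matches the slide: 4585 + 1383 and 3106 + 2862 both equal the 5968 training compounds.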

kNN – Understanding from neighbouring structures
• Predict hERG potency from kNN model [10 fold CV]
• Example CHEMBL12713 sertindole
• Identify the closest neighbours by Tanimoto distance on ECFP4 fingerprints
• Show chemists structures
kNN prediction pIC50 8.2
Nearest neighbours:
distance 0.17 0.2 0.23
pIC50 7.7 4.1 8.2
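A nearest-neighbour lookup of this kind can be sketched without any cheminformatics library by treating each fingerprint as a set of on-bits. The bit sets, compound names and pIC50 values below are invented, and taking the unweighted mean of the neighbours is an assumption: the slide does not say how the neighbours are combined, so this sketch is not expected to reproduce its numbers.

```python
def tanimoto_distance(bits_a, bits_b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of fingerprint on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return 1.0 - len(bits_a & bits_b) / union


def knn_predict(query_bits, training, k=3):
    """Return (prediction, neighbours); each neighbour is (distance, pIC50, name)."""
    ranked = sorted(
        (tanimoto_distance(query_bits, bits), pic50, name)
        for name, bits, pic50 in training
    )
    neighbours = ranked[:k]
    prediction = sum(pic50 for _, pic50, _ in neighbours) / len(neighbours)
    return prediction, neighbours


# Hypothetical training set: (name, on-bits, measured pIC50).
training = [
    ("cmpd_a", {1, 2, 3, 4, 5}, 7.0),
    ("cmpd_b", {1, 2}, 5.0),
    ("cmpd_c", {9}, 4.0),
    ("cmpd_d", {1, 2, 3}, 6.0),
]
prediction, neighbours = knn_predict({1, 2, 3, 4}, training, k=2)
```

Returning the neighbours alongside the prediction is what makes the result explainable: the chemist can be shown the structures and measured values behind the number.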

Seizure prediction by Composite Machine Learning
• ML models built for 20 critical seizure related CNS targets
• Communicate to chemists activity prediction & if model out of domain
• Show close structures and/or toxophores
CHEMBL12713 sertindole: seizure activity observed clinically
Predictions in line with measured data
Legend: More potent than 1 µM; Less potent than 1 µM; Out of Domain – no prediction possible
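One way to turn per-target predictions into the three labels in the legend is sketched below. The 1 µM boundary corresponds to pIC50 = 6; everything else is an assumption made for illustration: the domain check (distance to the nearest training compound against a `max_distance` cut-off of 0.7), the target names and the input layout are hypothetical and are not taken from the slide.

```python
MORE_POTENT = "More potent than 1 µM"
LESS_POTENT = "Less potent than 1 µM"
OUT_OF_DOMAIN = "Out of Domain"


def label_prediction(pic50, nearest_distance, max_distance=0.7):
    """Label one target prediction; the domain cut-off is an assumed value."""
    if nearest_distance > max_distance:
        return OUT_OF_DOMAIN  # no close training compound, so no prediction is reported
    return MORE_POTENT if pic50 > 6.0 else LESS_POTENT  # pIC50 6 equals 1 µM


def composite_report(per_target):
    """Map {target: (predicted pIC50, nearest-neighbour distance)} to labels."""
    return {
        target: label_prediction(pic50, distance)
        for target, (pic50, distance) in per_target.items()
    }


# Hypothetical predictions for three targets.
report = composite_report({
    "target_A": (7.2, 0.30),
    "target_B": (5.1, 0.45),
    "target_C": (6.8, 0.90),
})
```

Reporting "Out of Domain" instead of a number for `target_C` follows the slide's point that chemists should be told when a model cannot be trusted for a given structure.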

Estimating Risks, finding toxophores

Engineering and Automation
[Diagram labels: Corporate structures and measurements from DB; Structure and data clean up; MedChemica Clean Structures & Data; Pair finding; Pair & Rule Database; MCRules; Compounds from Rules; Compound to Pairs; Spot Design; Explainable QSAR; API server (RESTful API); Web GUI; MedChemica CLI; In-House Design tools]

Overcoming the Barriers to Implementing AI
✓ Data integrity and curation
✓ Knowledge extraction algorithms
✓ Engineering, automation and interfaces
✓ Interpretability
[Diagram labels: Knowledge Database; MCPairs; MC GUI]

A Less Simple Example
Increase logD and gain solubility
Question: What is the effect on lipophilicity and solubility?
Roche Example:
logD = 2.65, Kinetic solubility = 84 µg/ml, IC50 SST5 = 0.8 µM
logD = 3.63, Kinetic solubility = >452 µg/ml, IC50 SST5 = 0.19 µM
Roche data is inconclusive! (2 pairs for logD, 1 pair for solubility)
Available Statistics:
Property          Number of Observations   Direction   Mean Change   Probability
logD              8                        Increase    1.2           100%
Log(Solubility)   14                       Increase    1.4           92%

Instant SAR Analysis
Compound to Pairs
• Chemists can instantly see the pairs to a compound and explore
property changes
[Diagram: Molecule of interest → Compound to Pairs, drawing on Exploitable Knowledge → All the matched pairs of that molecule]
https://www.youtube.com/watch?v=OFhZJulxsAw&t=0s&list=PLtkCAojNL97xs1kd5JHngjIRhl4ZPFTlL&index=2

https://youtu.be/OFhZJulxsAw

3 Possible input streams….
[Diagram: Your DB and assay data inside YOUR FIREWALL, connected to the MCPairs Server and Rule Database by three numbered routes (1, 2, 3). Labels: crontab; MCPCLI; REST - API; ETL custom plugin; Exploitation]
• Export flat files of data
• MCPCLI reads in files and deletes
• Direct read access to DB
• SQL searches compounds / measurements
• https requests for compounds / measurements
• Most robust option
• Extract Transform Load (ETL)
• Custom plugin scripted by MedChemica
• Usually 3 – 4 weeks work
• On-site work and team interaction required
10 years' experience building automated systems

Example Current Pharma install
• Every 2 days…
• Latest compound structures pulled from D360 and loaded
• Latest measurements from assays pulled and loaded
• Custom plugin handles data input streaming
• Update the matched pairs and update rules
3 WAYS OF EXPLOITATION
• In-House Design tools and workflows (REST - API)
• MedChemica Web tool
• MedChemica CLI
[Diagram: inside the PHARMA FIREWALL, D360 feeds the MCPairs Server and Rule Database. Labels: crontab; MCPCLI; REST - API; ETL custom plugin]