Fragment-Based RL Generates Drug Hits

Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation
Soojung Yang, Doyeong Hwang, Seul Lee, Seongok Ryu, Sung Ju Hwang
AITRICS and KAIST

Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
[Figure: screening library and hits within chemical space]
Hits
• Molecules with high therapeutic potential
• High binding affinity to a given protein target
[Figure: small molecule + protein target ➞ protein-ligand complex; ΔG = binding free energy]

• RL methods can be used for automated hit search.
[Figure: RL agent, screening library, and hits within chemical space]

• Molecular docking simulation estimates protein-ligand binding affinity.
• We can guide RL agents with docking scores as reward.
[Figure: RL agent, screening library, and hits within chemical space; the docking simulation score is fed back to the agent as the reward]
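The idea of using a docking score as the reward can be sketched as follows. This is a minimal illustration, assuming scores are reported in kcal/mol with more negative values meaning stronger predicted binding; the function name `docking_reward` and the choice to floor the reward at zero are assumptions made for this sketch, not details given on the slides.

```python
def docking_reward(docking_score: float, floor: float = 0.0) -> float:
    """Turn a docking score into a reward (hypothetical helper).

    Assumption: more negative docking scores mean stronger predicted
    binding, so the reward is the negated score, floored at `floor`.
    """
    return max(floor, -docking_score)


# Toy docking scores for three generated molecules (made-up values).
scores = [-9.2, -6.5, 1.3]
rewards = [docking_reward(s) for s in scores]
```

Under these assumptions, a stronger predicted binder always receives a larger reward, and molecules with positive (unfavorable) scores receive none.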

Drug molecules should satisfy strong structural constraints
• Drug molecules should not have toxic or highly reactive substructures.
• Widely used medicinal chemistry filters include several hundred diverse structural alerts.
[Figure: a few examples of structural alerts in the PAINS filter]
[Figure: chemical space, hits, and high-quality hits]

Drug molecules should satisfy strong structural constraints
• Previous molecular generative models guided by implicit methods cannot entirely avoid inappropriate structures.
• e.g., multi-objective optimization where total reward = docking score reward + structure penalty reward
➠ An explicit method that constrains the generation space to acceptable molecules is necessary: “Fragment-based molecular generation”
[Figure: chemical space, hits, and high-quality hits]
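The contrast between the implicit and explicit approaches can be illustrated with a toy sketch. The reward formula is the one quoted above; the penalty size, the helper names, and the alert flags are hypothetical values chosen only to show why a penalty term cannot rule out alerted structures while a restricted action space can.

```python
def implicit_total_reward(docking_reward, n_alerts, penalty_per_alert=1.0):
    """Implicit constraint: total reward = docking score reward + structure penalty reward."""
    structure_penalty_reward = -penalty_per_alert * n_alerts
    return docking_reward + structure_penalty_reward


def explicit_action_space(fragments, has_alert):
    """Explicit constraint: only alert-free building blocks are offered to the agent."""
    return [f for f in fragments if not has_alert(f)]


# With the implicit reward, a strong binder carrying one alert can still
# outscore a clean but weaker binder, so the agent may keep generating it.
alerted_score = implicit_total_reward(docking_reward=10.0, n_alerts=1)
clean_score = implicit_total_reward(docking_reward=8.0, n_alerts=0)

# With the explicit constraint, the alerted building block is never available.
toy_flags = {"frag_a": False, "frag_b": True, "frag_c": False}  # hypothetical alert flags
allowed = explicit_action_space(list(toy_flags), lambda f: toy_flags[f])
```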

Our fragment-based molecular generation method
[Figure: current state molecule S_t with its possible attachment sites; Action 1, Action 2, and Action 3; pharmacochemically acceptable fragment library; augmented fragment; next state molecule S_{t+1}]
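One generation step can be sketched as below. The slide figure only labels the three actions, so this sketch assumes they correspond to (1) choosing an attachment site on the current molecule, (2) choosing a fragment from the library, and (3) choosing an attachment site on that fragment. The `Fragment` and `State` classes are hypothetical toy structures, and uniform random choice stands in for the learned policy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    sites: tuple  # atom indices where a new bond may be formed


@dataclass(frozen=True)
class State:
    parts: tuple  # names of the fragments assembled so far
    sites: tuple  # open attachment sites as (part_index, atom_index)


def step(state, library, rng):
    """Apply one three-part action and return the next state (toy model)."""
    mol_site = rng.choice(state.sites)   # Action 1 (assumed): site on the molecule
    frag = rng.choice(library)           # Action 2 (assumed): fragment to add
    frag_site = rng.choice(frag.sites)   # Action 3 (assumed): site on the fragment
    new_index = len(state.parts)
    # The two chosen sites are consumed by the new bond; the rest stay open.
    remaining = tuple(s for s in state.sites if s != mol_site)
    added = tuple((new_index, a) for a in frag.sites if a != frag_site)
    return State(state.parts + (frag.name,), remaining + added)


library = [Fragment("frag_a", (0, 3)), Fragment("frag_b", (0,))]  # toy library
s0 = State(parts=("benzene",), sites=((0, 0), (0, 1)))
s1 = step(s0, library, random.Random(0))
```

Because every new bond joins two pre-recorded attachment sites, the molecule can only grow in ways the fragment library permits.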

Markovian embedding and policy network make the model suitable for hit generation and scaffold-based generation
• The embeddings of the molecule and fragments are passed autoregressively to the policy network.
• The state embedding network and the policy network are Markov models.
• Any arbitrary molecule can be the current state.
➠ The model can be used for scaffold-based generation as well as hit generation.

Connectivity-preserving generation prevents unrealistic bonds
• We preserve the connectivity information of fragmented molecules as explicit attachment sites.
• In this way, our model can avoid the generation of molecules with unrealistic bonds.
[Figure: starting from pyrrole, new bond formation on N gives pyrrolium, while new bond formation on C gives 2-methylpyrrole]
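A minimal sketch of how attachment sites might be recorded during fragmentation is shown below. It assumes a toy atom numbering for 2-methylpyrrole and a hypothetical helper `attachment_sites`; the slides do not describe the actual fragmentation procedure, so this only illustrates the principle that new bonds are allowed where a bond existed in a real molecule.

```python
def attachment_sites(fragment_atoms, cut_bonds):
    """Return the atoms of a fragment that lost a bond when the parent was cut."""
    return {atom for bond in cut_bonds for atom in bond if atom in fragment_atoms}


# Toy numbering for 2-methylpyrrole: 0 = N, 1-4 = ring carbons, 5 = methyl carbon.
# Cutting the ring-methyl bond (1, 5) yields a pyrrole fragment and a methyl fragment.
cut = [(1, 5)]
pyrrole_sites = attachment_sites({0, 1, 2, 3, 4}, cut)
methyl_sites = attachment_sites({5}, cut)

# The ring nitrogen (atom 0) was never a bonding point in the parent molecule,
# so it is not offered as an attachment site.
nitrogen_allowed = 0 in pyrrole_sites
```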

Explorative RL algorithm improves model performance in the strongly constrained generation space
• A strong constraint on the generation space makes the search space less smooth ➠ Solutions can be trapped in local optima.
• A practical drug discovery model should find as many diverse optimal regions of chemical space as possible.
[Figure: chemical space containing acceptable chemical spaces; an RL agent with good exploration]
Our strategies
• Employ soft actor-critic (SAC), an off-policy actor-critic algorithm based on maximum entropy RL, which encourages exploration.
• Devise explorative algorithms based on the prioritized experience replay (PER) method.
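To see why maximum entropy RL encourages exploration, consider the soft state value used in SAC for a discrete action set, V(s) = Σ_a π(a|s) (Q(s,a) − α log π(a|s)). The sketch below evaluates it for a spread-out policy and a deterministic one; the Q-values and the temperature α are made-up numbers for illustration.

```python
import math


def soft_state_value(q_values, probs, alpha):
    """Soft value: expected Q plus alpha times the policy entropy (discrete actions)."""
    return sum(
        p * (q - alpha * math.log(p))
        for q, p in zip(q_values, probs)
        if p > 0  # zero-probability actions contribute nothing
    )


q = [1.0, 1.0]  # two equally good actions (toy values)
uniform_value = soft_state_value(q, [0.5, 0.5], alpha=0.2)
greedy_value = soft_state_value(q, [1.0, 0.0], alpha=0.2)
```

When actions are equally good, the spread-out policy earns the higher soft value, so the agent is rewarded for keeping several options open instead of collapsing onto one.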

Our PER algorithm encourages the agent to visit novel states
[Schaul et al.] Original PER
• Goal: Sample-efficient exploration
• Priority is a measure of how much additional information we can learn from the transition.
• Priority p_t = TD error of the agent’s value estimate (Q or V function)
[Schaul et al.] Prioritized Experience Replay, ICLR 2016
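The proportional variant of PER described by Schaul et al. samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |δ_i| + ε and δ_i is the TD error. A minimal sketch, with illustrative default values for α and ε:

```python
def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    # eps keeps transitions with zero TD error from never being replayed.
    priorities = [(abs(delta) + eps) ** alpha for delta in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


td_errors = [0.1, 2.0, 0.5]  # toy TD errors for three stored transitions
probs = sampling_probabilities(td_errors)
uniform = sampling_probabilities(td_errors, alpha=0.0)  # alpha = 0 gives uniform replay
```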

Ours: PER(PE), PER(BU)
• Goal: Encourage sufficient diversity
• Priority is a measure of the novelty of the state
• PER(PE): p_t = predictive error of the reward estimator
• PER(BU): p_t = Bayesian uncertainty of the reward estimate
[Figure: a reward predictor outputs $\hat{r}_t$ for a transition with reward $r_t$; the priority is the predictive error, $p_t = \hat{r}_t - r_t$]
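The two novelty-based priorities can be sketched as follows. Both functions are hypothetical illustrations: the absolute value in `priority_pe` is an assumption made so that priorities are non-negative, and the spread of an ensemble of reward predictions is used here as a stand-in for Bayesian uncertainty, which the slides do not specify further.

```python
from statistics import pstdev


def priority_pe(predicted_reward, observed_reward):
    """PER(PE): priority from the predictive error of the reward estimator.

    Assumption: the magnitude of the error is used so the priority is >= 0.
    """
    return abs(predicted_reward - observed_reward)


def priority_bu(reward_predictions):
    """PER(BU): priority from the uncertainty of the reward estimate.

    Assumption: the standard deviation across several predictions (for
    example from an ensemble) stands in for Bayesian uncertainty.
    """
    return pstdev(reward_predictions)


pe = priority_pe(predicted_reward=7.0, observed_reward=9.5)
bu_familiar = priority_bu([8.0, 8.0, 8.0])  # predictors agree: familiar state
bu_novel = priority_bu([6.0, 10.0])         # predictors disagree: novel state
```

In both cases a state the reward predictor handles poorly or inconsistently gets a high priority, so transitions from less-visited regions are replayed more often.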

FREED Model Overview and Contributions
FREED: Fragment-based generative RL with Explorative Experience replay for Drug design
• Our fragment-based generation method allows our model to leverage medicinal chemistry prior knowledge.
• Our proposed explorative RL method based on PER significantly improves the model performance.

Experimental results
Three protein targets
• fa7: protease
• parp1: polymerase
• 5ht1b: G protein-coupled receptor
Evaluation metrics
• Quality score (filter score): ratio of accepted, valid molecules to total generated molecules
• Hit ratio: ratio of unique hit molecules to total generated molecules, where a hit is defined as a molecule with a higher docking score than the median of known active molecules
• Top 5% score: average docking score of the top 5%-scored generated molecules
* Higher = greater absolute value
Baseline models
• MORLD: atom-wise generative model + MolDQN
• REINVENT: SMILES-based generative model + REINFORCE
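The three metrics can be computed along the following lines. This is a sketch under stated assumptions: docking scores are negative numbers where a greater absolute value is better, each generated molecule carries a precomputed flag saying whether it is valid and passes the filters, and the function name `evaluate` and the toy data are hypothetical.

```python
import math
from statistics import mean, median


def evaluate(generated, known_active_scores):
    """generated: list of (smiles, docking_score, passes_filters) tuples."""
    total = len(generated)
    # Quality score: accepted, valid molecules over all generated molecules.
    quality = sum(1 for _, _, ok in generated if ok) / total
    # Hit: docking score better (more negative) than the median of known actives.
    threshold = median(known_active_scores)
    unique_hits = {smi for smi, score, _ in generated if score < threshold}
    hit_ratio = len(unique_hits) / total
    # Top 5% score: mean docking score of the best-scoring 5% (at least one molecule).
    k = max(1, math.ceil(0.05 * total))
    top_scores = sorted(score for _, score, _ in generated)[:k]
    return quality, hit_ratio, mean(top_scores)


# Toy data: molecule "A" was generated twice, so it counts as one unique hit.
generated = [("A", -10.0, True), ("A", -10.0, True), ("B", -7.0, False), ("C", -9.0, True)]
quality, hit_ratio, top5 = evaluate(generated, known_active_scores=[-8.0, -8.5, -9.5])
```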

Results 1 – Explicit constraints are necessary to avoid problematic structures
• Our fragment-based generation model successfully avoids structural alerts.
• Baseline models trained to avoid alerts under a multi-objective optimization scheme show suboptimal results.

Results 2 – Our model outperforms the unconstrained models
• Our model outperforms, or at least performs comparably to, the unconstrained baseline models.
[Figure: hit ratio and top 5% score for each model, with the better direction marked]

Results 3 – Our PER method shows the best performance
• All explorative algorithms outperform the vanilla SAC and PPO.
• Our PER(PE) and PER(BU) outperform previous algorithms such as PER(TD) and curiosity-driven algorithms.
[Figure: hit ratio and top 5% score for each algorithm, with the better direction marked]

Results 4 – Hit and scaffold-based generation case study
Hit generation
• Initial structure: a benzene ring
Scaffold-based (lead) generation
• Initial structure: a scaffold extracted from known active molecules


Summary
• Our model FREED is a novel RL framework for real-world drug design that couples a fragment-based molecular generation strategy with a highly explorative RL algorithm.
• FREED can generate pharmacochemically acceptable molecules with high docking scores.
• When we want to avoid many structural alerts, explicitly constraining the molecular generation space is more effective than implicit methods.
• By defining priority as the novelty of the state, the PER method encourages the model to explore and find many optima in the highly constrained molecular generation space.
