1) The document presents a method called FREED that uses fragment-based molecule generation guided by reinforcement learning to discover novel drug hits.
2) FREED explicitly constrains molecule generation to pharmacologically acceptable fragments to avoid toxic structures, which is more effective than implicit constraint methods.
3) FREED's explorative RL algorithm uses prioritized experience replay, with priority defined by the novelty of the state, to encourage visiting novel states and finding diverse optima in the constrained chemical space.
Fragment-Based RL Generates Drug Hits
1. Hit and Lead Discovery with Explorative RL and
Fragment-based Molecule Generation
Soojung Yang, Doyeong Hwang, Seul Lee,
Seongok Ryu, Sung Ju Hwang
AITRICS and KAIST
2. Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
Hits
• Molecules with high therapeutic potential
• High binding affinity to a given protein target
[Figure: small molecule + protein target ➞ protein-ligand complex; ΔG = binding free energy]
[Figure labels: screening library, hits, chemical space]
3. Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
• RL methods can be used for automated hit search.
[Figure labels: screening library, hits, RL agent, chemical space]
4. Automated small molecule drug discovery
• Searching for novel hits is a critical task in drug discovery.
• RL methods can be used for automated hit search.
• Molecular docking simulation estimates protein-ligand binding affinity.
• We can guide RL agents with docking scores as reward.
[Figure labels: screening library, hits, chemical space, RL agent, docking simulation; the docking score is used as the reward]
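The last bullet above can be sketched as a small reward function. This is a minimal illustration, not the authors' reward: the function name `docking_reward`, the sign flip, and the clipping bound `clip_max` are all assumptions. It only relies on the convention, stated later in the slides, that a better docking score has a greater absolute value (i.e. is more negative).

```python
def docking_reward(docking_score: float, clip_max: float = 20.0) -> float:
    """Turn a docking score into a non-negative reward.

    Assumes docking scores are reported so that more negative means
    stronger predicted binding. The sign flip and the clipping range
    are illustrative choices, not values taken from the slides.
    """
    reward = -docking_score              # stronger binding -> larger reward
    return min(max(reward, 0.0), clip_max)


# A strongly binding pose earns more reward than a weak one.
strong = docking_reward(-9.5)   # 9.5
weak = docking_reward(-3.0)     # 3.0
```

In an RL loop, this value would be handed to the agent as the reward for the generated molecule.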
5. Drug molecules should satisfy strong structural constraints
• Drug molecules should not have toxic or highly reactive substructures.
• Widely used medicinal chemistry filters include several hundred diverse structural alerts.
[Figure: a few examples of structural alerts in the PAINS filter]
[Figure labels: chemical space, hits, high-quality hits]
6. Drug molecules should satisfy strong structural constraints
• Previous molecular generative models guided by implicit methods cannot entirely avoid inappropriate structures.
• e.g., multi-objective optimization where
total reward = docking score reward + structure penalty reward
➠ An explicit method that constrains the generation space to acceptable molecules is necessary:
“Fragment-based molecular generation”
[Figure labels: chemical space, hits, high-quality hits]
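A small numeric sketch of why the implicit, multi-objective scheme above cannot rule out flagged structures. The function name, the per-alert penalty weight, and the example numbers are hypothetical; only the form "total reward = docking score reward + structure penalty reward" comes from the slide.

```python
def implicit_total_reward(docking_reward: float, n_alerts: int,
                          penalty_per_alert: float = 1.0) -> float:
    """Total reward = docking score reward + structure penalty reward.

    The penalty term is modeled as a negative contribution per structural
    alert; the weight is an illustrative assumption.
    """
    structure_penalty = -penalty_per_alert * n_alerts
    return docking_reward + structure_penalty


# A clean molecule versus one with two structural alerts but better docking.
clean = implicit_total_reward(docking_reward=8.0, n_alerts=0)     # 8.0
flagged = implicit_total_reward(docking_reward=10.5, n_alerts=2)  # 8.5
```

Here the flagged molecule still receives the higher total reward, so an agent maximizing this sum can keep producing it. An explicit constraint instead removes such structures from the set of things the agent is able to build.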
7. Our fragment-based molecular generation method
[Figure: the current state molecule S_t with its possible attachment sites; Action 1, Action 2, and Action 3 select from the attachment sites and the pharmacochemically acceptable fragment library, yielding the augmented fragment and the next state molecule S_{t+1}]
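The three-action generation step in the figure can be sketched with a toy data model. Everything here is an assumption made for illustration: the `Fragment` and `Molecule` classes, the reading of Action 1 as "pick an attachment site on the current molecule", Action 2 as "pick a fragment from the library", and Action 3 as "pick the attachment site on that fragment", and the use of a random choice where the real model would use its policy network.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    sites: tuple        # attachment-site ids available on this fragment


@dataclass(frozen=True)
class Molecule:
    parts: tuple        # names of the fragments assembled so far
    open_sites: tuple   # (part_index, site_id) pairs still free


def step(mol: Molecule, library: list, rng: random.Random) -> Molecule:
    """One generation step: S_t -> S_{t+1} via three sequential choices."""
    if not mol.open_sites:
        raise ValueError("no attachment sites left on the current molecule")
    site = rng.choice(mol.open_sites)   # Action 1 (assumed): where to attach
    frag = rng.choice(library)          # Action 2 (assumed): which fragment
    frag_site = rng.choice(frag.sites)  # Action 3 (assumed): site on fragment
    new_index = len(mol.parts)
    remaining = tuple(s for s in mol.open_sites if s != site)
    added = tuple((new_index, s) for s in frag.sites if s != frag_site)
    return Molecule(mol.parts + (frag.name,), remaining + added)


# Only fragments in the (hypothetical) vetted library can ever be added.
library = [Fragment("amide", (0, 1)), Fragment("methyl", (0,))]
start = Molecule(parts=("benzene",), open_sites=((0, 0), (0, 1)))
nxt = step(start, library, random.Random(0))
```

Because every choice is drawn from the library or from recorded attachment sites, the constraint is built into the action space rather than added to the reward.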
8. Markovian embedding and policy networks make the model suitable for hit generation and scaffold-based generation
• The embeddings of the molecule and fragments are autoregressively passed on to the policy network.
• State embedding network and policy network are Markov models.
• Any arbitrary molecule can be the current state.
➠ The model can be used for scaffold-based generation as well as hit generation.
9. Connectivity-preserving generation prevents unrealistic bonds
• We preserve the connectivity information of fragmented molecules as explicit attachment sites.
• In this way, our model can avoid the generation of molecules with unrealistic bonds.
[Figure: pyrrole; new bond formation on N gives pyrrolium, new bond formation on C gives 2-methylpyrrole]
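A toy sketch of the idea in the bullets above: each fragment carries the atoms at which it was cut from its parent molecule, and new bonds are allowed only there. The table `ATTACHMENT_SITES`, the atom labels, and the assumption that the pyrrole fragment was cut at a ring carbon are all hypothetical.

```python
# Hypothetical record kept at fragmentation time: for each fragment,
# the atoms where a bond to the rest of the parent molecule was broken.
ATTACHMENT_SITES = {
    "pyrrole": {"C2"},   # assumed: this fragment was cut at a ring carbon
}


def can_attach(fragment: str, atom: str) -> bool:
    """Allow a new bond only at an atom recorded as an attachment site."""
    return atom in ATTACHMENT_SITES.get(fragment, set())


on_carbon = can_attach("pyrrole", "C2")    # True: leads to 2-methylpyrrole
on_nitrogen = can_attach("pyrrole", "N1")  # False: would give pyrrolium
```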
10. Explorative RL algorithm improves model performance in the strongly constrained generation space
[Figure labels: chemical space, acceptable chemical spaces, RL agent, good exploration]
• A strong constraint on the generation space makes the search space less smooth ➠ Solutions can be trapped in local optima.
• A practical drug discovery model should find as many diverse optimal regions of the chemical space as possible.
Our strategies
• Employ soft actor-critic (SAC), an off-policy actor-critic algorithm based on maximum entropy RL, which encourages exploration.
• Devise explorative algorithms based on the prioritized experience replay (PER) method.
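For reference, the maximum entropy objective that SAC optimizes is usually written as below. This is the standard form from the SAC literature, not a formula taken from these slides; the temperature $\alpha$ and the state-action distribution $\rho_\pi$ follow that standard notation.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

The entropy term $\mathcal{H}$ rewards the policy for staying stochastic, which is the sense in which maximum entropy RL encourages exploration.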
11. Our PER algorithm encourages the agent to visit novel states
[Schaul et al.] Original PER
• Goal: Sample-efficient exploration
• Priority is a measure of how much additional information we can learn from the transition.
• p_t = TD error of agent’s value estimate (Q or V function)
[Schaul et al.] Prioritized Experience Replay, ICLR 2016
12. Our PER algorithm encourages the agent to visit novel states
[Schaul et al.] Original PER
• Goal: Sample-efficient exploration
• Priority is a measure of how much additional information we can learn from the transition.
• p_t = TD error of agent’s value estimate (Q or V function)
Ours: PER(PE), PER(BU)
• Goal: Encourage sufficient diversity
• Priority is a measure of the novelty of the state
• PER(PE): p_t = predictive error of reward estimator
• PER(BU): p_t = Bayesian uncertainty of reward estimate
[Figure: a reward predictor outputs $\hat{r}_t$ for the observed reward $r_t$; priority $p_t = |\hat{r}_t - r_t|$ (predictive error)]
[Schaul et al.] Prioritized Experience Replay, ICLR 2016
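The PER(PE) rule above can be sketched as a replay buffer whose sampling weights are the reward predictor's errors. This is a minimal sketch under assumptions: the class name `NoveltyReplayBuffer`, the small constant `eps` that keeps every priority positive, and sampling directly in proportion to priority are illustrative choices, and the predicted reward is passed in as a number rather than produced by a trained network.

```python
import random


class NoveltyReplayBuffer:
    """Replay buffer that samples transitions in proportion to novelty."""

    def __init__(self, seed: int = 0):
        self.transitions = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, reward: float, predicted_reward: float,
            eps: float = 1e-3) -> None:
        # PER(PE)-style priority: predictive error of the reward estimator.
        # eps (an assumption) keeps well-predicted transitions sampleable.
        priority = abs(predicted_reward - reward) + eps
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, k: int) -> list:
        # Transitions with poorly predicted rewards are drawn more often.
        return self.rng.choices(self.transitions,
                                weights=self.priorities, k=k)


buf = NoveltyReplayBuffer(seed=0)
buf.add("familiar", reward=5.0, predicted_reward=5.0)  # predictor is right
buf.add("novel", reward=9.0, predicted_reward=4.0)     # predictor is off
batch = buf.sample(1000)
```

A PER(BU) variant would replace the priority line with an uncertainty estimate of the predicted reward; how that uncertainty is computed is not specified in these slides.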
13. FREED Model Overview and Contributions
• Our fragment-based generation method allows our model to leverage medicinal chemistry prior knowledge.
• Our proposed explorative RL method based on PER significantly improves the model performance.
FREED: Fragment-based generative RL with Explorative Experience replay for Drug design
14. Experimental results
Three protein targets
• fa7: protease
• parp1: polymerase
• 5ht1b: G protein-coupled receptor
Evaluation metrics
• Quality score (filter score): ratio of accepted, valid molecules to total generated molecules
• Hit ratio: ratio of unique hit molecules to total generated molecules, where a hit is defined as a molecule with a higher docking score than the median of known active molecules
• Top 5% score: average docking score of the top 5%-scored generated molecules
* Higher = greater absolute value
Baseline models
• MORLD: atom-wise generative model + MolDQN
• REINVENT: SMILES-based generative model + REINFORCE
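The three evaluation metrics above can be written down directly from their definitions. This is a sketch under assumptions: function names are hypothetical, docking scores are taken to be negative numbers where "higher" means a greater absolute value (so a hit has a score below the actives' median), and molecules are compared by an identifier string such as a canonical SMILES.

```python
def quality_score(n_accepted_valid: int, n_generated: int) -> float:
    """Ratio of accepted, valid molecules to total generated molecules."""
    return n_accepted_valid / n_generated


def hit_ratio(molecules: list, scores: list, active_median: float) -> float:
    """Ratio of unique hits to total generated molecules.

    A hit is a molecule whose docking score beats the median score of
    known actives; with negative scores that means score < median.
    """
    hits = {m for m, s in zip(molecules, scores) if s < active_median}
    return len(hits) / len(molecules)


def top5_score(scores: list) -> float:
    """Average docking score of the best-scoring 5% of molecules."""
    k = max(1, len(scores) * 5 // 100)
    best = sorted(scores)[:k]          # most negative scores first
    return sum(best) / k


# Twenty toy molecules with scores 0, -1, ..., -19.
mols = ["m%d" % i for i in range(20)]
scores = [-float(i) for i in range(20)]
ratio = hit_ratio(mols, scores, active_median=-15.0)   # 4 hits out of 20
best = top5_score(scores)                              # top 5% of 20 is 1
```

Counting hits through a set means a molecule generated several times contributes only once, matching the "unique hit molecules" wording.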
15. Results 1 – Explicit constraints are necessary to avoid problematic structures
• Our fragment-based generation model successfully avoids structural alerts.
• Baseline models trained to avoid alerts in a multi-objective optimization scheme show suboptimal results.
16. Results 2 – Our model outperforms the unconstrained models
• Our model outperforms, or at least performs comparably to, the unconstrained baseline models.
[Figure: hit ratio and top 5% score panels, each marked with the "better" direction]
17. Results 3 – Our PER method shows the best performance
[Figure: hit ratio and top 5% score panels, each marked with the "better" direction]
18. Results 3 – Our PER method shows the best performance
• All explorative algorithms outperform the vanilla SAC and PPO.
• Our PER(PE) and PER(BU) outperform previous algorithms such as PER(TD) and curiosity-driven algorithms.
[Figure: hit ratio and top 5% score panels, each marked with the "better" direction]
19. Results 4 – Hit and scaffold-based generation case study
Hit generation
• Initial structure: a benzene ring
Scaffold-based (lead) generation
• Initial structure: a scaffold extracted from known active molecules
21. Summary
• Our model FREED is a novel RL framework for real-world drug design that couples a fragment-based molecular generation strategy with a highly explorative RL algorithm.
• FREED can generate pharmacochemically acceptable molecules with high docking scores.
• When we want to avoid many structural alerts, explicitly constraining the molecular generation space is more effective than implicit methods.
• By defining priority as the novelty of the state, the PER method encourages the model to explore and find many optima in the highly constrained molecular generation space.