2. De novo drug design
Lead compound: “A biological representation of a molecule or chemical compound with sufficient potential to progress to a full drug development program.” [1]
[1] https://www.beckman.com/support/faq/research/what-is-a-lead-compound#:~:text=A%20lead%20compound%20is%20a,a%20full%20drug%20development%20program.
3. Introduction
This paper is about generating chemical structures of
candidate drug molecules.
Problem: Formulating a well-motivated hypothesis for candidate drug compound generation or selection based on the available data is challenging.
Proposed solution: ReLeaSE, a novel method for generating chemical compounds with desired physical, chemical, and/or bioactivity properties de novo, based on deep reinforcement learning (RL).
4. Background
RL is a subfield of AI used to solve dynamic decision problems.
Through RL, ML models can learn to make decisions and to explore complex environments without supervision.
Using RL to generate candidate drug molecules avoids brute-force enumeration of every possible solution in chemical space.
5. Properties of Drugs
Physical properties
● Partition coefficient (LogP) is a critical measure that not only determines how well a drug will be absorbed, transported, and distributed in the body, but also dictates how a drug should be formulated and dosed.
● Melting temperature (Tm) is the temperature at which a given solid drug changes from its solid state to a liquid, i.e., melts.
Bioactivity properties
● A Janus kinase inhibitor, also known as a JAK inhibitor or jakinib, is a type of immune-modulating medication that inhibits the activity of one or more of the Janus kinase family of enzymes (JAK1, JAK2, JAK3, TYK2).
Structural properties
● The chemical complexity of a drug is measured here by the number of benzene rings.
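As a minimal sketch (not from the slides or the paper, and assuming RDKit is installed), two of these properties can be computed directly from a SMILES string; the example molecule is arbitrary, and the melting temperature has no such direct calculation, which is why it is predicted with an ML model instead.

```python
# Minimal sketch (not from the paper): computing two of the discussed
# properties for a molecule given as a SMILES string, using RDKit.
from rdkit import Chem
from rdkit.Chem import Crippen

smiles = "c1ccccc1CC(=O)O"          # arbitrary example molecule
mol = Chem.MolFromSmiles(smiles)    # parse SMILES into an RDKit molecule

logp = Crippen.MolLogP(mol)         # Crippen estimate of the partition coefficient (LogP)

# Count benzene rings: aromatic six-membered rings made entirely of carbon.
ring_info = mol.GetRingInfo()
benzene_rings = sum(
    1
    for ring in ring_info.AtomRings()
    if len(ring) == 6
    and all(mol.GetAtomWithIdx(i).GetIsAromatic() for i in ring)
    and all(mol.GetAtomWithIdx(i).GetSymbol() == "C" for i in ring)
)

print(f"LogP = {logp:.2f}, benzene rings = {benzene_rings}")
```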
16. QSPR analysis
Quantitative structure–property relationship (QSPR) analysis finds correlations between structural descriptions and material properties through ML models.
[Diagram] Structural description (SMILES string sT) → predictive model P → material property P(sT) (e.g. LogP, Tm).
For predicting LogP using 5-fold cross-validation (5CV), the model accuracy is 0.91 and the root mean square error (RMSE) is 0.53.
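A minimal sketch, in PyTorch, of what such a SMILES-based property regressor could look like; the SmilesPropertyRegressor class, layer sizes, and toy vocabulary are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (illustrative, not the authors' exact architecture) of a
# QSPR model that regresses a property such as LogP directly from a SMILES string.
import torch
import torch.nn as nn

class SmilesPropertyRegressor(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)    # one vector per SMILES character
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)                # single real-valued property

    def forward(self, tokens):                              # tokens: (batch, seq_len) int ids
        emb = self.embed(tokens)
        _, (h_n, _) = self.lstm(emb)                        # final hidden state summarizes the string
        return self.head(h_n[-1]).squeeze(-1)               # predicted property, e.g. LogP

# Hypothetical usage with a toy character vocabulary:
vocab = {ch: i for i, ch in enumerate("#()+-=123456789CFHNOSclnos[]")}
model = SmilesPropertyRegressor(vocab_size=len(vocab))
ids = torch.tensor([[vocab[c] for c in "c1ccccc1O"]])      # encode one SMILES string
print(model(ids))                                          # untrained prediction (random)
```

Such a model would be trained with a standard regression loss (e.g. mean squared error) against measured property values.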
25. Related work
Olivecrona et al. (2017)
● No data were provided to show that the predicted properties of the generated molecular compounds are optimized by the RL model.
● A large fraction of the generated molecules were similar to those in the training and test sets.
Jaques et al. (2017)
● The RL model did not directly optimize any physical or biological property, but rather a proxy function combining a synthetic accessibility score (SAS), drug-likeness, and a ring penalty.
Segler et al. (2018)
● Did not use RL, but a vanilla RNN.
27. Limitations
ReLeaSE generate candidate drug molecules while
independently optimizing for each chemical property
of interest.
● E.g. Having one model to generate molecules
optimized for LogP and having another model for
to generate molecules optimized for melting
temperature
27
27
29. Take home message
● ReLeaSE is a new strategy for designing libraries of compounds with desired properties that uses both DL and RL approaches.
● ReLeaSE efficiently generates candidate drug molecules while independently optimizing for each chemical property of interest.
What’s next?
● Extending the approach to afford multiobjective optimization of several target properties concurrently, with respect to potency, selectivity, solubility, and other drug-likeness properties.
Thanks!
31. Internal diversity of generated libraries
Generating a chemically diverse stream of molecules is important, because drug candidates can fail in many unexpected ways later in the drug discovery pipeline.[1]
On the basis of the analysis of respective sets of 10,000 molecules generated by each method, the library obtained without stack memory showed a decrease in internal diversity of 0.2 units of the Tanimoto coefficient and a fourfold increase in the number of duplicates, from about 1% to 5%.[2]
[1] Benhenda M. Can AI reproduce observed chemical diversity? bioRxiv. 2018:292177.
[2] Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Science Advances. 2018;4(7):eaap7885.
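A minimal sketch of how the internal diversity of a generated library can be estimated, assuming RDKit is available and taking diversity as one minus the mean pairwise Tanimoto similarity of Morgan fingerprints (the cited papers may define it slightly differently); the toy library is arbitrary.

```python
# Minimal sketch (assumptions: RDKit available; diversity defined as
# 1 - mean pairwise Tanimoto similarity of Morgan fingerprints).
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def internal_diversity(smiles_list):
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048)
           for m in mols if m is not None]
    sims = [DataStructs.TanimotoSimilarity(a, b) for a, b in combinations(fps, 2)]
    return 1.0 - sum(sims) / len(sims)

# Arbitrary toy library:
library = ["c1ccccc1", "c1ccccc1O", "CCO", "CCN", "CC(=O)O"]
print(f"internal diversity = {internal_diversity(library):.3f}")
```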
32. Visualization of new drug libraries
(a) LogP values predicted by the predictive model P.
(b) Melting temperatures predicted by the predictive model P.
Editor's Notes
What is the problem tackled?
The crucial step in many new drug discovery projects is the formulation of a well-motivated hypothesis for new lead compound generation or compound selection from available or synthetically feasible chemical libraries based on the available structure-activity relationship (SAR) data.
To proceed with the rest of the presentation, we need to know what is the SMILES representation of a drug molecule.
Suppose this is a drug or any kind of molecule. You can see that it has a benzene ring, and two double bonds, etc.
[NEXT] The SMILES representation of this molecule is the sequence of atoms in the molecule as we traverse through it. …
So the aim of this research is to generate SMILES representations of candidate drug molecules.
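For illustration (aspirin is an arbitrary example, not one of the paper's molecules), a short RDKit snippet showing that a SMILES string is a linear encoding of the molecular graph that can be parsed and written back out:

```python
# Minimal sketch: a SMILES string is a linear traversal of a molecular graph.
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"        # aspirin: ester, aromatic ring, carboxylic acid
mol = Chem.MolFromSmiles(smiles)        # parse string -> molecular graph
print(mol.GetNumAtoms())                # heavy atoms in the traversed graph
print(Chem.MolToSmiles(mol))            # canonical SMILES written back from the graph
```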
Generative RNN models are often used for tasks such as text generation, poem generation, painting generation, human face generation, etc.
[NEXT] DESCRIBE training step.
…What we need to maximize here is how well the stack-RNN predicts the next character in the SMILES string.
The authors minimized a cross-entropy loss function for that.
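A minimal sketch of that next-character cross-entropy objective; a plain GRU stands in for the authors' stack-augmented RNN, and the toy vocabulary, layer sizes, and example string are illustrative assumptions.

```python
# Minimal sketch of next-character training on SMILES (illustrative; the paper
# uses a stack-augmented RNN, replaced here by a plain GRU for brevity).
import torch
import torch.nn as nn

chars = "^$#()+-=123456789CFHNOSclnos[]"        # toy vocabulary; ^ = start, $ = end
stoi = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 32)
rnn = nn.GRU(32, 64, batch_first=True)
head = nn.Linear(64, len(chars))
loss_fn = nn.CrossEntropyLoss()

smiles = "^c1ccccc1O$"                          # one training string with start/end tokens
ids = torch.tensor([[stoi[c] for c in smiles]])
inputs, targets = ids[:, :-1], ids[:, 1:]       # predict each next character

hidden_states, _ = rnn(embed(inputs))
logits = head(hidden_states)                    # (1, seq_len - 1, vocab)
loss = loss_fn(logits.reshape(-1, len(chars)), targets.reshape(-1))
loss.backward()                                 # gradients for cross-entropy minimization
```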
A question somebody might ask here is: why a stack-RNN?
[NEXT molecule image appears] Recall how we represent a molecule as a SMILES string. One must count ring openings and closures, as well as parenthesis sequences, to make sure the SMILES string is a valid one. Regular RNNs such as LSTMs and gated recurrent units (GRUs) are unable to keep track of such information in their memory cells. That is why we need stack memory in the RNN. So whenever the model comes across an opening parenthesis in the input, it pushes a parenthesis onto the stack, and whenever it comes across a closing parenthesis, it pops the last opening parenthesis off the stack.
[BOARD]
By the time we predict the <end> token, if there are still unmatched parentheses remaining in the stack, that SMILES string is not a valid one.
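A minimal sketch of the bookkeeping the stack is meant to capture, as a simplified validity check that requires parentheses to balance and ring-closure digits to appear in pairs (real SMILES syntax has more cases than this):

```python
# Minimal sketch of the bookkeeping a stack can capture for SMILES validity:
# parentheses must balance and ring-closure digits must appear in pairs.
def plausibly_balanced(smiles: str) -> bool:
    stack = []                     # push '(' on open, pop on ')'
    open_rings = set()             # ring-closure digits seen an odd number of times
    for ch in smiles:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False       # closing parenthesis with nothing to match
            stack.pop()
        elif ch.isdigit():
            # a ring-closure digit opens a ring the first time and closes it the second
            open_rings ^= {ch}
    return not stack and not open_rings    # everything matched by <end>

print(plausibly_balanced("CC(C)c1ccccc1"))   # True
print(plausibly_balanced("CC(C)c1ccccc("))   # False: unclosed '(' and ring 1
```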
This way, the authors trained the model on ~1.5 million molecular structures taken from an existing dataset.
[NEXT]
To make sure that the model does not memorize the training molecules, the authors generated 1M molecules using the model and compared them with the molecules used to train it. They found that the model reproduced less than 0.1% of the structures from the training data set.
In the drug discovery process, generating a chemically diverse stream of candidate drug molecules is important, because candidates can fail in many unexpected ways later in the pipeline.
The authors claimed that the library of drug molecules obtained with stack memory showed an increase in internal chemical diversity compared to the library obtained without stack memory.
This predictive model P is a feed-forward neural network. The authors used existing datasets to train it, and they trained different models to predict different properties.
Training these types of predictive models is also called QSPR analysis. Such analysis finds correlations between structural descriptions and material properties through ML models.
[NEXT] In our model, [NEXT] the SMILES string is the structural description of the molecule, and
[NEXT] the predicted property is some material property like the melting temperature or LogP.
Building ML models directly from SMILES strings is a unique feature of our approach,
because (1) it completely bypasses the very slow step of descriptor generation in traditional QSAR modeling approaches.
(2) Moreover, this SMILES-string-based model outperformed the existing QSPR models.
[NEXT] For example, for predicting LogP using 5-fold cross-validation, the model accuracy is 0.91 and the root mean square error (RMSE) is 0.53.
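A minimal sketch of how such 5-fold cross-validation figures can be computed, reading "accuracy" as R²; a generic scikit-learn regressor on random placeholder data stands in for the paper's SMILES-based model.

```python
# Minimal sketch of computing 5-fold cross-validated R^2 and RMSE
# (a generic scikit-learn regressor stands in for the paper's SMILES-based model).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

X = np.random.rand(500, 32)            # placeholder descriptors for 500 molecules
y = np.random.rand(500) * 6 - 1        # placeholder LogP values

r2s, rmses = [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    r2s.append(r2_score(y[test_idx], pred))
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)

print(f"5CV R^2 = {np.mean(r2s):.2f}, RMSE = {np.mean(rmses):.2f}")
```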
Two deep neural networks: generative (G) and predictive (P).
Both models are first trained separately with supervised learning algorithms.
Then, the two models are trained jointly with an RL approach that optimizes target properties.
[NEXT] The generative model is used to generate SMILES strings of novel, chemically feasible molecules; that is, it plays the role of an agent.
[NEXT] Suppose sT is a SMILES string generated by G.
The predictive model takes in the generated SMILES string as the input and provides one real number as an output:
[NEXT] P(sT), which is an estimated property value. The property could be LogP, the melting temperature, etc.
[NEXT] The reward is a function f of P(sT). [NEXT] This function f is chosen depending on the task/property.
Considering this reward, the generative model is trained to maximize the expected reward.
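A minimal runnable REINFORCE-style sketch of this loop; the tiny character-level generator, the toy alphabet, and the reward (which simply counts aromatic-carbon characters) are placeholders, not the authors' models.

```python
# Minimal runnable REINFORCE sketch (toy stand-ins for G, P, and the reward;
# none of the components below are the authors' actual models).
import torch
import torch.nn as nn

chars = "^$Cc1(O)=N"                              # toy SMILES alphabet, ^ start / $ end
stoi = {c: i for i, c in enumerate(chars)}

embed = nn.Embedding(len(chars), 16)
rnn = nn.GRU(16, 32, batch_first=True)
head = nn.Linear(32, len(chars))
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

def sample(max_len=20):
    """Sample one string from the generator G, keeping per-step log-probabilities."""
    token = torch.tensor([[stoi["^"]]])
    hidden, log_probs, out = None, [], []
    for _ in range(max_len):
        h, hidden = rnn(embed(token), hidden)
        dist = torch.distributions.Categorical(logits=head(h[:, -1]))
        token = dist.sample().unsqueeze(0)
        log_probs.append(dist.log_prob(token.squeeze(0)))
        ch = chars[token.item()]
        if ch == "$":
            break
        out.append(ch)
    return "".join(out), torch.stack(log_probs).sum()

def reward_fn(s):
    """Toy stand-in for f(P(sT)): reward strings containing aromatic carbons."""
    return float(s.count("c"))

for _ in range(5):                                 # a few REINFORCE updates
    s, logp = sample()
    loss = -logp * reward_fn(s)                    # maximize the expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```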
Talking a bit more about the reward: it must be based on the property the RL model is trying to optimize.
Suppose it wants to maximize the number of benzene rings in the generated molecule; the reward is then computed based on that count.
If the generated molecule has a small number of benzene rings (like this), the reward would be small.
On the other hand, when the generated molecule has a large number of benzene rings (like this), the reward would be high.
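Two illustrative reward functions f (the constants and exact shapes used in the paper may differ): one that grows with the benzene-ring count, and one that favours a target LogP window.

```python
# Minimal sketch of task-dependent rewards f(P(sT)) (illustrative choices only).
def reward_maximize_rings(n_rings: int) -> float:
    # more benzene rings -> larger reward; +1 keeps the reward positive for zero rings
    return float(n_rings + 1)

def reward_target_logp(logp: float, low: float = 1.0, high: float = 4.0) -> float:
    # range-targeting reward: high inside the desired LogP window, low outside
    return 10.0 if low <= logp <= high else 1.0

print(reward_maximize_rings(0), reward_maximize_rings(3))   # 1.0 4.0
print(reward_target_logp(2.5), reward_target_logp(6.0))     # 10.0 1.0
```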
First we will talk about the predicted properties (specifically the melting temperature and LogP) of training molecules vs. generated molecules.
Here is a comparison of training vs. generated molecules; the baseline is the set of training molecules.
If you look at the first property, the melting temperature (Tm) [NEXT], the baseline has 95% valid molecules.
The molecules generated with minimized melting temperatures are only 31% valid, which is much lower than the baseline.
We can observe something similar for the molecules generated with maximized melting temperatures as well.
However, the mean melting temperatures [NEXT] show that the molecules generated with minimized and maximized melting temperatures shift in the desired direction relative to the baseline.
If you go down the table, other properties such as [NEXT] LogP and [NEXT] the number of benzene rings tend to have percentages of valid molecules comparable to the baseline.
[NEXT]
Compared to these existing models, what is unique about the approach I have been presenting is that it produces valid SMILES strings, and the generated SMILES strings match less than 0.1% of the training molecules.
To understand how the generative models populate chemical space with newly generated drug molecule structures, the authors used t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the predicted value distribution from the model P. The colour code shows the magnitude of the predicted property values.
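A minimal sketch of such a t-SNE projection with scikit-learn and matplotlib; the random feature vectors and property values are placeholders for the model-derived quantities used in the paper.

```python
# Minimal t-SNE sketch (placeholder feature vectors stand in for the
# model-derived representations used in the paper; colour = predicted property).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.rand(1000, 64)          # placeholder vectors, one per generated molecule
predicted_logp = np.random.rand(1000) * 6    # placeholder predicted property values

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(coords[:, 0], coords[:, 1], c=predicted_logp, s=5, cmap="viridis")
plt.colorbar(label="predicted LogP")
plt.title("Generated library projected with t-SNE")
plt.show()
```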
…
The labeled ones are randomly picked generated drug molecules matched with drug molecules in existing datasets.