Retrosynthesis tutorial v2

Introduction to
Retrosynthesis Prediction
2020. 06
Wonjun Jeong
wonjun.jg@kaist.ac.kr
wonjun.email@gmail.com

Table of Contents
• Introduction
• Retrosynthesis prediction
• Dataset description
• Overview of general approaches: Template-based, Template-free, Selection-based
• Proposed methods
• Classical computer-aided methods
• Machine learning based methods
• Challenges
• Practice
• RDKit
• OpenNMT
• Related works
• Future directions
• Reference
• Appendix

Retrosynthesis prediction
• What is retrosynthesis prediction?
• Retrosynthesis or retrosynthetic pathway planning is the process of tracing back the
forward reaction, predicting which reactants are required to synthesize the target product.

• Retrosynthesis is a crucial step in discovering new materials and drugs.
[Figure: Desired properties → Candidate product → Candidate reactants → Test by chemist]

• Each step in discovering new materials and drugs introduces its own errors, so the result
must be verified by a chemist.
• Expensive

• Retrosynthesis prediction has depended heavily on the trial-and-error cycles of
experienced researchers with chemical expertise.

• If retrosynthesis prediction can be done with high accuracy …
• It could unlock a fully automated material/drug discovery
pipeline.
[Figure: Desired properties → Candidate product → Candidate reactants → Test by robot]

Dataset description
• SMILES (Simplified Molecular-Input Line-Entry System) [1]
• SMILES is a specification in the form of a line notation for describing the structure of
chemical species [2].
• Generation of SMILES.
• By printing symbol nodes encountered in a depth-first tree traversal of a chemical graph
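As a rough illustration of the depth-first idea, the sketch below writes a SMILES-like string for a small acyclic molecule with single bonds only. The function name `tree_to_smiles` and the atom/bond list format are assumptions made for this example; rings, bond orders, charges and canonical ordering are deliberately left out.

```python
def tree_to_smiles(atoms, bonds, root=0):
    """Write a SMILES-like string for an acyclic, single-bonded graph via DFS.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Toy sketch only: no rings, no bond orders, no canonicalization.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def visit(node, parent):
        children = [n for n in neighbors[node] if n != parent]
        out = atoms[node]
        # Every child except the last is written as a parenthesised branch.
        for child in children[:-1]:
            out += "(" + visit(child, node) + ")"
        # The last child continues the main chain.
        if children:
            out += visit(children[-1], node)
        return out

    return visit(root, None)


ethanol = tree_to_smiles(["C", "C", "O"], [(0, 1), (1, 2)])
isobutane = tree_to_smiles(["C", "C", "C", "C"], [(0, 1), (1, 2), (1, 3)])
print(ethanol, isobutane)
```

Hydrogens never appear in the output because they are not listed as nodes, which mirrors the implicit-hydrogen convention.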
[1] Weininger et al. [2] https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system

Dataset description
• SMILES in detail
• The carbon (C) label is omitted in the graph drawing.
• Hydrogen (H) atoms are implicit in SMILES.
• Ring structures are written by breaking each ring at an arbitrary point to make an acyclic
structure and adding numerical ring closure labels to show connectivity between non-adjacent
atoms.
• Branches are described with parentheses.
• A bond is represented using one of the symbols: ., -, =, #, $, :, /, \
• “.” indicates that two parts are not bonded together.
[1] Weininger et al. [2] https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system

Dataset description
• Benchmark:
1. USPTO (United States Patent and Trademark Office)
• The USPTO benchmark contains SMILES representations of a single target product (input) and
its reactants (target).
• Variants
• USPTO-50k
• USPTO-500K
• USPTO-MIT
2. Pistachio [32]
3. Reaxys [25]
[25] reaxys.com [32] Mayfield et al.

Overview of general approaches: Template-based
• Template-based approaches [2, 3, 4, 5, 14, 15, 16, 17] use known chemical
reactions encoded as reaction templates.
• A reaction template contains subgraph patterns that describe how the reaction
occurs between reactants and product.
• Pros
• High interpretability
• Cons
• Low generalizability to unseen templates
• Requires domain knowledge to extract the reaction templates

Overview of general approaches: Template-free
• Template-free approaches [6, 7, 8, 9, 10, 12] learn a mapping from a product to a set of
reactants by extracting features directly from data.
• Seq2Seq framework
• [6, 7, 8, 12]
• Graph2Graph framework
• [9, 10]
• Pros
• Generalizability
• No domain knowledge required
• Cons
• Invalid/Inaccessible predictions
• Low interpretability

Overview of general approaches: Selection-based
• Selection-based approaches [11] select a candidate set of purchasable reactants.
• The objective of [11] is to discover retrosynthetic routes from a given desired product to
commercially available reactants.
• Pros
• Accessibility of the predictions
• No domain knowledge required
• Cons
• Limited novelty
[11] Guo et al.
Rank := f(product; purchasable pool)

Classical computer-aided methods
• Before deep learning, computer-aided retrosynthesis was mainly conducted using
reaction templates [2, 3, 4, 15, 16, 17].
• These methods focus on how to use known reactions and extract meaningful reaction
context.
• Characteristics
• Requires chemical expertise
• Heuristics
• Computationally expensive
• Chemical space is vast
• Subgraph isomorphism problem*1
• Not scalable
• Not generalizable
*1: Appendix-1

Classical computer-aided methods
• The first computer-aided retrosynthesis:
• [18] Corey et al., “Computer-assisted analysis in organic synthesis.”, Science, 1985
• The author won the Nobel Prize in Chemistry for his contributions to retrosynthetic analysis.
• [19] The Logic of Chemical Synthesis: Multistep Synthesis of Complex Carbogenic
Molecules (Nobel lecture), 1991
[18, 19] Corey et al.

Classical computer-aided methods:
Recent work [3] 2017
[3] Coley et al.

• Key Idea
• It uses product similarity and reactant similarity to rank templates from precedent reactions.
[3] Coley et al.
Recent work [3] 2017 – Key Idea

• How to measure molecular similarity*2?
• Molecular fingerprints are a way of encoding the structure of a molecule. They can be computed
with the RDKit library.
• The most common measure is Tanimoto similarity, but there is no canonical definition of molecular
similarity (subgraph isomorphism problem*1).
• T(x, y) = |x ∩ y| / |x ∪ y|, where x and y are molecular fingerprints (viewed as sets of on-bits)
*1: Appendix-1, *2: Appendix-2
Img from [20]
Recent work [3] 2017 – Method (Similarity)

• Example of using similarity in [3]
• Total similarity := Product Sim * Reactants (Precursor) sim
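The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the implementation of [3]: fingerprints are plain Python sets of on-bit indices, and the names `tanimoto`, `rank_candidates` and the dictionary keys are hypothetical. Each candidate pairs a precedent reaction with the reactants proposed for the target.

```python
def tanimoto(x, y):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def rank_candidates(target_fp, candidates):
    """Rank candidates by product similarity times reactant similarity."""
    scored = []
    for c in candidates:
        # How similar is the target to the product of the precedent reaction?
        product_sim = tanimoto(target_fp, c["precedent_product"])
        # How similar are the proposed reactants to the precedent's reactants?
        reactant_sim = tanimoto(c["proposed_reactants"], c["precedent_reactants"])
        scored.append((product_sim * reactant_sim, c["name"]))
    return sorted(scored, reverse=True)


target = {1, 2, 3, 4}
candidates = [
    {"name": "A", "precedent_product": {1, 2, 3, 5},
     "proposed_reactants": {7, 8, 9}, "precedent_reactants": {7, 8, 9}},
    {"name": "B", "precedent_product": {1, 2, 3, 4},
     "proposed_reactants": {7, 8}, "precedent_reactants": {8, 9, 10, 11}},
]
ranked = rank_candidates(target, candidates)
print(ranked)
```

Candidate B matches the target product exactly but is ranked below A, because its proposed reactants look little like the precedent's reactants; the product of the two similarities penalizes a mismatch on either side.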
[3] Coley et al.
Recent work [3] 2017 – Method (Using similarity)

• Result of [3]
• [3] performs better than the seq2seq baseline. Note, however, that the seq2seq model in the table is
template-free whereas [3] is template-based.
• Contribution
• It mimics retrosynthetic strategy by using molecular similarity, without needing to encode
any chemical knowledge.
• Limitation
• It inherently disfavors creative retrosynthetic strategies because it relies on
precedent reactions.
*3: Appendix-3
Recent work [3] 2017 - Results

Machine learning based methods
• Data-driven methods using machine learning and deep learning have been actively studied
since the mid-2010s.
• The need for expertise has been reduced.
• They are more scalable and generalizable.
• Representative proposed methods
• Template-based
• NeuralSim [14], Graph Logic Network (GLN) [5]
• Template-free
• Seq2Seq [21], Molecular Transformer (MT) [6, 7], Latent variable Transformer (LV-MT)
[8], Self-Corrected Transformer (SCROP) [22], Graph2Graph (G2G) [9], GraphRetro [10]
• Selection-based
• Bayesian-Retro [11]

Template-based: NeuralSim [14] 2017
[14] Segler et al.

• Template-based: NeuralSim [14] (2017)
• Key Idea
• Given a target product, it uses a neural network to predict the most suitable rule from the
reaction templates.
Template-based: NeuralSim [14] 2017 – Key Idea

• Template-based: NeuralSim [14]
• It uses simple models such as an MLP and a Highway network [23].
• It frames rule selection as multiclass classification.
• The molecular descriptor [24] is defined as a sum of molecular fingerprints:
[14] Segler et al. [23] Srivastava et al. [24] PDF file
Template-based: NeuralSim [14] 2017 - Method

• Template-based: NeuralSim [14]
• Experiments
• Dataset: Reaxys database [25]
• Number of classes: 8720
• Contribution
• It shows that neural networks can learn which molecular contexts particular rules can be applied to.
• Limitation
• Performance is affected by the cardinality of the rule set.
• The larger the rule set, the lower the performance.
Template-based: NeuralSim [14] 2017 - Results

• Template-based: Graph Logic Network (GLN) [5] (NeurIPS 2019)
[5] Dai et al.
Template-based: Graph Logic Network [5] 2019

• Key Idea
• It models the joint distribution of reaction templates and reactants using logic variables.
• It learns when rules from reaction templates should be applied.
[5] Dai et al.
Template-based: Graph Logic Network [5] 2019 – Key Idea

• Retrosynthesis Template
• Applying a retrosynthesis template can be decomposed into 2-step logic.
• Match template
• Match reactants
[5] Dai et al.
Template-based: Graph Logic Network [5] 2019 - Background

• Match template
• Match reactants
• Uncertainty
• Template score function
• Reactants score function
[5] Dai et al.
Template-based: Graph Logic Network [5] 2019 - Method

• Final joint probability
[5] Dai et al. *4: Appendix-4
Parameterizing by GNN (Graph Neural Network)*4

• MLE with Efficient Inference
• Gradient approximation
[5] Dai et al.

• Top-k results
• Contribution
• Interpretability: integration of probabilistic models and templates (chemical rules)
• Limitation
• It shares the limitations of template-based methods
• Scalability
[5] Dai et al.
Template-based: Graph Logic Network [5] 2019 - Results

[21] Liu et al.
Template-free: Seq2Seq [21] 2017

• Template-free: Seq2Seq [21] (2017)
• It tokenizes SMILES and treats retrosynthesis as machine translation.
• It uses bidirectional LSTMs for the encoder and decoder.
• It uses beam search to produce a set of reactants.
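To make the tokenization step concrete, here is a minimal sketch of a regex-based SMILES tokenizer. The pattern is a simplified assumption for illustration and is not the exact tokenizer used in [21]: it treats bracket atoms, the two-letter halogens Br and Cl, and two-digit ring closures as single tokens, and falls back to one character per token otherwise.

```python
import re

# Simplified pattern (assumption): bracket atoms, Br/Cl, %NN ring closures,
# then any remaining single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens for a seq2seq model."""
    return TOKEN_PATTERN.findall(smiles)


example = "c1ccc(Cl)cc1[N+](=O)[O-]"
tokens = tokenize(example)
print(" ".join(tokens))
```

Joining the tokens back together reproduces the input string, so the translation model's output can be detokenized by simple concatenation.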
[21] Liu et al.
Template-free: Seq2Seq [21] 2017 - Method

• Results
• It performs comparably to the rule-based expert system baseline.
• Contribution
• It shows that a fully data-driven seq2seq model can learn retrosynthetic pathways.
• Limitations
• It produces grammatically invalid SMILES and chemically implausible predictions.
• It is a naïve application of a seq2seq model.
• Predictions generated by a vanilla seq2seq model with beam search typically exhibit
low diversity, with only minor differences in the suffix. [8]
[21] Liu et al., [8] Chen et al.
Template-free: Seq2Seq [21] 2017 – Results

• Grammatically invalid SMILES
• Grammatically valid but chemically implausible
[21] Liu et al.
Template-free: Seq2Seq [21] 2017 – Results

[6] Schwaller et al., [7] Lee et al.
Template-free: Molecular Transformer [6, 7] 2019

• Key Idea
• Like [21], it tokenizes SMILES and treats retrosynthesis as machine translation.
• It uses a Transformer instead of an LSTM.
• It performs better than seq2seq [21] but has the same limitations.
Template-free: Molecular Transformer [6, 7] 2019 – Key Idea
[6] Schwaller et al., [7] Lee et al. [21] Liu et al.

• Template-free: Latent variable Transformer (LV-MT) [8] (arXiv 2019)
[8] Chen et al.
Template-free: LV-MT [8] 2019

• It extends the Molecular Transformer (MT) to generalize better to rare
reactions and to produce diverse pathways.
• Key Idea
• It proposes novel pretraining methods.
• Random bond cut
• Template-based bond cut
• It trains a mixture model with the online hard-EM algorithm.
[8] Chen et al.
Template-free: LV-MT [8] 2019 – Key Idea

• Pretrain methods
• Random bond cut
• For each input target product, it generates new examples by selecting a random
bond to break.
• Template-based bond cut
• Instead of randomly breaking bonds, it uses the templates to break bonds.
• The model is pre-trained on these auxiliary examples, and then used as initialization
to be fine-tuned on the actual retrosynthesis data.
Template-free: LV-MT [8] 2019 – Method (Pretrain)
[8] Chen et al

• Why are latent variables introduced?
• It tackles the problem of generating diverse predictions.
• The outputs of beam search tend to be similar to each other.
• Given a target SMILES string x and reactants SMILES string y, a mixture model
introduces a multinomial latent variable z ∈ { 1, · · · , K } to capture different reaction
types, and decomposes the marginal likelihood as:
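For reference, the standard mixture decomposition that matches this description is written out below. The formula on the original slide was not recoverable, so this is the generic form under the symbols defined above; whether the prior over z is learned or fixed (for example uniform, 1/K) is left open here.

```latex
p(y \mid x; \theta) \;=\; \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta)
```

Each value of z selects one mixture component, and decoding with different z values is what yields the diverse set of predictions.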
Template-free: LV-MT [8] 2019 – Method (Latent Var.)
[8] Chen et al

• Hard-EM algorithm
1. Take a mini-batch of training examples.
2. Enumerate all K values of z and compute the loss for each.
• Dropout should be turned off [26].
3. For each example, select the value of z that yields the minimum loss:
• For p(y | z, x; θ), the encoder-decoder network is shared among mixture components, and
the embedding of z is fed as an input to the decoder so that y is conditioned on it.
4. Back-propagate through the selected component, so only one component receives gradients per example.
• Dropout should be turned back on [26].
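The four steps can be sketched on a toy problem. This is an illustration under strong assumptions, not the training code of [8]: each of the K components is a single scalar predictor with squared-error loss standing in for the shared Transformer, and the name `hard_em_step` is hypothetical. The toy has no dropout; comments mark where it would be switched off and on.

```python
def hard_em_step(params, batch, lr=0.1):
    """One online hard-EM update on a toy mixture of K scalar predictors.

    params[z] is the prediction of component z; the loss is squared error.
    """
    new_params = list(params)
    assignments = []
    for y in batch:
        # Hard E-step: evaluate every z (dropout would be off here)
        # and keep the component with the minimum loss.
        losses = [(y - mu) ** 2 for mu in params]
        z_best = min(range(len(params)), key=losses.__getitem__)
        assignments.append(z_best)
        # M-step: gradient step on the selected component only
        # (dropout would be back on here).
        grad = -2.0 * (y - params[z_best])
        new_params[z_best] -= lr * grad
    return new_params, assignments


params, assignments = hard_em_step([0.0, 10.0], [1.0, 9.0, 2.0])
print(params, assignments)
```

Because only the winning component is updated for each example, the components drift toward different regions of the data, which is the mechanism that encourages diverse outputs.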
[8] Chen et al., [26] Shen et al.
Template-free: LV-MT [8] 2019 – Method (Latent Var.)

• Results*5
*5: We report better hyper-parameters and the results in Appendix-5
Template-free: LV-MT [8] 2019 – Results

• Contributions
• It proposes novel pretraining methods for retrosynthesis.
• It uses a mixture-model Transformer for diverse predictions.
• Limitations
• The more latent classes are used, the worse the top-1 performance.
• The latent variable does not appear to contain information about the reaction class.
Template-free: LV-MT [8] 2019 – Results
[8] Chen et al

• Template-free: Self-Corrected Transformer (SCROP) [22] (2020)
[22] Zheng et al.
Template-free: SCROP [22] 2020

• Template-free: Self-Corrected Transformer (SCROP) [22] (2020)
• Key Idea
• It uses a Transformer to correct invalid predicted SMILES.
• It builds syntax-correction data from a trained Transformer by collecting pairs of invalid
predictions and their ground truths.
• It trains another Transformer as a syntax corrector on this data.
• At test time, it keeps the top-1 candidate produced by the syntax corrector and
uses it to replace the original prediction.
[22] Zheng et al.
Template-free: SCROP [22] 2020 – Key Idea

• Results
• Compared to the plain Transformer (SCROP-noSC), performance improves by 0.4~1.7%.
Template-free: SCROP [22] 2020 – Results
[22] Zheng et al.

• Invalid SMILES rates
• Limitations
• Why SCROP? Invalid SMILES can be removed with RDKit, without a learned model.
[22] Zheng et al.
Template-free: SCROP [22] 2020 – Results

• Template-free: Graph2Graph (G2G) [9] (ICML 2020)
[9] Shi et al.
Template-free: G2G [9] 2020

• Key Idea
• It decomposes retrosynthesis into a 2-step procedure:
• Breaking the target product
• Transforming the broken target product
• It trains a Reaction Center Identification (RCI) module that produces synthon(s) by breaking bonds in the
product graph.
• It trains a Variational Graph Translation module that produces reactants via a series of graph
transformations.
Template-free: G2G [9] 2020 – Key Idea
[9] Shi et al.

• Reaction Center Identification (RCI)
• It uses an R-GCN [27] for learning graph representations.
• Overview
1. Given a chemical reaction, it derives a binary label matrix.
2. It computes node embeddings and a graph embedding.
3. To estimate the reactivity score of atom pair (i, j), the edge embedding is formed by
concatenating several features.
4. The final reactivity score of the atom pair (i, j) is calculated as:
5. The RCI is optimized by minimizing the cross entropy against the binary labels.
Template-free: G2G [9] 2020 – Method (RCI)
[9] Shi et al. [27] Schlichtkrull et al.

• Reactants generation via Variational Graph Translation (VGT).
1. It receives synthons from the RCI module and transforms them into reactants.
2. It generates a sequence of graph transformation actions and applies them to
the initial synthon graph.
• It models graph generation as a Markov Decision Process (MDP).
Template-free: G2G [9] 2020 – Method (VGT)
[9] Shi et al.

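• Overview
1. Let the transformation trajectory be the sequence of actions; the graph transformation is
deterministic once the transformation trajectory is defined.
2. Let the intermediate graph denote the result of applying the sequence of actions so far to the
initial synthon graph.
3. Under the MDP assumption, each action depends only on the current intermediate graph.
4. Finally, the graph transformation can be factorized over the individual actions.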
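A generic way to write this factorization is given below. The symbols are assumed notation, since the formulas on the original slide were lost in extraction: π is the trajectory, a_t the action at step t, G_0 the initial synthon graph and G_t the graph after t actions. Any additional conditioning used in [9], such as a latent variable, is omitted.

```latex
\pi = (a_1, \dots, a_T), \qquad
G_t = \mathrm{apply}(G_{t-1}, a_t), \qquad
p(\pi \mid G_0) = \prod_{t=1}^{T} p(a_t \mid G_{t-1})
```

The product form is exactly the Markov property: the probability of each action is conditioned on the current graph only, not on the full action history.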
[9] Shi et al.

• Overview (cont’d)
5. Let an action be a tuple.
6. It decomposes the distribution into 3 parts:
i. Termination prediction
ii. Node selection
iii. Edge labeling
7. It uses variational inference by introducing an approximate posterior.
[9] Shi et al.

• Top-k result
[9] Shi et al.
[Table panels: reaction class is given / reaction class is unknown]
Template-free: G2G [9] 2020 – Results

• Module performance
• Contribution
• It formulates retrosynthesis prediction as a novel graph-to-graphs translation task
• Limitation
• A well-tuned Molecular Transformer performs better
Template-free: G2G [9] 2020 – Results
[9] Shi et al.

• Template-free: GraphRetro [10] (arXiv 2020)
Template-free: GraphRetro [10] 2020
[10] Somnath et al.

• Template-free: GraphRetro [10] (arXiv 2020)
• Key Idea
• It also uses the idea of breaking and modifying graphs, like G2G [9].
• G2G [9] modifies the graph at the level of atoms, whereas GraphRetro operates at the level of molecular fragments
called leaving groups.
• G2G: Sequential generation
• GraphRetro: Leaving group selection
Template-free: GraphRetro [10] 2020 – Key Idea
[10] Somnath et al.

• Top-k result
Template-free: GraphRetro [10] 2020 - Results
[10] Somnath et al.

• Module performance
• Contribution
• Choosing a leaving group is a good fit for retrosynthesis problems
• Limitation
• Domain knowledge is required to create the leaving group vocabulary
Template-free: GraphRetro [10] 2020 - Results
[10] Somnath et al.

Machine learning based
Selection-based: Bayesian Retrosynthesis [11]
[11] Guo et al.

Selection-based: Bayesian Retrosynthesis [11]
[11] Guo et al.

Selection-based: Bayesian Retrosynthesis [11] – Key Idea
• Key Idea
• It uses a pre-trained forward model as the likelihood in Bayes’ theorem and an approximate
posterior distribution over reactants.
• It uses Monte Carlo search to explore synthetic routes.
[11] Guo et al.

Selection-based: Bayesian Retrosynthesis [11] – Method
• Method
• Likelihood is the Boltzmann distribution with an inverse temperature.
• Energy function: Tanimoto distance between target product and predicted product
• Approximate posterior
• Exact computation across all candidates is generally infeasible.
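Written out, a likelihood of this kind has the generic form below. The notation is assumed rather than taken from the slide: y* is the target product, S a candidate reactant set, ŷ(S) the product predicted by the forward model, β the inverse temperature, and the Tanimoto distance is taken as one minus the Tanimoto similarity T.

```latex
p(y^{*} \mid S) \;\propto\; \exp\!\big(-\beta\, d(y^{*}, \hat{y}(S))\big),
\qquad d(a, b) = 1 - T(a, b),
\qquad p(S \mid y^{*}) \;\propto\; p(y^{*} \mid S)\, p(S)
```

A larger β concentrates the likelihood on reactant sets whose predicted product is very close to the target; the normalizing sum over all candidate S is the part that is infeasible to compute exactly.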
Predicted product by forward model (Molecular Transformer)
[11] Guo et al.

Selection-based: Bayesian Retrosynthesis [11] – Method (SMC)
• Method (Cont’d)
• Sampling from the posterior
• Sequential Monte Carlo (SMC)
• Cons
• Particle impoverishment [38]
• Rapid loss of diversity
• Computation cost of using forward model (Molecular Transformer)
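Particle impoverishment is easy to reproduce with a generic resampling step. The sketch below is not the sampler of [11]; it is a plain multinomial resampling step with Boltzmann weights, and the particle names, energies and the function `resample` are made up for illustration.

```python
import math
import random


def resample(particles, energies, beta, rng):
    """Multinomial resampling with Boltzmann weights exp(-beta * energy)."""
    weights = [math.exp(-beta * e) for e in energies]
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)
particles = ["A", "B", "C", "D", "E"]      # hypothetical candidate reactant sets
energies = [0.1, 0.9, 0.8, 0.95, 0.85]     # hypothetical distances to the target
survivors = resample(particles, energies, beta=20.0, rng=rng)
print(survivors, "distinct:", len(set(survivors)))
```

With a large inverse temperature, the lowest-energy particle dominates the weights, so the resampled population consists mostly of copies of it. That rapid loss of distinct particles is the diversity problem listed above.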
[11] Guo et al. [38] Stavropoulos et al.

Selection-based: Bayesian Retrosynthesis [11] – Method
• Method (Cont’d)
• SMC is accelerated with a surrogate likelihood.
• It trains a gradient boosting regression tree that predicts the likelihood given by the Molecular
Transformer.
[11] Guo et al.

Selection-based: Bayesian Retrosynthesis [11] – Results
• Results
[11] Guo et al.

Challenges
Challenge 1. Balancing between template-free and template-based model
Challenge 2. Multi-Step retrosynthesis
Challenge 3. Extremely large space of synthesis routes
Challenge 4. Molecule decoding (Graph generation)
[3] Coley et al. [14] Segler et al.

Challenges:
1. Balancing between template-free and template-based model
• How about a hybrid model using uncertainty?
• Template-based
• Pros: High interpretability
• Cons: Low generalizability; requires domain knowledge
• Template-free
• Pros: Generalizability
• Cons: Invalid/inaccessible predictions; low interpretability

• Most molecules in the real world cannot be synthesized in a single step.
• A route can take up to 60 steps or even more.
• Error accumulation
• Extremely large space
• The most recent work [13] uses neural-guided A* search.
[13] Chen et al.
Challenges:
2. Multi-Step retrosynthesis

• Each molecule could be synthesized from hundreds of different possible reactant sets.
• How should the quality of a synthesis route be measured?
Challenges:
3. Extremely large space of synthesis routes

• Modeling complex distributions over graphs and then sampling from them efficiently is challenging!
• Why is it challenging?
• Non-unique representations
• High-dimensional nature of graphs
• Complex, non-local dependencies between nodes and edges.
• Graph VAE [29] (ICANN 2018)
• Graph RNN [30] (ICML 2018)
• GRAN [31] (NeurIPS 2019)
• Junction tree VAE [35] (ICML 2018)
[29] Simonovsky et al. [30] You et al. [31] Liao et al. [35] Jin et al.
Challenges:
4. Molecule decoding (Graph generation)

Practice: RDKit
• Data pre-processing (RDKit)
• RDKit [20] is an open-source library for cheminformatics.
• https://www.rdkit.org
• Why RDKit?
• Visualization
• Substructure searching
• Molecular similarity calculation
• Validity checking
• Various other functions for cheminformatics
• We uploaded an RDKit tutorial notebook:
• https://github.com/wonjun-dev/contrastive-retro

Practice: OpenNMT
• OpenNMT
• OpenNMT [28] is an open-source library for neural machine translation.
• https://opennmt.net
• Why OpenNMT?
• It supports various models for the encoder-decoder framework.
• Built-in functions.
• Easy to engineer.
• Cons
• Very large codebase
• Limited flexibility
• Disconnected workflow (train, inference, performance check)*7
[28] Klein et al., *7: We made a fully automated script.

Practice: OpenNMT – Where you should change
• OpenNMT
• Primary files in OpenNMT
• Data loader
• preprocess.py
• inputter.py (./onmt/inputters)
• Options
• opts.py (./onmt) => Options for training, translation, preprocessing, etc. You can
add your own options here.
• Train
• train.py => Entry point of training
• train_single.py (./onmt) => Second entry point of training
• trainer.py (./onmt) => Main training loop
• loss.py (./onmt/utils) => Several classes for loss functions
• Model
• model_builder (./onmt)
• model.py (./onmt/models) => Model class
• model_saver (./onmt/models)
• Translation
• translate.py => Entry point of translation
• translator.py (./onmt/translate) => Translator class
• Performance check
• parse_output.py (./parse) => Parse predicted output and calculate accuracy via RDKit.
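The performance check can be sketched as follows. This is not the contents of parse_output.py; it is a minimal illustration of top-k accuracy after dropping invalid and repeated predictions. The canonicalizer is passed in as a function so the sketch runs without RDKit, and `toy_canonicalize` is a hypothetical stand-in that only checks parenthesis balance and sorts "."-separated fragments.

```python
def clean_predictions(predictions, canonicalize):
    """Drop invalid and repeated predictions while keeping rank order.

    canonicalize maps a SMILES string to a canonical form, or None if invalid.
    """
    seen, kept = set(), []
    for smi in predictions:
        canon = canonicalize(smi)
        if canon is None or canon in seen:
            continue
        seen.add(canon)
        kept.append(canon)
    return kept


def top_k_accuracy(all_predictions, targets, k, canonicalize):
    """Fraction of targets found among the first k cleaned predictions."""
    hits = 0
    for preds, target in zip(all_predictions, targets):
        cleaned = clean_predictions(preds, canonicalize)[:k]
        hits += canonicalize(target) in cleaned
    return hits / len(targets)


def toy_canonicalize(smi):
    # Hypothetical stand-in for a real canonicalizer.
    if smi.count("(") != smi.count(")"):
        return None
    return ".".join(sorted(smi.split(".")))


preds = [["CC(O", "CCO.CN", "CN.CCO", "CCN"]]
targets = ["CN.CCO"]
top1 = top_k_accuracy(preds, targets, 1, toy_canonicalize)
print(top1)
```

With RDKit, the canonicalizer could be built from `Chem.MolFromSmiles`, which returns None when parsing fails, followed by `Chem.MolToSmiles` on the parsed molecule. In the toy run, the invalid first beam and the repeated third beam are removed, so the correct answer moves up to rank 1.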

Practice: OpenNMT – Automated script
• OpenNMT
• We provide a fully automated (training to parsing) script.
• https://github.com/wonjun-dev/contrastive-retro @master branch
• run_experiment_mt.sh
• Train – Inference (Translate) – Performance check (Parse) – Averaging
• arg[0]: GPU id
• arg[1]: seed
• run_average.py
• The performance of MT and LV-MT varies considerably depending on the seed.

Related works
• Forward synthesis
• Given reactants and reagents, predict the products.
• [7, 34, 36, 37]
• Reaction center prediction
• The task of identifying the reaction center is related to the step of deriving the synthons
(intermediate outcomes) in retrosynthesis.
• [9, 10, 33, 34]
• Graph generation
• Generative models for real-world graphs, including social, chemical, and knowledge graphs
• [29, 30, 31, 35]

Future directions
• Training chemical language models like BERT
• Learning better chemical representation
• Atomic or molecular embedding considering chemical properties
• Robust to SMILES augmentation
• Contrastive learning
• Template-Generative Hybrid model
• Graph encoding – SMILES decoding
• Graph decoding is challenging
• Predictive model for subgraph isomorphism
• Subgraph isomorphism is an NP-complete problem, so exact matching is not scalable.

References
[1] Weininger et al. “A chemical language and information system. 1. introduction to methodology and encoding
rules.” Journal of Chemical Information and Modeling, 1988.
[2] Christ et al. “Mining electronic laboratory notebooks: Analysis, retrosynthesis, and reaction based
enumeration.” Journal of Chemical Information and Modeling, 2012.
[3] Coley et al. “Computer-assisted retrosynthesis based on molecular similarity.” ACS Central Science, 2017.
[4] Klucznik et al. “Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed
in the laboratory.” Chem, 2018.
[5] Dai et al. “Retrosynthesis prediction with conditional graph logic network”. NeurIPS, 2019.
[6] Schwaller et al. “Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction.” ACS
Central Science, 2019.
[7] Lee et al. “Molecular transformer unifies reaction prediction and retrosynthesis across pharma chemical space.”
Chemical Communications, 2019.
[8] Chen et al. “Learning to make generalizable and diverse predictions for retrosynthesis.” arXiv preprint 2019.
[9] Shi et al. “A graph to graphs framework for retrosynthesis prediction.”, ICML, 2020
[10] Somnath et al. “Learning graph models for template-free retrosynthesis.”, arXiv, 2020
[11] Guo et al. “A Bayesian algorithm for retrosynthesis.”, arXiv, 2020
[12] Lin et al. “Automatic retrosynthetic route planning using template-free models.”, Chem. Sci., 2020
[13] Chen et al. “Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search”, ICML, 2020
[14] Segler et al., “Neural-Symbolic machine learning for retrosynthesis and reaction prediction.”, Chemistry-A European
Journal, 2017
[15] Satoh et al., “A novel approach to retrosynthetic analysis using knowledge bases derived from reaction databases.”,
Chem. Inf. Comput. Sci., 1999
[16] Law et al., “Route designer: A retrosynthetic analysis tool utilizing automated retrosynthetic rule generation.”, Chem.
Inf., 2009
[17] Gasteiger et al., “A collection of computer methods for synthesis design and reaction prediction.”, Recl. Trav. Chim.
Pays-Bas, 1992
[18] Corey et al., “Computer-assisted analysis in organic synthesis.”, Science, 1985
[19] Corey et al., “The logic of chemical synthesis: Multistep synthesis of complex carbogenic molecules. (Nobel lecture)”,
1991
[20] http://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
[21] Liu et al., “Retrosynthetic reaction prediction using neural sequence-to-sequence models.”, ACS Cent. Sci., 2017
[22] Zheng et al., “Predicting retrosynthetic reactions using self-corrected transformer neural networks.”, J. Chem. Inf.
Model., 2020
[23] Srivastava et al., “Highway networks”, NIPS, 2015
[24] https://chemistry-europe.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fchem.201605499&file=chem201605499-sup-0001-misc_information.pdf
[25] http://www.reaxys.com, Reaxys is a registered trademark of RELX Intellectual Properties SA used under license.
[26] Shen et al., “Mixture models for diverse machine translation: Tricks of the trade.”, arXiv, 2019
[27] Schlichtkrull et al., “Modeling relational data with graph convolutional networks.”, In European
Semantic Web Conference, 2018
[28] Klein et al., “OpenNMT: Open-Source Toolkit for Neural Machine Translation.”, arXiv, 2017
[29] Simonovsky et al., “GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders.”,
ICANN, 2018
[30] You et al., “GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models.”, ICML, 2018
[31] Liao et al., “Efficient Graph Generation with Graph Recurrent Attention Networks.”, NeurIPS, 2019
[32] Mayfield et al., “Pistachio 2.0 edn software.”, 2018
[33] Coley et al., “A graph-convolutional neural network model for the prediction of chemical reactivity.”,
Chemical Science 2019
[34] Coley et al., “Predicting organic reaction outcomes with Weisfeiler-Lehman Network.”, NeurIPS, 2017
[35] Jin et al., “Junction Tree Variational Autoencoder for molecular graph generation.”, ICML, 2018
[36] Bradshaw et al., “A generative model for electron paths.”, ICLR, 2019
[37] Do et al., “Graph transformation policy network for chemical reaction prediction.”, KDD, 2019
[38] Stavropoulos et al., “Sequential Monte Carlo method in practice.”, Springer, 2001

Appendix
1. Subgraph isomorphism problem
• It is a computational task in which two graphs G and H are given as input, and one must
determine whether G contains a subgraph that is isomorphic to H
• NP-Complete
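A brute-force check makes the cost visible. The sketch below is a naive illustration, not an algorithm used by any of the cited methods: it tries every injective mapping of H's nodes into G's nodes and accepts if all edges of H are preserved. The function name `has_subgraph` is hypothetical.

```python
from itertools import permutations


def has_subgraph(g_nodes, g_edges, h_nodes, h_edges):
    """Brute-force test of whether G contains a subgraph isomorphic to H."""
    g_edge_set = {frozenset(e) for e in g_edges}
    # Try every injective assignment of H's nodes to distinct nodes of G.
    for image in permutations(g_nodes, len(h_nodes)):
        mapping = dict(zip(h_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in g_edge_set
               for u, v in h_edges):
            return True
    return False


square_nodes = [0, 1, 2, 3]
square_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = (["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])

print(has_subgraph(square_nodes, square_edges, *path))
print(has_subgraph(square_nodes, square_edges, *triangle))
```

The number of candidate mappings is n!/(n-m)! for n nodes in G and m nodes in H, which is why naive template matching over large molecule sets becomes expensive.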
2. Molecular similarity metrics (x and y are molecular fingerprints)

Appendix
3. Reaction class
• Meta-information about the type of chemical reaction.
• In USPTO, there are 10 reaction classes.

Appendix
4. Parameterizing by GNN in [5]
• Graph embedding := average of the node embeddings

Appendix
5. Better hyper-parameters of MT and the results.
• Dropout p=0.25 is better than p=0.1.
• Invalid and repeated SMILES can be removed via RDKit.
• Also, using 6 layers with an increased dropout rate is better than using 4 layers.

Model                           Top 1   Top 3   Top 5   Top 10
MT [8]                          0.420   0.570   0.619   0.657
MT (p=0.25, w/o inval/repeat)   0.432   0.645   0.709   0.771
