2. Chemical prediction - Two approaches
Quantum simulation
● Theory-based approach
● DFT (Density Functional Theory)
◯ Theoretically grounded
△ High calculation cost
Machine learning
● Data-based approach
● Learns properties from known compounds
◯ Low-cost, high-speed calculation
△ No guarantee of precision
“Neural message passing for quantum chemistry” [Gilmer et al.]
3. Data-based approach for chemical prediction
• Material informatics
– Materials Genome Initiative
– MI2I project (NIMS)
• Drug discovery
– Big pharmas’ investment
– IPAB drug discovery contest
6. Extended Connectivity Fingerprint (ECFP)
https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/
https://docs.chemaxon.com/display/docs/Extended+Connectivity+Fingerprint+ECFP
Fixed-length bit representation (fingerprint) of a molecule.
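The hashing idea behind ECFP-style fingerprints can be sketched in a few lines of pure Python: each substructure identifier is hashed into a fixed-length bit vector. This is a toy illustration, not the real ECFP algorithm (which enumerates circular atom environments; RDKit provides a standard implementation), and the environment strings below are made up for the example.

```python
import hashlib

def toy_hashed_fingerprint(atom_environments, n_bits=64):
    """Toy ECFP-style fingerprint: hash each substructure
    identifier onto one position of a fixed-length bit vector."""
    bits = [0] * n_bits
    for env in atom_environments:
        # Stable hash of the environment string -> bit position
        h = int(hashlib.md5(env.encode()).hexdigest(), 16)
        bits[h % n_bits] = 1
    return bits

# Hypothetical atom-environment identifiers for a small molecule
envs = ["C(sp3)", "C(sp)", "C(sp)#C", "C-C"]
fp = toy_hashed_fingerprint(envs)
print(sum(fp))  # number of set bits (<= len(envs) due to collisions)
```

Note the lossy nature of the representation: two different environments may collide on the same bit, which is one reason hashed fingerprints are compact but not invertible.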
7. Problems of conventional methods
1. Representation is not unique
– e.g. CC#C and C#CC are the same molecule.
2. Order invariance is not guaranteed
– Representation is not guaranteed to be invariant under relabeling of atoms.
3. Rule-based approach
– Not adaptive to data
→ Graph convolution
9. Unified view of graph convolution
Many message-passing algorithms (NFP, GGNN, Weave etc.) are formulated as the
iterative application of Update and Readout functions [Gilmer et al. 17].
Update: aggregates neighborhood information and updates node representations.
Readout: aggregates all node representations and produces the final output.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message
passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
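The Update/Readout view above can be sketched as a short numpy loop. This is a minimal illustration with assumed choices (sum aggregation over an adjacency matrix, a shared weight matrix, tanh non-linearity), not any specific model from the slides.

```python
import numpy as np

def update(h, adj, W):
    # Aggregate neighbor states (sum via the adjacency matrix),
    # then apply a learnable linear map and a non-linearity.
    m = adj @ h                 # messages: sum over neighbors
    return np.tanh((h + m) @ W)

def readout(h, W_out):
    # Aggregate all node states into one graph-level output.
    return h.sum(axis=0) @ W_out

rng = np.random.default_rng(0)
n_atoms, dim = 5, 8
h = rng.normal(size=(n_atoms, dim))          # initial node states
adj = np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1)  # a path graph
W = rng.normal(size=(dim, dim))
for _ in range(3):                           # T message-passing steps
    h = update(h, adj, W)
y = readout(h, rng.normal(size=(dim, 1)))    # scalar property prediction
print(y.shape)  # (1,)
```

NFP, GGNN, Weave and SchNet differ mainly in how `update` aggregates neighbors and how `readout` pools the nodes, as the following slides show.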
10. NFP: Neural Fingerprint
[Figure: a molecular graph (C, N, O atoms) with hidden state vectors h1–h9 attached to the atoms]
Update (Wd denotes the weight matrix used for atoms of degree d):
h′7 = σ(W3 (h7 + h6 + h8 + h9))
h′3 = σ(W2 (h3 + h2 + h4))
Readout:
R = Σi softmax(W hi)
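The NFP update and readout above translate almost directly into numpy. A minimal sketch, assuming a per-degree weight matrix and sigmoid/softmax as in the slide (shapes and random values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nfp_update(h, neighbors, W_deg):
    # NFP update for one atom: sum self + neighbor states,
    # multiply by the weight matrix for this atom's degree.
    return sigmoid(W_deg @ (h + neighbors.sum(axis=0)))

def nfp_readout(H, W):
    # Readout: sum of softmax-transformed node states.
    return sum(softmax(W @ h_i) for h_i in H)

rng = np.random.default_rng(0)
dim = 8
h7 = rng.normal(size=dim)
nbrs = rng.normal(size=(3, dim))     # h6, h8, h9
W3 = rng.normal(size=(dim, dim))     # weight matrix for degree-3 atoms
h7_new = nfp_update(h7, nbrs, W3)

H = rng.normal(size=(9, dim))        # all node states h1..h9
R = nfp_readout(H, rng.normal(size=(dim, dim)))
print(R.shape)  # (8,)
```

Because both the update and the readout only use sums over sets of atoms, the result is invariant to relabeling the atoms, addressing problem 2 of the conventional methods.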
11. Comparison between graph convolutional networks

|                                          | NFP                 | GGNN                | Weave                                             | SchNet              |
| Atom feature extraction                  | Man-made or Embed   | Man-made or Embed   | Man-made or Embed                                 | Man-made or Embed   |
| Graph convolution strategy               | Adjacent atoms only | Adjacent atoms only | All atom-atom pairs                               | All atom-atom pairs |
| How to represent connection information  | Degree              | Bond type           | Man-made pair features (bond type, distance etc.) | Distance            |
12. How can we incorporate ML into Chemistry and Biology?
Problems
• Optimized graph convolution algorithms are hard to implement from scratch.
• ML and Chemistry/Biology researchers sometimes speak different “languages”.
Solution: create tools so that …
• Chemistry/Biology researchers need not bother with the details of DL algorithms and can concentrate on their research.
• ML and Chemistry researchers can work in collaboration.
→ Chainer Chemistry
14. Example: HOMO Prediction by NFP with QM9 dataset
# Imports assumed from the Chainer Chemistry examples
from chainer.datasets import split_dataset_random
from chainer_chemistry import datasets as D
from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset

preprocessor = preprocess_method_dict['nfp']()
dataset = D.get_qm9(preprocessor, labels='homo')
# Cache the dataset for later reuse
NumpyTupleDataset.save('input/nfp_homo/data.npz', dataset)
train_data_ratio = 0.7  # e.g. 70% of the data for training
train_data_size = int(len(dataset) * train_data_ratio)
train, val = split_dataset_random(dataset, train_data_size)
Dataset preprocessing (for the NFP network)
15. Example: HOMO Prediction by NFP with QM9 dataset
import chainer
# NFP and MLP are provided by chainer_chemistry (models subpackage)
from chainer_chemistry.models import MLP, NFP

class GraphConvPredictor(chainer.Chain):
    def __init__(self, graph_conv, mlp):
        super(GraphConvPredictor, self).__init__()
        with self.init_scope():
            self.graph_conv = graph_conv
            self.mlp = mlp

    def __call__(self, atoms, adjs):
        x = self.graph_conv(atoms, adjs)  # graph convolution -> molecule vector
        x = self.mlp(x)                   # regression head
        return x

model = GraphConvPredictor(NFP(16, 16, 4), MLP(16, 1))
16. Development policy
• Designed for light users
– The main users are NOT informaticians / computer engineers.
– High-level interface (e.g. sklearn-like API)
• Aggressive feature introduction and API improvements
– The current major version is 0, in the sense of Semantic Versioning 2.0.0.
– At the cost of a less strict compatibility policy than Chainer
– Sub-branches for experimental features
17. v0.2.0 selected features (3/1/2018)
• Dataset exploration of the Tox21 and QM9 datasets
• BalancedSerialIterator
– Chainer iterator that samples each label evenly
• ROCAUCEvaluator
– Chainer evaluator that calculates ROC-AUC
18. v0.3.0 selected features (4/25/2018)
• More examples
– How to use your own datasets
– Inference code for the regression task with QM9
• Classifier / Regressor
– sklearn-like API (predict, predict_proba)
– Serialization with pickle, with a limited guarantee of model portability
• SparseRSGCN (experimental, merged in experimental_sparse)
– Sparse-operation version of RSGCN (a graph convolution algorithm)
20. v0.4.0 planned roadmap
● More dataset support (e.g. MoleculeNet)
● More test coverage
● Performance comparison across several graph convolution models
● Pretrained models or inference web service?
21. Conclusion
• Data-based approach for chemical property prediction is gaining momentum.
• Chainer Chemistry is a Chainer extension library dedicated to Bio- and
Chemo-informatics.
• Chainer Chemistry implements several off-the-shelf graph convolutional
networks, which themselves are useful in other fields, too.
23. Fully connected layer
● Input: vector x of size N
● Output: vector y = Wx + b (affine transformation)
● Learnable parameters:
● W (weight matrix of size M x N)
● b (bias vector of size M)
[Figure: inputs x1, …, xN fully connected to outputs y1, …, yM through W and b]
y = Wx + b
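The affine transformation above is a one-liner in numpy. A minimal sketch with arbitrary sizes N = 4, M = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                   # input size N, output size M
W = rng.normal(size=(M, N))   # weight matrix, M x N
b = rng.normal(size=M)        # bias vector, size M
x = rng.normal(size=N)        # input vector

y = W @ x + b                 # affine transformation y = Wx + b
print(y.shape)  # (3,)
```

In a deep learning framework, W and b would be parameters updated by gradient descent rather than fixed random values.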
24. Activation function
● A function (usually) without learnable parameters, used to introduce non-linearity
● Input: vector (or tensor) x = (x1, …, xn)
● Output: vector (or tensor) y = (y1, …, yn)
● Applied element-wise: yi = σ(xi) (i = 1, …, n)
Examples of σ
● Sigmoid(x) = 1 / (1 + exp(-x))
● tanh(x)
● ReLU(x) = max(0, x)
● LeakyReLU(x) = x (x > 0), ax (x ≤ 0)
○ a > 0 is a small fixed constant (e.g. 0.01)
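The example activations above are each a single numpy expression, applied element-wise:

```python
import numpy as np

def sigmoid(x):
    # Sigmoid(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # LeakyReLU: identity for x > 0, small slope a for x <= 0
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.02  0.    3.  ]
print(np.tanh(x))
```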
25. Convolutional Neural Network (CNN)[LeCun+98]
• A neural network consisting of convolutional layers and pooling layers
• Many variants: AlexNet, VGG, Inception, GoogleNet, ResNet etc.
• Widely used in image recognition and recently applied to biology and chemistry
LeNet-5 [LeCun+98]
LeCun, Yann, et al. "Gradient-based learning applied to
document recognition." Proceedings of the IEEE 86.11
(1998): 2278-2324.
32. Example: IT Drug Discovery Contest
Task
• Find new seed compounds for a target protein (Sirtuin 1) from 2.5 million
compounds using IT technologies
Rule
• Each team needs to prepare data by itself such as training datasets.
• Each team can submit up to 400 candidate compounds
• The judges check all submitted compounds with a 2-stage biological experiment:
– Thermal Shift Assay (TSA)
– Inhibitory assay → IC50 measurement
Contest website (Japanese)
http://www.ipab.org/eventschedule/contest/contest4
33. Our result

|                      | Ours             | Average (18 teams in total) |
| 1st screening (TSA)  | 23 / 200 (11.5%) | 69 / 3559 (1.9%)            |
| 2nd screening (IC50) | 1                | 5                           |

We found one hit compound and won one of the grand prizes (the IPAB prize).
35. NFP: Neural Fingerprint
[Figure: a molecular graph with hidden states h1–h10; messages W3 h6, W3 h7, W3 h8, W3 h9 are summed at atom 7, and W2 h2, W2 h3, W2 h4 at atom 3]
Update:
h′7 = σ(W3 (h7 + h6 + h8 + h9))
h′3 = σ(W2 (h3 + h2 + h4))
The graph convolution operation depends on the degree of each atom
→ Bond-type information is not utilized.
36. NFP: Neural Fingerprint
[Figure: a molecular graph with hidden states h1–h10]
Readout:
R = Σi softmax(W hi)
The readout operation is basically a simple sum over the atoms
→ No selective operation / attention mechanism is adopted.
37. GGNN: Gated Graph Neural Network
[Figure: a molecular graph with hidden states h1–h10; messages W1 h6, W2 h8, W1 h9 are aggregated at atom 7, and W1 h2, W2 h4 at atom 3]
Update (Wk is the weight matrix for bond type k; GRU: Gated Recurrent Unit):
h′7 = GRU(h7, W1 h6 + W2 h8 + W1 h9)
h′3 = GRU(h3, W1 h2 + W2 h4)
The graph convolution operation depends on the bond type of each atom pair.
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
38. GGNN: Gated Graph Neural Network
[Figure: a molecular graph with hidden states h1–h10]
Readout:
R = Σv σ(i(hv, hv0)) ⦿ j(hv)
Simplified version:
R = Σv σ(Wi hv) ⦿ Wj hv
The readout operation contains a selective operation (gating).
Here, i and j represent functions (neural networks), σ is the sigmoid non-linearity, and ⦿ denotes element-wise multiplication.
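The simplified gated readout is straightforward in numpy: a sigmoid gate decides how much each node contributes to the graph-level vector. A sketch with illustrative shapes and random weights:

```python
import numpy as np

def gated_readout(h, W_i, W_j):
    # GGNN-style readout: sigma(W_i h_v) gates W_j h_v,
    # then the gated contributions are summed over nodes v.
    gate = 1.0 / (1.0 + np.exp(-(h @ W_i.T)))  # sigma(W_i h_v), in (0, 1)
    return (gate * (h @ W_j.T)).sum(axis=0)    # sum over nodes

rng = np.random.default_rng(0)
n_atoms, dim, out = 10, 8, 4
h = rng.normal(size=(n_atoms, dim))    # node states h1..h10
W_i = rng.normal(size=(out, dim))      # gate weights
W_j = rng.normal(size=(out, dim))      # value weights
R = gated_readout(h, W_i, W_j)
print(R.shape)  # (4,)
```

Compared with the plain sum in NFP's readout, the gate lets the network suppress uninformative atoms when forming the molecule-level representation.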
39. Weave: Molecular Graph Convolutions
● The Weave module convolves each atom feature with the features of all atom pairs involving that atom.
A: atom feature, P: feature of an atom pair
● P → A operation:
g() is a function chosen for order invariance; sum() is used in the paper.
Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. arXiv:1603.00856.
40. SchNet: A continuous-filter convolutional neural network
Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions.
1. All atom-pair distances ||ri - rj|| are used as input.
2. An energy-conservation condition can additionally be used to constrain the model for the energy prediction task.
41. Molecule generation with VAE [Gómez-Bombarelli+16]
● Encode and decode molecules represented as SMILES with a VAE, in a seq2seq manner.
● The latent representation can be used for semi-supervised learning.
● We can use the learned model to find molecules with a desired property by optimizing the representation in latent space and decoding it.
Generated molecules are not guaranteed to be syntactically valid :(
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., ... & Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
42. Grammar VAE [Kusner+17]
Encode: convert a molecule to a parse tree to get a sequence of production rules, and feed the sequence to an RNN-VAE.
Decode: generate a sequence of production rules of the SMILES syntax represented by a CFG.
Generated molecules are guaranteed to be syntactically valid!
Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar Variational Autoencoder. arXiv preprint arXiv:1703.01925.