2. Chemical prediction - Two approaches
Quantum simulation
● Theory-based approach
● DFT (Density Functional Theory)
◯ Theoretically grounded
△ High calculation cost
Machine learning
● Data-based approach
● Learns properties from known compounds
◯ Low-cost, high-speed calculation
△ No guarantee of precision
“Neural message passing for quantum chemistry” [Gilmer et al.]
3. Data-based approach for chemical prediction
• Material informatics
– Materials Genome Initiative
– MI2I project (NIMS)
• Drug discovery
– Big pharmas’ investment
– IPAB drug discovery contest
6. Extended Connectivity Fingerprint (ECFP)
https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/
https://docs.chemaxon.com/display/docs/Extended+Connectivity+Fingerprint+ECFP
Fixed-length bit representation (fingerprint) of a molecule.
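The hashing idea behind ECFP-style fingerprints can be sketched in a few lines of pure Python: each substructure identifier is hashed into a fixed-length bit vector. This is a toy illustration, not the real ECFP algorithm (which enumerates circular atom environments; RDKit provides a standard implementation), and the environment strings below are made up for the example.

```python
import hashlib

def toy_hashed_fingerprint(atom_environments, n_bits=64):
    """Toy ECFP-style fingerprint: hash each substructure
    identifier onto one position of a fixed-length bit vector."""
    bits = [0] * n_bits
    for env in atom_environments:
        # Stable hash of the environment string -> bit position
        h = int(hashlib.md5(env.encode()).hexdigest(), 16)
        bits[h % n_bits] = 1
    return bits

# Hypothetical atom-environment identifiers for a small molecule
envs = ["C(sp3)", "C(sp)", "C(sp)#C", "C-C"]
fp = toy_hashed_fingerprint(envs)
print(sum(fp))  # number of set bits (<= len(envs) due to collisions)
```

Note the lossy nature of the representation: two different environments may collide on the same bit, which is one reason hashed fingerprints are compact but not invertible.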
7. Problems of conventional methods
1. Representation is not unique
– e.g. CC#C and C#CC are the same molecule.
2. Order invariance is not guaranteed
– Representation is not guaranteed to be invariant under relabeling of atoms.
3. Rule-based approach
– Not adaptive to data
→ Graph convolution
9. Unified view of graph convolution
Many message-passing algorithms (NFP, GGNN, Weave etc.) are formulated as the
iterative application of Update and Readout functions [Gilmer et al. 17].
Update: aggregates neighborhood information and updates node representations.
Readout: aggregates all node representations and produces the final output.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message
passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
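The Update/Readout view above can be sketched as a short numpy loop. This is a minimal illustration with assumed choices (sum aggregation over an adjacency matrix, a shared weight matrix, tanh non-linearity), not any specific model from the slides.

```python
import numpy as np

def update(h, adj, W):
    # Aggregate neighbor states (sum via the adjacency matrix),
    # then apply a learnable linear map and a non-linearity.
    m = adj @ h                 # messages: sum over neighbors
    return np.tanh((h + m) @ W)

def readout(h, W_out):
    # Aggregate all node states into one graph-level output.
    return h.sum(axis=0) @ W_out

rng = np.random.default_rng(0)
n_atoms, dim = 5, 8
h = rng.normal(size=(n_atoms, dim))          # initial node states
adj = np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1)  # a path graph
W = rng.normal(size=(dim, dim))
for _ in range(3):                           # T message-passing steps
    h = update(h, adj, W)
y = readout(h, rng.normal(size=(dim, 1)))    # scalar property prediction
print(y.shape)  # (1,)
```

NFP, GGNN, Weave and SchNet differ mainly in how `update` aggregates neighbors and how `readout` pools the nodes, as the following slides show.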
10. NFP: Neural Fingerprint
[Figure: a molecular graph (C, N, O atoms) with hidden state vectors h1–h9 attached to the atoms]
Update (Wd denotes the weight matrix used for atoms of degree d):
h′7 = σ(W3 (h7 + h6 + h8 + h9))
h′3 = σ(W2 (h3 + h2 + h4))
Readout:
R = Σi softmax(W hi)
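The NFP update and readout above translate almost directly into numpy. A minimal sketch, assuming a per-degree weight matrix and sigmoid/softmax as in the slide (shapes and random values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nfp_update(h, neighbors, W_deg):
    # NFP update for one atom: sum self + neighbor states,
    # multiply by the weight matrix for this atom's degree.
    return sigmoid(W_deg @ (h + neighbors.sum(axis=0)))

def nfp_readout(H, W):
    # Readout: sum of softmax-transformed node states.
    return sum(softmax(W @ h_i) for h_i in H)

rng = np.random.default_rng(0)
dim = 8
h7 = rng.normal(size=dim)
nbrs = rng.normal(size=(3, dim))     # h6, h8, h9
W3 = rng.normal(size=(dim, dim))     # weight matrix for degree-3 atoms
h7_new = nfp_update(h7, nbrs, W3)

H = rng.normal(size=(9, dim))        # all node states h1..h9
R = nfp_readout(H, rng.normal(size=(dim, dim)))
print(R.shape)  # (8,)
```

Because both the update and the readout only use sums over sets of atoms, the result is invariant to relabeling the atoms, addressing problem 2 of the conventional methods.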
11. Comparison between graph convolutional networks

|                                          | NFP                 | GGNN                | Weave                                             | SchNet              |
| Atom feature extraction                  | Man-made or Embed   | Man-made or Embed   | Man-made or Embed                                 | Man-made or Embed   |
| Graph convolution strategy               | Adjacent atoms only | Adjacent atoms only | All atom-atom pairs                               | All atom-atom pairs |
| How to represent connection information  | Degree              | Bond type           | Man-made pair features (bond type, distance etc.) | Distance            |
12. How can we incorporate ML into Chemistry and Biology?
Problems
• Optimized graph convolution algorithms are hard to implement from scratch.
• ML and Chemistry/Biology researchers sometimes speak different “languages”.
Solution: create tools so that …
• Chemistry/Biology researchers need not bother with the details of DL algorithms and can concentrate on their research.
• ML and Chemistry researchers can work in collaboration.
→ Chainer Chemistry
14. Example: HOMO Prediction by NFP with QM9 dataset
# Imports assumed from the Chainer Chemistry examples
from chainer.datasets import split_dataset_random
from chainer_chemistry import datasets as D
from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset

preprocessor = preprocess_method_dict['nfp']()
dataset = D.get_qm9(preprocessor, labels='homo')
# Cache the dataset for later reuse
NumpyTupleDataset.save('input/nfp_homo/data.npz', dataset)
train_data_ratio = 0.7  # e.g. 70% of the data for training
train_data_size = int(len(dataset) * train_data_ratio)
train, val = split_dataset_random(dataset, train_data_size)
Dataset preprocessing (for the NFP network)
15. Example: HOMO Prediction by NFP with QM9 dataset
import chainer
# NFP and MLP are provided by chainer_chemistry (models subpackage)
from chainer_chemistry.models import MLP, NFP

class GraphConvPredictor(chainer.Chain):
    def __init__(self, graph_conv, mlp):
        super(GraphConvPredictor, self).__init__()
        with self.init_scope():
            self.graph_conv = graph_conv
            self.mlp = mlp

    def __call__(self, atoms, adjs):
        x = self.graph_conv(atoms, adjs)  # graph convolution -> molecule vector
        x = self.mlp(x)                   # regression head
        return x

model = GraphConvPredictor(NFP(16, 16, 4), MLP(16, 1))
16. Development policy
• Designed for light users
– The main users are NOT informaticians / computer engineers.
– High-level interface (e.g. sklearn-like API)
• Aggressive feature introduction and API improvements
– The current major version is 0, in the sense of Semantic Versioning 2.0.0.
– At the cost of a less strict compatibility policy than Chainer
– Sub-branches for experimental features
17. v0.2.0 selected features (3/1/2018)
• Dataset exploration of the Tox21 and QM9 datasets
• BalancedSerialIterator
– Chainer iterator that samples each label evenly
• ROCAUCEvaluator
– Chainer evaluator that calculates ROC-AUC
18. v0.3.0 selected features (4/25/2018)
• More examples
– How to use your own datasets
– Inference code for the regression task with QM9
• Classifier / Regressor
– sklearn-like API (predict, predict_proba)
– Serialization with pickle, with a limited guarantee of model portability
• SparseRSGCN (experimental, merged in experimental_sparse)
– Sparse-operation version of RSGCN (a graph convolution algorithm)
20. v0.4.0 planned roadmap
● More dataset support (e.g. MoleculeNet)
● More test coverage
● Performance comparison across several graph convolution models
● Pretrained models or inference web service?
21. Conclusion
• Data-based approach for chemical property prediction is gaining momentum.
• Chainer Chemistry is a Chainer extension library dedicated to Bio- and
Chemo-informatics.
• Chainer Chemistry implements several off-the-shelf graph convolutional
networks, which themselves are useful in other fields, too.
23. Fully connected layer
● Input: vector x of size N
● Output: vector y = Wx + b (affine transformation)
● Learnable parameters:
● W (weight matrix of size M x N)
● b (bias vector of size M)
[Figure: inputs x1, …, xN fully connected to outputs y1, …, yM through W and b]
y = Wx + b
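The affine transformation above is a one-liner in numpy. A minimal sketch with arbitrary sizes N = 4, M = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                   # input size N, output size M
W = rng.normal(size=(M, N))   # weight matrix, M x N
b = rng.normal(size=M)        # bias vector, size M
x = rng.normal(size=N)        # input vector

y = W @ x + b                 # affine transformation y = Wx + b
print(y.shape)  # (3,)
```

In a deep learning framework, W and b would be parameters updated by gradient descent rather than fixed random values.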
24. Activation function
● A function (usually) without learnable parameters, used to introduce non-linearity
● Input: vector (or tensor) x = (x1, …, xn)
● Output: vector (or tensor) y = (y1, …, yn)
● Applied element-wise: yi = σ(xi) (i = 1, …, n)
Examples of σ
● Sigmoid(x) = 1 / (1 + exp(-x))
● tanh(x)
● ReLU(x) = max(0, x)
● LeakyReLU(x) = x (x > 0), ax (x ≤ 0)
○ a > 0 is a small fixed constant (e.g. 0.01)
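The example activations above are each a single numpy expression, applied element-wise:

```python
import numpy as np

def sigmoid(x):
    # Sigmoid(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # LeakyReLU: identity for x > 0, small slope a for x <= 0
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.02  0.    3.  ]
print(np.tanh(x))
```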
25. Convolutional Neural Network (CNN)[LeCun+98]
• A neural network consisting of convolutional layers and pooling layers
• Many variants: AlexNet, VGG, Inception, GoogleNet, ResNet etc.
• Widely used in image recognition and recently applied to biology and chemistry
LeNet-5 [LeCun+98]
LeCun, Yann, et al. "Gradient-based learning applied to
document recognition." Proceedings of the IEEE 86.11
(1998): 2278-2324.
32. Example: IT Drug Discovery Contest
Task
• Find new seed compounds for a target protein (Sirtuin 1) from 2.5 million
compounds using IT technologies
Rule
• Each team needs to prepare data by itself such as training datasets.
• Each team can submit up to 400 candidate compounds
• The judges check all submitted compounds with a 2-stage biological experiment:
– Thermal Shift Assay (TSA)
– Inhibitory assay → IC50 measurement
Contest website (Japanese)
http://www.ipab.org/eventschedule/contest/contest4
33. Our result

|                      | Ours             | Average (18 teams in total) |
| 1st screening (TSA)  | 23 / 200 (11.5%) | 69 / 3559 (1.9%)            |
| 2nd screening (IC50) | 1                | 5                           |

We found one hit compound and won one of the grand prizes (the IPAB prize).
35. NFP: Neural Fingerprint
[Figure: a molecular graph with hidden states h1–h10; messages W3 h6, W3 h7, W3 h8, W3 h9 are summed at atom 7, and W2 h2, W2 h3, W2 h4 at atom 3]
Update:
h′7 = σ(W3 (h7 + h6 + h8 + h9))
h′3 = σ(W2 (h3 + h2 + h4))
The graph convolution operation depends on the degree of each atom
→ Bond-type information is not utilized.
36. NFP: Neural Fingerprint
[Figure: a molecular graph with hidden states h1–h10]
Readout:
R = Σi softmax(W hi)
The readout operation is basically a simple sum over the atoms
→ No selective operation / attention mechanism is adopted.
37. GGNN: Gated Graph Neural Network
[Figure: a molecular graph with hidden states h1–h10; messages W1 h6, W2 h8, W1 h9 are aggregated at atom 7, and W1 h2, W2 h4 at atom 3]
Update (Wk is the weight matrix for bond type k; GRU: Gated Recurrent Unit):
h′7 = GRU(h7, W1 h6 + W2 h8 + W1 h9)
h′3 = GRU(h3, W1 h2 + W2 h4)
The graph convolution operation depends on the bond type of each atom pair.
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
38. GGNN: Gated Graph Neural Network
[Figure: a molecular graph with hidden states h1–h10]
Readout:
R = Σv σ(i(hv, hv0)) ⦿ j(hv)
Simplified version:
R = Σv σ(Wi hv) ⦿ Wj hv
The readout operation contains a selective operation (gating).
Here, i and j represent functions (neural networks), σ is the sigmoid non-linearity, and ⦿ denotes element-wise multiplication.
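The simplified gated readout is straightforward in numpy: a sigmoid gate decides how much each node contributes to the graph-level vector. A sketch with illustrative shapes and random weights:

```python
import numpy as np

def gated_readout(h, W_i, W_j):
    # GGNN-style readout: sigma(W_i h_v) gates W_j h_v,
    # then the gated contributions are summed over nodes v.
    gate = 1.0 / (1.0 + np.exp(-(h @ W_i.T)))  # sigma(W_i h_v), in (0, 1)
    return (gate * (h @ W_j.T)).sum(axis=0)    # sum over nodes

rng = np.random.default_rng(0)
n_atoms, dim, out = 10, 8, 4
h = rng.normal(size=(n_atoms, dim))    # node states h1..h10
W_i = rng.normal(size=(out, dim))      # gate weights
W_j = rng.normal(size=(out, dim))      # value weights
R = gated_readout(h, W_i, W_j)
print(R.shape)  # (4,)
```

Compared with the plain sum in NFP's readout, the gate lets the network suppress uninformative atoms when forming the molecule-level representation.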
39. Weave: Molecular Graph Convolutions
● The Weave module convolves each atom feature with the features of all atom pairs involving that atom.
A: atom feature, P: feature of an atom pair
● P → A operation:
g() is a function chosen for order invariance; sum() is used in the paper.
Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. arXiv:1603.00856.
40. SchNet: A continuous-filter convolutional neural network
Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions.
1. All atom-pair distances ||ri - rj|| are used as input.
2. An energy-conservation condition can additionally be used to constrain the model for the energy prediction task.
41. Molecule generation with VAE [Gómez-Bombarelli+16]
● Encode and decode molecules represented as SMILES with a VAE, in a seq2seq manner.
● The latent representation can be used for semi-supervised learning.
● We can use the learned model to find molecules with a desired property by optimizing the representation in latent space and decoding it.
Generated molecules are not guaranteed to be syntactically valid :(
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., ... & Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
42. Grammar VAE [Kusner+17]
Encode: convert a molecule to a parse tree to get a sequence of production rules, and feed the sequence to an RNN-VAE.
Decode: generate a sequence of production rules of the SMILES syntax represented by a CFG.
Generated molecules are guaranteed to be syntactically valid!
Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar Variational Autoencoder. arXiv preprint arXiv:1703.01925.