Materials Informatics Python

PyData 1Day Conference
2018 10 20
■
■ IT
■ 

■
■ 

■
■ twitter: @sfchaos @shifukushima

1
2
2000 2006 2009 2018
• •
• etc.
•
•
•
•
• etc.
2014
•
•
“ ”
■Materials Informatics
■
■
■Materials Informatics Python
3
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
4
Materials Informatics
■
5
http://cms.mtl.kyoto-u.ac.jp/informatics.html
6
Materials Informatics
Materials Informatics
■2011 

Materials Genome Initiative 

https://www.mgi.gov/
■ 2015 NIMS 



http://www.nims.go.jp/MII-I/



NIMS:
7
■
■
■
■ 



(Li 2 )
■ TDK
■
■ NEC
■ 

etc.
8


https://tech.nikkeibp.co.jp/atcl/nxt/mag/ne/18/00030/00001/
■ NIPS 

9
NIPS2018 workshop: Machine Learning for Molecules and Materials
http://www.quantum-machine.org/workshops/nips2018draft/
NIPS2017 workshop: Machine Learning for Molecules and Materials
http://www.quantum-machine.org/workshops/nips2017/
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
10
SMILES
■
11
D.Weininger, J.Chem.Inf.Comput.Sci.,28,31(1988)
: (C6H6)
c1ccccc1
: (C6H5COOH)
OC(=O)c1ccccc1
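A minimal sketch of how the two SMILES strings above break down into tokens, assuming a simplified token set (bracket atoms, two-letter halogens, ring-bond digits, bonds and branches). The names `tokenize` and `ring_bonds_closed` are hypothetical helpers for illustration and are not part of RDKit; a real toolkit also checks valence and aromaticity.

```python
import re

# Bracket atoms and the two-letter halogens must be tried before single letters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-\+\(\)/\\\.:@]")


def tokenize(smiles):
    """Split a SMILES string into tokens; raise if a character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


def ring_bonds_closed(tokens):
    """Every ring-bond label must appear an even number of times (open, close)."""
    open_rings = set()
    for tok in tokens:
        if tok.isdigit() or tok.startswith("%"):
            open_rings ^= {tok}  # toggle: first occurrence opens, second closes
    return not open_rings


benzene = tokenize("c1ccccc1")
benzoic_acid = tokenize("OC(=O)c1ccccc1")
print(benzene)
print(benzoic_acid, ring_bonds_closed(benzoic_acid))
```

The lowercase `c` tokens mark aromatic carbons, and the two `1` tokens mark the atoms that close the ring.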
SMILES
■
12
D.Weininger, J.Chem.Inf.Comput.Sci.,28,31(1988)
https://ja.wikipedia.org/wiki/SMILES%E8%A8%98%E6%B3%95
13
Y = f (X)
Y
X
S
Fingerprint
■
14
:
• D.Rogers and M.Hahn, J.Chem.Inf.Model.,50(5), 742(2010)

https://pubs.acs.org/doi/10.1021/ci100050t
• 94 

https://art.ist.hokudai.ac.jp/~takigawa/data/fpai94_takigawa.pdf
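To make the idea of a fingerprint concrete, here is a toy sketch that hashes substrings of a SMILES string into a fixed number of bit positions and compares two molecules with the Tanimoto coefficient. This is deliberately not ECFP (which hashes circular atom environments of the molecular graph); `toy_fingerprint`, `n_bits` and `max_len` are hypothetical names chosen for illustration.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of length 1..max_len to one of n_bits positions."""
    bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.md5(fragment.encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


benzene_fp = toy_fingerprint("c1ccccc1")
benzoic_fp = toy_fingerprint("OC(=O)c1ccccc1")
print(sorted(benzene_fp))
print(round(tanimoto(benzene_fp, benzoic_fp), 3))
```

Because the benzene string is contained in the benzoic acid string, every bit set for benzene is also set for benzoic acid in this toy scheme, which mirrors how substructure shows up as shared bits.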
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
15
RDKit
■
■ C++
■Python
16
https://www.rdkit.org/
RDKit
■Anaconda
17
$ conda install -c rdkit rdkit
Anaconda
https://www.rdkit.org/docs/Install.html
(variational autoencoder, VAE)
RDKit
■
■
■ 

Getting Started with the RDKit in Python

https://www.rdkit.org/docs/GettingStartedInPython.html
■RDKit 

https://future-chem.com/rdkit-intro/
■ 

https://github.com/chemo-wakate
18
19
SMILES
Mol
■
20
■ 

21
■
22
■
23
https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-descriptors
■ :
24
■
25
1. 2. 3.
4. 



5.


Molecular neural network models with RDKit and Keras in Python
http://www.wildcardconsulting.dk/useful-information/molecular-neural-network-models-with-rdkit-and-keras-in-python/
Keras
http://www.ag.kagawa-u.ac.jp/charlesy/2017/07/21/keras%E3%81%A7%E5%8C%96%E5%90%88%E7%89%A9%E3%81%AE%E6%BA%B6%E8%A7%A3%E5%BA%A6%E4%BA%88%E6%B8%AC%EF%BC%88%E3%83%8B%E3%83%A5%E3%83%BC%E3%83%A9%E3%83%AB%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC/
1. 

RDKit github
26
https://github.com/rdkit/rdkit/blob/master/Docs/Book/data
solubility.train.sdf ( )
solubility.test.sdf ( )
※ SDF (structure-data file)
: SDF 

https://www.chem-station.com/blog/2012/04/sdf.html
1. 

SDF solubility.train.sdf )
27
n-pentane
RDKit 2D
5 4 0 0 0 0 0 0 0 0999 V2000
0.2606 0.1503 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.3000 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.6000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.9000 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
4.9394 0.1503 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
M END
> <ID> (1)
1
> <NAME> (1)
n-pentane
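The record above can be read with a few lines of plain Python, which helps show what each part of an SD record holds. This is a sketch under assumptions: the blank comment line and the `$$$$` terminator are added following the usual SD file layout, fields are split on whitespace (the V2000 format is actually fixed-width), and `parse_sd_record` is a hypothetical helper. In practice a toolkit reader such as RDKit's SD file supplier is the safer choice.

```python
# One record laid out like the n-pentane entry shown above.
RECORD = """n-pentane
     RDKit          2D

  5  4  0  0  0  0  0  0  0  0999 V2000
    0.2606    0.1503    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3000    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.9000    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9394    0.1503    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
M  END
>  <ID>  (1)
1

>  <NAME>  (1)
n-pentane

$$$$
"""


def parse_sd_record(text):
    """Parse one V2000 record into name, atoms, bonds and data fields."""
    lines = text.splitlines()
    # The counts line is the one that ends with the version tag.
    start = next(i for i, line in enumerate(lines) if line.rstrip().endswith("V2000"))
    n_atoms, n_bonds = (int(v) for v in lines[start].split()[:2])
    atom_lines = lines[start + 1:start + 1 + n_atoms]
    bond_lines = lines[start + 1 + n_atoms:start + 1 + n_atoms + n_bonds]
    atoms = []
    for line in atom_lines:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    # Each bond is (first atom index, second atom index, bond order), 1-based.
    bonds = [tuple(int(v) for v in line.split()[:3]) for line in bond_lines]
    props, key = {}, None
    for line in lines[start + 1 + n_atoms + n_bonds:]:
        if line.startswith(">"):
            key = line[line.index("<") + 1:line.rindex(">")]
        elif key is not None and line.strip():
            props[key] = line.strip()
            key = None
    return {"name": lines[0].strip(), "atoms": atoms, "bonds": bonds, "props": props}


mol = parse_sd_record(RECORD)
print(mol["name"], len(mol["atoms"]), mol["bonds"], mol["props"])
```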
2.
28
CH3CH2CH2CH2CH3
3.
29
4.
30
5.
31
5.
32
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
33
■ arXiv
■ 2
■
■
34
■ :
■ 

https://www.jstage.jst.go.jp/article/cicsj/36/1/36_9/_pdf/-char/ja
■ Deep Learning 

https://kivantium.net/deep-for-chem
35
■
■ORGAN / ORGANIC
■MolGAN
■ChemTS
36
GAN
■
37
https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f
SeqGAN
38
L.Yu, et al., AAAI2017.
https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14344/14489
https://github.com/LantaoYu/SeqGAN
State: the tokens generated up to time t-1
Action: the token selected at time t
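The state/action view can be sketched as follows: the value of choosing a token at time t is estimated by completing the sequence several times with the generator and averaging the discriminator's score. This is a toy illustration, not the SeqGAN code: `policy_sample` and `discriminator` are stand-in functions, and `VOCAB`, `SEQ_LEN` and `rollout_value` are hypothetical names.

```python
import random

VOCAB = ["C", "c", "O", "1", "(", ")"]
SEQ_LEN = 8


def policy_sample(state, rng):
    """Stand-in for the generator G: next token drawn uniformly at random."""
    return rng.choice(VOCAB)


def discriminator(sequence):
    """Stand-in for D: a score in [0, 1]; here, the fraction of carbon tokens."""
    return sum(tok in ("C", "c") for tok in sequence) / len(sequence)


def rollout_value(state, action, n_rollouts, rng):
    """Monte Carlo estimate of the reward for taking `action` in `state`."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(state) + [action]
        while len(seq) < SEQ_LEN:  # finish the sequence with the current policy
            seq.append(policy_sample(seq, rng))
        total += discriminator(seq)  # D only scores complete sequences
    return total / n_rollouts


rng = random.Random(0)
print(rollout_value(["c", "1"], "c", n_rollouts=16, rng=rng))
```

The averaged score is what gets fed back to the generator as the reward for that intermediate step, since the discriminator itself can only judge finished sequences.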
ORGAN
■SeqGAN
39
Finally in SeqGAN the reward function is provided by D.
4 ORGAN
Figure 1: Schema for ORGAN. Left: D is trained as a classifier receiving as input a mix of real data and generated data by G. Right: G is trained by RL where the reward is a combination of D and the objectives, and is passed back to the policy function via Monte Carlo sampling. We penalize non-unique sequences.
Figure 1 illustrates the main idea of ORGAN. To take into account domain-specific desired objectives Oi, we extend the […]
SeqGAN
SMILES
G.Guimaraes et al.(2017)
https://arxiv.org/abs/1705.10843
https://github.com/gablg1/ORGAN
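The caption says the generator's reward combines D with the domain objectives and that non-unique sequences are penalized. A minimal sketch of one way to write that, assuming a weighted sum with a mixing weight `lam` and a penalty that divides the reward by the number of copies in the batch; both choices, and the name `organ_rewards`, are assumptions made for illustration rather than details taken from the slide.

```python
from collections import Counter


def organ_rewards(batch, d_score, objective, lam=0.5):
    """Reward per sequence: lam * D(x) + (1 - lam) * O(x), shared among duplicates."""
    counts = Counter(batch)
    rewards = []
    for seq in batch:
        mixed = lam * d_score(seq) + (1.0 - lam) * objective(seq)
        rewards.append(mixed / counts[seq])  # repeated sequences earn less each
    return rewards


# Stand-in scorers: a constant discriminator and a length-based objective.
batch = ["CCO", "CCO", "c1ccccc1"]
rewards = organ_rewards(batch, d_score=lambda s: 0.8, objective=lambda s: len(s) / 8, lam=0.5)
print(rewards)
```

With `lam=1.0` the reward reduces to the discriminator alone (the SeqGAN setting), and with `lam=0.0` it reduces to optimizing the objective only.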
ORGAN
40
Objective         Algorithm  Validity (%)  Diversity  Druglikeliness  Synthesizability  Solubility
-                 MLE        75.9          0.64       0.48 (0%)       0.23 (0%)         0.30 (0%)
-                 SeqGAN     80.3          0.61       0.49 (2%)       0.25 (6%)         0.31 (3%)
Druglikeliness    ORGAN      88.2          0.55       0.52 (8%)       0.32 (38%)        0.35 (18%)
Druglikeliness    OR(W)GAN   85.0          0.95       0.60 (25%)      0.54 (130%)       0.47 (57%)
Druglikeliness    Naive RL   97.1          0.8        0.57 (19%)      0.53 (126%)       0.50 (67%)
Synthesizability  ORGAN      96.5          0.92       0.51 (6%)       0.83 (255%)       0.45 (52%)
Synthesizability  OR(W)GAN   97.6          1.00       0.20 (-59%)     0.75 (223%)       0.84 (184%)
Synthesizability  Naive RL   97.7          0.96       0.52 (8%)       0.83 (256%)       0.46 (54%)
Solubility        ORGAN      94.7          0.76       0.50 (4%)       0.63 (171%)       0.55 (85%)
Solubility        OR(W)GAN   94.1          0.90       0.42 (-12%)     0.66 (185%)       0.54 (81%)
Solubility        Naive RL   92.7          0.75       0.49 (3%)       0.70 (200%)       0.78 (162%)
All/Alternated    ORGAN      96.1          92.3       0.52 (9%)       0.71 (206%)       0.53 (79%)
Table 1: Evaluation of metrics, on several generative algorithms and optimized for different objectives for molecules. Reported values are mean values of valid generated molecules. The percentage of improvement over the MLE baseline is reported in parenthesis. Values shown in bold indicate significant improvement. Shaded cell indicates direct optimized objectives.
Table 2 shows quantitative results comparing ORGAN to other baseline methods optimizing for three different metrics. ORGAN outperforms SeqGAN and MLE in all of the three metrics. Naive RL achieves a higher score than ORGAN for […] Ratio of Steps metric, but it under-performs in terms of […]
Druglikeliness, Synthesizability, Solubility
ORGANIC
■ORGAN
41
Methods
Figure 1: Usage of ORGANIC illustrated. In the training procedure we show the three fundamental components: a generator, a discriminator, and a reinforcement metric. Arrows indicate the flow of inputs and outputs between networks.
B.Sanchez-Lengeling, et al.(2017)
https://chemrxiv.org/articles/ORGANIC_1_pdf/5309668
https://github.com/aspuru-guzik-group/ORGANIC
MolGAN
■
■SMILES
42
MolGAN: An implicit generative model for small molecular graphs
[Figure: MolGAN schema. Labels recovered from the figure: z ~ p(z), Generator, Molecular graph, Discriminator (0/1), Reward network (0/1), x ~ pdata(x).]
N.De Cao and T.Kipf(2018)
https://arxiv.org/abs/1805.11973
MolGAN
■
43
MolGAN: An implicit generative model for small molecular graphs
[Figure: z ~ p(z), generator, dense adjacency tensor A and annotation matrix X, sampled Ã and X̃, graph / molecule, GCN-based discriminator (0/1) and reward network (0/1).]
Figure 2. Outline of MolGAN. From left: the generator takes a sample from a prior distribution and generates a dense adjacency tensor
A and an annotation matrix X. Subsequently, sparse and discrete ˜A and ˜X are obtained from A and X respectively via categorical
sampling. The combination of ˜A and ˜X represents an annotated molecular graph which corresponds to a specific chemical compound.
Finally, the graph is processed by both the discriminator and reward networks that are invariant to node order permutations and based on
Relational-GCN (Schlichtkrull et al., 2017) layers.
[…] loss and the RL loss: […] passing them to D and R̂ in order to make the generation stochastic while still forwarding continuous ob[…]
N.De Cao and T.Kipf(2018)
https://arxiv.org/abs/1805.11973
SMILES
■ SMILES
44
Grammar Variational Autoencoder
[Figure: GVAE encoder. Box 1: SMILES grammar (production rules for smiles, chain, branched atom, atom, ringbond, aromatic organic, aliphatic organic, digit); box 2: input SMILES 'c1ccccc1'; box 3: form parse tree; box 4: extract rules; box 5: convert to 1-hot vectors; box 6: map to latent space.]
Figure 1. The encoder of the GVAE. We denote the start rule in blue and all rules that decode to terminal in green. See text for details.
[…] production rules. We describe how the GVAE works using a simple example.
Encoding. Consider a subset of the SMILES grammar as shown in Figure 1, box 1. These are the possible production rules that can be used for constructing a molecule. Imagine we are given as input the SMILES string for benzene: ‘c1ccccc1’. Figure 1, box 2 shows this molecule. To encode this molecule into a continuous latent representation we begin by using the SMILES grammar to parse this string into a parse tree (partially shown in box 3). This tree describes how ‘c1ccccc1’ is generated by the grammar. We decompose this tree into a sequence of production rules by performing a pre-order traversal on the branches of the parse tree going from left-to-right, shown in box 4. We convert these rules into 1-hot indicator vectors, where each dimension corresponds to a rule in the SMILES grammar, box 5. Letting K denote the total number of production […]
[…] timesteps (production rules) allowed by the decoder. We will use these vectors in the rest of the decoder to select production rules.
To ensure that any sequence of production rules generated from the decoder is valid, we keep track of the state of the parsing using a last-in first-out (LIFO) stack. This is shown in Figure 2, box 3. At the beginning, every valid parse from the grammar must start with the start symbol: smiles, which is placed on the stack. Next we pop off whatever non-terminal symbol that was placed last on the stack (in this case smiles), and we use it to mask out the invalid dimensions of the logit vector. Formally, for every non-terminal $\alpha$ we define a fixed binary mask vector $m_\alpha \in [0, 1]^K$. This takes the value ‘1’ for all indices in $1, \ldots, K$ corresponding to production rules that have $\alpha$ on their left-hand-side.
In this case the only production rule in the grammar beginning with smiles is the first so we zero-out every dimension […]
M.J.Kusner, et al. ICML2017
http://proceedings.mlr.press/v70/kusner17a
https://github.com/mkusner/grammarVAE
(variational autoencoder, VAE)
Grammar Variational Autoencoder
[Figure: GVAE decoder. Box 1: map from latent space; box 2: convert to logits (up to max length); box 3: stack; box 4: mask out invalid rules; box 5: pop first non-terminal, sample rule and push non-terminals onto stack; box 6: concatenate terminals ('c1ccccc1'); box 7: translate to molecule.]
Figure 2. The decoder of the GVAE. See text for details.
Algorithm 1 Sampling from the decoder
Input: Deterministic decoder output $F \in \mathbb{R}^{T_{max} \times K}$, masks $m_\alpha$ for each production rule
Output: Sampled productions X from p(X|z)
1: Initialize empty stack S, and push the start symbol S onto the top; set t = 0
2: while S is nonempty do
3:   Pop the last-pushed non-terminal $\alpha$ from the stack S
4:   Use Eq. (2) to sample a production rule R
5:   Let $x_t$ be the 1-hot vector corresponding to R
[…]
[…] character-based VAE decoder is that at every point in the generated sequence, the character VAE can sample any possible character. There is no stack or masking operation. The grammar VAE however is constrained to select syntactically-valid sequences.
Syntactic vs. semantic validity. It is important to note that the grammar encodes syntactically valid molecules but not necessarily semantically valid molecules. This is mainly because of three reasons. First, certain molecules […]
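The stack-and-mask decoding described in the excerpt can be sketched on a toy grammar. Assumptions are stated loudly here: the grammar below is a small hand-written subset rather than the grammar used in the paper, Eq. (2) is not reproduced in the slide and is assumed to be a softmax restricted to the rules allowed by the mask, and all names (`RULES`, `MASKS`, `masked_sample`, `sample_from_decoder`) are hypothetical.

```python
import math
import random

# Toy grammar: one (left-hand side, right-hand side) pair per production rule.
RULES = [
    ("smiles", ["chain"]),
    ("chain", ["chain", "branched_atom"]),
    ("chain", ["branched_atom"]),
    ("branched_atom", ["atom", "ringbond"]),
    ("branched_atom", ["atom"]),
    ("atom", ["'c'"]),
    ("atom", ["'C'"]),
    ("atom", ["'N'"]),
    ("ringbond", ["'1'"]),
]
NONTERMINALS = {lhs for lhs, _ in RULES}
# Mask for a non-terminal: 1 for rules with that non-terminal on the left-hand side.
MASKS = {nt: [1 if lhs == nt else 0 for lhs, _ in RULES] for nt in NONTERMINALS}


def masked_sample(row, mask, rng):
    """Sample a rule index with probability proportional to mask * exp(logit)."""
    weights = [m * math.exp(f) for f, m in zip(row, mask)]
    return rng.choices(list(range(len(row))), weights=weights)[0]


def sample_from_decoder(logits, rng):
    """logits: T_max rows of K scores. Returns (rule indices, string or None)."""
    stack, rules_used, output, t = ["smiles"], [], [], 0
    while stack:
        symbol = stack.pop()
        if symbol not in NONTERMINALS:  # terminal: emit the character
            output.append(symbol.strip("'"))
            continue
        if t >= len(logits):  # ran out of timesteps with symbols left over
            return rules_used, None
        k = masked_sample(logits[t], MASKS[symbol], rng)
        rules_used.append(k)
        stack.extend(reversed(RULES[k][1]))  # leftmost symbol is popped next
        t += 1
    return rules_used, "".join(output)


rng = random.Random(0)
logits = [[rng.gauss(0.0, 1.0) for _ in RULES] for _ in range(30)]
print(sample_from_decoder(logits, rng))
```

Whatever the logits are, the first rule chosen is always `smiles -> chain`, because the mask for the start symbol zeroes out every other rule; that is the property that keeps the output syntactically valid.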
SMILES
45
[Figure: a recurrent network unrolled over a SMILES sequence. Sequence inputs (x1) to (x4) are the characters 'C', 'C', '1', 'C'; at each step an RNN cell outputs y(x1|w), y(x2|x<2, w), ..., y(x5|x<5, w) over the alphabet B, C, N, O, S, P, F, I, H, Cl, Br, 1, 2, 3, (, ), [, ].]
Figure 1: The recurren[…] imate the Q-function. […] function activation is […] acter in C. Here the […] SMILES alphabet and […] acters of the molecule […] example. The initial […] from the first hidden […] continues until the en[…]
[…] during decoding, but its performance achieved by this method leaves scope fo[…] method requires hand-crafted grammatical rules for each application domain […] In this paper, we propose a generative approach to modeling validity that […] constraints of a given discrete space. We show how concepts from reinforce[…] used to define a suitable generative model and how this model can be approx[…]
D.Janz, et al. ICLR2018
https://arxiv.org/abs/1712.01664
https://github.com/DavidJanz/molecule_grammar_rnn
LSTM
■ 

■ 

=
46
■ AlphaGo
■
47
[…] and the first-degree neighbouring atoms. Only rules that occurred at least 50 times in reactions published before 2015 were kept. For the […]
Prediction with the in-scope filter network
After the search space has been narrowed down by the expansion policy […]
[Figure: a, chemical representation of the synthesis plan; b, search tree representation, with positions A = {1}, B = {2,6}, C = {3,6}, D = {4,5,6}, E = {8,9}, F = {6,7,8} running from the root (target) to a terminal solved state.]
Figure 1 | Translation of the traditional chemists’ retrosynthetic route
representation to the search tree representation. a, The traditional
chemists’ retrosynthetic route representation (conditions omitted) [50].
b, The search tree representation. The nodes in the tree represent the
synthetic position, and contain all precursors needed to make the
molecules of the preceding positions all the way down to the tree’s
root, which contains the target. Branches in the search tree correspond
to complete routes. Calculating the value of branches through task-
dependent scoring functions allows us to compare and rank different
routes. The target molecule can be solved if it can be deconstructed to a
set of readily available building blocks (marked red). Ph, phenyl; Boc,
tert-butyloxycarbonyl; TBS, tert-butyldimethylsilyl.
M.H.S.Segler, et al. Nature 555, 604 (2018)
https://www.nature.com/articles/nature25978
48
[Figure: Synthesis planning with Monte Carlo tree search. a, the four phases: (1) Selection: pick most promising position; (2) Expansion: retroanalyse, add new nodes to tree by expansion procedure (see b); (3) Rollout: pick and evaluate new position; (4) Update: incorporate evaluation in the search tree. b, Expansion procedure: target molecule A; invariant encoding (ECFP4); expansion policy prioritizes transformations T1 ... Tn; keep the k best transformations and apply them to the target, giving reactions R1 ... Rk; for each reaction use in-scope filter; keep likely reactions; ranked precursor molecule positions B and C. The steps are labelled symbolic, neural, neural, symbolic, symbolic.]
Figure 2 | Schematic of MCTS methodology. a, MCTS searches by
iterating over four phases. In the selection phase (1), the most urgent
node for analysis is chosen on the basis of the current position values.
In phase (2) this node may be expanded by processing the molecules of
the position A with the expansion procedure (b), which leads to new
positions B and C, which are added to the tree. Then, the most promising
new position is chosen, and a rollout phase (3) is performed by randomly
sampling transformations from the rollout policy until all molecules
are solved or a certain depth is exceeded. In the update phase (4), the
position values are updated in the current branch to reflect the result of the
rollout. b, Expansion procedure. First, the molecule (A) to retroanalyse is
converted to a fingerprint and fed into the policy network, which returns a
probability distribution over all possible transformations (T1 to Tn). Then,
only the k most probable transformations are applied to molecule A. This
yields the reactants necessary to make A, and thus complete reactions R1
to Rk. For each reaction, the reaction prediction is performed using the
in-scope filter, returning a probability score. Improbable reactions are then
filtered out, which leads to the list of admissible actions and corresponding
precursor positions B and C.
M.H.S.Segler, et al. Nature 555(7698), 604 (2018)
https://www.nature.com/articles/nature25978
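The expansion procedure in panel b of the caption (rank transformations with the policy, keep the k most probable, then drop reactions the in-scope filter finds improbable) can be sketched as a small function. This is an illustration only: the dictionary of policy probabilities, the `in_scope_prob` callable, the cut-off `threshold` and the name `expand` are all assumptions, and no real policy network or reaction templates are involved.

```python
def expand(policy_probs, in_scope_prob, k=3, threshold=0.5):
    """Return the transformations that survive top-k selection and the filter.

    policy_probs: {transformation name: probability from the expansion policy}
    in_scope_prob: callable giving a probability score for the resulting reaction
    """
    # Keep only the k most probable transformations.
    top_k = sorted(policy_probs, key=policy_probs.get, reverse=True)[:k]
    # Filter out reactions the in-scope filter considers improbable.
    return [t for t in top_k if in_scope_prob(t) >= threshold]


policy = {"T1": 0.50, "T2": 0.30, "T3": 0.15, "T4": 0.05}
filter_scores = {"T1": 0.90, "T2": 0.20, "T3": 0.70, "T4": 0.99}
print(expand(policy, filter_scores.get, k=3, threshold=0.5))
```

Here T4 is never scored by the filter because it falls outside the top k, and T2 is proposed by the policy but rejected by the filter, which is the division of labour the caption describes.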
Sequence-to-Sequence
49
[…] Seq2seq Model. Neural sequence-to-sequence (seq2seq) models map one sequence to another and have recently shown state of the art performance in many tasks such as machine translation [49, 50]. It is based on an encoder−decoder architecture that consists of two recurrent neural networks […]
[…] sequence log probability at each time step during decoding are retained, where N is the width of the beam. The decoding is stopped once the lengths of the candidate sequences reach the maximum decode length of 140 characters. The candidate sequences that contain an end of sequence character are considered to be complete. On average, about 97% of all […]
Figure 3. Seq2seq model architecture.
SMILES
SMILES(SMART)
B.Liu, et al. ACS Cent. Sci. 3(10), 1103 (2017)
https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00303
https://github.com/pandegroup/reaction_prediction_seq2seq
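The decoding step quoted above (keep the N best candidates by sequence log probability, stop at a maximum length of 140 characters, treat sequences with an end-of-sequence character as complete) is a beam search. Below is a minimal character-level sketch with a stand-in scoring function; `beam_search`, `toy_model` and the `$` end marker are hypothetical and do not come from the paper's code.

```python
import math

EOS = "$"  # stand-in end-of-sequence character


def beam_search(next_log_probs, beam_width=3, max_len=140):
    """next_log_probs(prefix) -> {character: log probability of appending it}."""
    beams = [("", 0.0)]
    complete = []
    while beams:
        candidates = []
        for prefix, score in beams:
            for char, logp in next_log_probs(prefix).items():
                candidates.append((prefix + char, score + logp))
        # Retain the beam_width candidates with the highest log probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq.endswith(EOS):
                complete.append((seq[:-1], score))  # finished: strip the marker
            elif len(seq) < max_len:
                beams.append((seq, score))  # still growing
            # Unfinished sequences that hit max_len are dropped.
    return sorted(complete, key=lambda c: c[1], reverse=True)


def toy_model(prefix):
    """Stand-in for the decoder: prefers 'C', then ends after two characters."""
    if len(prefix) < 2:
        return {"C": math.log(0.6), "O": math.log(0.3), EOS: math.log(0.1)}
    return {EOS: math.log(0.9), "C": math.log(0.1)}


results = beam_search(toy_model, beam_width=3, max_len=140)
print(results[0])
```

A wider beam explores more candidate reactant strings at the cost of more decoder calls per step.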
■Coley et al. (2017)
50
[…] extension of the one-step strategy to multistep pathway planning is […]. […] characters (i.e., a product SMILES [26] string without atom […]
C.W. Coley et al. ACS Cent. Sci. 3(12), 1237 (2017)
https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00355
https://github.com/connorcoley/retrosim
■Coley et al.
51


2018 9
http://www.molsci.jp/2018/pdf/4E13_w.pdf
Coming soon…
2018 10
(IBIS) 2018 11
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
52
■Materials Informatics 

■
■Materials Informatics Python
53
■Materials Informatics
■ 

https://www.jstage.jst.go.jp/article/ciqs/2017/0/2017_PL/_pdf/-char/ja
■ 

https://www.jstage.jst.go.jp/article/cicsj/36/1/36_9/_pdf/-char/ja
■ 

https://www.ssken.gr.jp/MAINSITE/event/2017/20171026-sci/lecture-01/SSKEN_sci2017_YoshidaRyo_presentation.pdf
54
■RDKit
■ https://www.rdkit.org/
■RDKit 

https://future-chem.com/rdkit-intro/
■ 

https://github.com/chemo-wakate
■RDKit http://rdkit-users.jp/
55

More Related Content

What's hot

ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learningゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
Preferred Networks
 
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
Deep Learning JP
 
画像認識モデルを作るための鉄板レシピ
画像認識モデルを作るための鉄板レシピ画像認識モデルを作るための鉄板レシピ
画像認識モデルを作るための鉄板レシピ
Takahiro Kubo
 
幾何と機械学習: A Short Intro
幾何と機械学習: A Short Intro幾何と機械学習: A Short Intro
幾何と機械学習: A Short Intro
Ichigaku Takigawa
 
ChatGPTは思ったほど賢くない
ChatGPTは思ったほど賢くないChatGPTは思ったほど賢くない
ChatGPTは思ったほど賢くない
Carnot Inc.
 
R-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れR-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れ
Kazuki Motohashi
 
分散深層学習 @ NIPS'17
分散深層学習 @ NIPS'17分散深層学習 @ NIPS'17
分散深層学習 @ NIPS'17
Takuya Akiba
 
Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)
Yoshitaka Ushiku
 
Depth Estimation論文紹介
Depth Estimation論文紹介Depth Estimation論文紹介
Depth Estimation論文紹介
Keio Robotics Association
 
[DL輪読会]Pay Attention to MLPs (gMLP)
[DL輪読会]Pay Attention to MLPs	(gMLP)[DL輪読会]Pay Attention to MLPs	(gMLP)
[DL輪読会]Pay Attention to MLPs (gMLP)
Deep Learning JP
 
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
mlm_kansai
 
最適輸送の解き方
最適輸送の解き方最適輸送の解き方
最適輸送の解き方
joisino
 
ベータ分布の謎に迫る
ベータ分布の謎に迫るベータ分布の謎に迫る
ベータ分布の謎に迫る
Ken'ichi Matsui
 
ブレインパッドにおける機械学習プロジェクトの進め方
ブレインパッドにおける機械学習プロジェクトの進め方ブレインパッドにおける機械学習プロジェクトの進め方
ブレインパッドにおける機械学習プロジェクトの進め方
BrainPad Inc.
 
SHAP値の考え方を理解する(木構造編)
SHAP値の考え方を理解する(木構造編)SHAP値の考え方を理解する(木構造編)
SHAP値の考え方を理解する(木構造編)
Kazuyuki Wakasugi
 
Newman アルゴリズムによるソーシャルグラフのクラスタリング
Newman アルゴリズムによるソーシャルグラフのクラスタリングNewman アルゴリズムによるソーシャルグラフのクラスタリング
Newman アルゴリズムによるソーシャルグラフのクラスタリング
Atsushi KOMIYA
 
機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明
Satoshi Hara
 
Tensor コアを使った PyTorch の高速化
Tensor コアを使った PyTorch の高速化Tensor コアを使った PyTorch の高速化
Tensor コアを使った PyTorch の高速化
Yusuke Fujimoto
 
畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向
Yusuke Uchida
 
深層強化学習と実装例
深層強化学習と実装例深層強化学習と実装例

What's hot (20)

ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learningゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning
 
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction
 
画像認識モデルを作るための鉄板レシピ
画像認識モデルを作るための鉄板レシピ画像認識モデルを作るための鉄板レシピ
画像認識モデルを作るための鉄板レシピ
 
幾何と機械学習: A Short Intro
幾何と機械学習: A Short Intro幾何と機械学習: A Short Intro
幾何と機械学習: A Short Intro
 
ChatGPTは思ったほど賢くない
ChatGPTは思ったほど賢くないChatGPTは思ったほど賢くない
ChatGPTは思ったほど賢くない
 
R-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れR-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れ
 
分散深層学習 @ NIPS'17
分散深層学習 @ NIPS'17分散深層学習 @ NIPS'17
分散深層学習 @ NIPS'17
 
Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)
 
Depth Estimation論文紹介
Depth Estimation論文紹介Depth Estimation論文紹介
Depth Estimation論文紹介
 
[DL輪読会]Pay Attention to MLPs (gMLP)
[DL輪読会]Pay Attention to MLPs	(gMLP)[DL輪読会]Pay Attention to MLPs	(gMLP)
[DL輪読会]Pay Attention to MLPs (gMLP)
 
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
 
最適輸送の解き方
最適輸送の解き方最適輸送の解き方
最適輸送の解き方
 
ベータ分布の謎に迫る
ベータ分布の謎に迫るベータ分布の謎に迫る
ベータ分布の謎に迫る
 
ブレインパッドにおける機械学習プロジェクトの進め方
ブレインパッドにおける機械学習プロジェクトの進め方ブレインパッドにおける機械学習プロジェクトの進め方
ブレインパッドにおける機械学習プロジェクトの進め方
 
SHAP値の考え方を理解する(木構造編)
SHAP値の考え方を理解する(木構造編)SHAP値の考え方を理解する(木構造編)
SHAP値の考え方を理解する(木構造編)
 
Newman アルゴリズムによるソーシャルグラフのクラスタリング
Newman アルゴリズムによるソーシャルグラフのクラスタリングNewman アルゴリズムによるソーシャルグラフのクラスタリング
Newman アルゴリズムによるソーシャルグラフのクラスタリング
 
機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明機械学習モデルの判断根拠の説明
機械学習モデルの判断根拠の説明
 
Tensor コアを使った PyTorch の高速化
Tensor コアを使った PyTorch の高速化Tensor コアを使った PyTorch の高速化
Tensor コアを使った PyTorch の高速化
 
畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向
 
深層強化学習と実装例
深層強化学習と実装例深層強化学習と実装例
深層強化学習と実装例
 

Materials Informatics and Python

  • 1. Materials Informatics and Python
 PyData 1Day Conference, 2018-10-20
  • 2. ■ IT
 ■ twitter: @sfchaos @shifukushima
  • 3. [Timeline figure: 2000, 2006, 2009, 2018; 2014]
  • 5. 1. 2. Materials Informatics 3. Materials Informatics with Python 4. 5.
  • 8. Materials Informatics
 ■ 2011: Materials Genome Initiative https://www.mgi.gov/
 ■ 2015: NIMS http://www.nims.go.jp/MII-I/
  • 9. ■ (Li 2 ) ■ TDK ■ NEC ■ etc.
 https://tech.nikkeibp.co.jp/atcl/nxt/mag/ne/18/00030/00001/
  • 10. ■ NIPS
 NIPS2018 workshop: Machine Learning for Molecules and Materials http://www.quantum-machine.org/workshops/nips2018draft/
 NIPS2017 workshop: Machine Learning for Molecules and Materials http://www.quantum-machine.org/workshops/nips2017/
  • 11. 1. 2. Materials Informatics 3. Materials Informatics with Python 4. 5.
  • 12. SMILES ■ D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988)
 Benzene (C6H6): c1ccccc1
 Benzoic acid (C6H5COOH): OC(=O)c1ccccc1
  • 13. SMILES ■ D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988) https://ja.wikipedia.org/wiki/SMILES%E8%A8%98%E6%B3%95
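A minimal sketch of how the two SMILES strings above break into tokens, which is also the unit that sequence models later in the deck operate on. The token pattern is an assumption covering only a small SMILES subset (bracket atoms, common organic atoms, bonds, branches, ring-closure digits); it is not a full SMILES parser, and the function names are hypothetical.

```python
import re

# Illustrative token pattern for a small SMILES subset (an assumption, not the
# full SMILES grammar): bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, bond/branch symbols, and ring-closure digits.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\().]|%\d{2}|\d")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise ValueError on unsupported input."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def count_heavy_atoms(smiles):
    """Count atom tokens; hydrogens are implicit in the examples used here."""
    return sum(1 for t in tokenize_smiles(smiles) if t[0].isalpha() or t[0] == "[")
```

With this sketch, benzene `c1ccccc1` yields six atom tokens plus two ring-closure digits, and benzoic acid `OC(=O)c1ccccc1` yields nine atom tokens.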
  • 14. Y = f(X); Y, X, S
  • 15. Fingerprint ■ References:
 • D. Rogers and M. Hahn, J. Chem. Inf. Model., 50(5), 742 (2010) https://pubs.acs.org/doi/10.1021/ci100050t
 • 94 https://art.ist.hokudai.ac.jp/~takigawa/data/fpai94_takigawa.pdf
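As a hedged illustration of the fingerprint idea, the sketch below computes a Morgan fingerprint (RDKit's circular fingerprint, related to the extended-connectivity fingerprints of Rogers and Hahn) and returns the indices of the bits that are set. The function name, radius, and bit length are illustrative choices, not values taken from the slides. RDKit is imported inside the function, so the snippet loads even where RDKit is not installed.

```python
def morgan_bits(smiles, radius=2, n_bits=2048):
    """Return the sorted on-bit indices of a Morgan fingerprint for a SMILES string.

    radius and n_bits are illustrative defaults, not values from the talk.
    """
    # Lazy import: RDKit is only required when the function is actually called.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None when parsing fails
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return sorted(fp.GetOnBits())
```

Calling `morgan_bits("OC(=O)c1ccccc1")` in an environment with RDKit gives a short list of integers below `n_bits`; two molecules can then be compared by the overlap of their bit sets.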
  • 16. 1. 2. Materials Informatics 3. Materials Informatics with Python 4. 5.
  • 18. RDKit ■ Install with Anaconda:
 $ conda install -c rdkit rdkit
 https://www.rdkit.org/docs/Install.html
 (variational autoencoder, VAE)
  • 19. RDKit
 ■ Getting Started with the RDKit in Python https://www.rdkit.org/docs/GettingStartedInPython.html
 ■ RDKit https://future-chem.com/rdkit-intro/
 ■ https://github.com/chemo-wakate
  • 26. ■
 Molecular neural network models with RDKit and Keras in Python http://www.wildcardconsulting.dk/useful-information/molecular-neural-network-models-with-rdkit-and-keras-in-python/
 Keras http://www.ag.kagawa-u.ac.jp/charlesy/2017/07/21/keras%E3%81%A7%E5%8C%96%E5%90%88%E7%89%A9%E3%81%AE%E6%BA%B6%E8%A7%A3%E5%BA%A6%E4%BA%88%E6%B8%AC%EF%BC%88%E3%83%8B%E3%83%A5%E3%83%BC%E3%83%A9%E3%83%AB%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC/
  • 27. 1. RDKit github
 https://github.com/rdkit/rdkit/blob/master/Docs/Book/data
 solubility.train.sdf (training)
 solubility.test.sdf (test)
 ※ SDF (structure-data file): SDF https://www.chem-station.com/blog/2012/04/sdf.html
  • 28. 1. SDF (solubility.train.sdf)
 n-pentane
      RDKit          2D

   5  4  0  0  0  0  0  0  0  0999 V2000
     0.2606    0.1503    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     1.3000    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     2.6000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     3.9000    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     4.9394    0.1503    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  1  0
   2  3  1  0
   3  4  1  0
   4  5  1  0
 M  END
 >  <ID>  (1)
 1

 >  <NAME>  (1)
 n-pentane
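To make the layout of the record above concrete, here is a small pure-Python sketch that reads one V2000 record: the counts line gives the number of atoms and bonds, the atom block and bond block follow, and the `> <FIELD>` lines carry data items. The function name and the returned dictionary layout are hypothetical, and only the first line of each data value is kept. In practice RDKit's `Chem.SDMolSupplier` reads such files directly.

```python
def parse_sdf_record(text):
    """Parse one V2000 SDF record into name, atoms, bonds, and data fields."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]
    # Counts line: atom count and bond count are fixed-width, 3 columns each.
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Fixed-width fields: first atom index, second atom index, bond order.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    fields, key = {}, None
    for line in lines[4 + n_atoms + n_bonds:]:
        if line.startswith(">"):
            start = line.index("<") + 1
            key = line[start:line.index(">", start)]
        elif key is not None and line.strip():
            fields[key] = line.strip()  # keep only the first value line
            key = None
    return {"name": name, "atoms": atoms, "bonds": bonds, "fields": fields}


# The n-pentane record from the slide, rebuilt line by line.
ATOM = "    {:.4f}    {:.4f}    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0"
SAMPLE = "\n".join(
    ["n-pentane", "     RDKit          2D", "",
     "  5  4  0  0  0  0  0  0  0  0999 V2000"]
    + [ATOM.format(x, y) for x, y in
       [(0.2606, 0.1503), (1.3, 0.75), (2.6, 0.0), (3.9, 0.75), (4.9394, 0.1503)]]
    + ["  1  2  1  0", "  2  3  1  0", "  3  4  1  0", "  4  5  1  0",
       "M  END", ">  <ID>  (1)", "1", "", ">  <NAME>  (1)", "n-pentane", "", "$$$$"]
)
record = parse_sdf_record(SAMPLE)
```

Here `record["atoms"]` holds five carbon entries and `record["bonds"]` holds the four single bonds of the chain.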
  • 30. 3.
  • 31. 4.
  • 32. 5.
  • 33. 5.
  • 34. 1. 2. Materials Informatics 3. Materials Informatics with Python 4. 5.
  • 36. ■
 ■ https://www.jstage.jst.go.jp/article/cicsj/36/1/36_9/_pdf/-char/ja
 ■ Deep Learning https://kivantium.net/deep-for-chem
  • 39. SeqGAN
 L. Yu et al., AAAI 2017. https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14344/14489 https://github.com/LantaoYu/SeqGAN
 State: the tokens generated up to step t-1
 Action: the token generated at step t
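In SeqGAN's reinforcement-learning view, the state at step t is the prefix generated through step t-1 and the action is the token emitted at step t. A minimal sketch of that decomposition, with a hypothetical helper name, could look like this:

```python
def state_action_pairs(tokens):
    """Return (state, action) pairs for a token sequence.

    state  = tuple of tokens generated before step t
    action = token generated at step t
    """
    return [(tuple(tokens[:t]), tokens[t]) for t in range(len(tokens))]


# Example with the three tokens of the SMILES string "CCO" (ethanol).
pairs = state_action_pairs(["C", "C", "O"])
```

Each pair is what a policy (the generator) is conditioned on and what it then chooses; the discriminator supplies the reward once a full sequence exists.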
  • 40. ORGAN ■ Extends SeqGAN
 Figure 1: Schema for ORGAN. Left: D is trained as a classifier receiving as input a mix of real data and generated data by G. Right: G is trained by RL where the reward is a combination of D and the objectives, and is passed back to the policy function via Monte Carlo sampling. We penalize non-unique sequences.
 Excerpts from the paper: "Finally in SeqGAN the reward function is provided by D." "Figure 1 illustrates the main idea of ORGAN. To take into account domain-specific desired objectives Oi, we extend the SeqGAN"
 SMILES
 G. Guimaraes et al. (2017) https://arxiv.org/abs/1705.10843 https://github.com/gablg1/ORGAN
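The caption describes the generator's reward as a combination of the discriminator D and the domain objectives, with a penalty for non-unique sequences. The sketch below is one way to express that under stated assumptions: a linear blend with a weight `lam`, and a penalty that divides each reward by the number of times the sequence appears in the batch. Both choices, and all names, are illustrative and not confirmed by the slide; Monte Carlo rollouts for partial sequences are omitted.

```python
from collections import Counter


def organ_rewards(sequences, discriminator, objective, lam=0.5):
    """Blend discriminator and objective scores, penalizing repeated sequences.

    lam (assumed) weights the discriminator; 1 - lam weights the objective.
    The division by the repeat count is an assumed form of the uniqueness penalty.
    """
    counts = Counter(sequences)
    rewards = []
    for seq in sequences:
        blended = lam * discriminator(seq) + (1.0 - lam) * objective(seq)
        rewards.append(blended / counts[seq])
    return rewards


# Toy stand-ins: a constant discriminator and an objective that favors oxygen.
batch = ["CCO", "CCO", "CC"]
rewards = organ_rewards(
    batch,
    discriminator=lambda s: 0.5,
    objective=lambda s: 1.0 if "O" in s else 0.0,
)
```

With `lam=1.0` the objective term vanishes and the reward reduces to the discriminator score, which matches the SeqGAN setting quoted above.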
  • 41. ORGAN
 Objective         Algorithm   Validity (%)  Diversity  Druglikeliness  Synthesizability  Solubility
 (none)            MLE         75.9          0.64       0.48 (0%)       0.23 (0%)         0.30 (0%)
 (none)            SeqGAN      80.3          0.61       0.49 (2%)       0.25 (6%)         0.31 (3%)
 Druglikeliness    ORGAN       88.2          0.55       0.52 (8%)       0.32 (38%)        0.35 (18%)
 Druglikeliness    OR(W)GAN    85.0          0.95       0.60 (25%)      0.54 (130%)       0.47 (57%)
 Druglikeliness    Naive RL    97.1          0.8        0.57 (19%)      0.53 (126%)       0.50 (67%)
 Synthesizability  ORGAN       96.5          0.92       0.51 (6%)       0.83 (255%)       0.45 (52%)
 Synthesizability  OR(W)GAN    97.6          1.00       0.20 (-59%)     0.75 (223%)       0.84 (184%)
 Synthesizability  Naive RL    97.7          0.96       0.52 (8%)       0.83 (256%)       0.46 (54%)
 Solubility        ORGAN       94.7          0.76       0.50 (4%)       0.63 (171%)       0.55 (85%)
 Solubility        OR(W)GAN    94.1          0.90       0.42 (-12%)     0.66 (185%)       0.54 (81%)
 Solubility        Naive RL    92.7          0.75       0.49 (3%)       0.70 (200%)       0.78 (162%)
 All/Alternated    ORGAN       96.1          92.3       0.52 (9%)       0.71 (206%)       0.53 (79%)
 Table 1: Evaluation of metrics on several generative algorithms, optimized for different objectives for molecules. Reported values are mean values of valid generated molecules. The percentage of improvement over the MLE baseline is reported in parentheses. Values shown in bold indicate significant improvement. Shaded cells indicate directly optimized objectives.
 Excerpt from the paper: "Table 2 shows quantitative results comparing ORGAN to other baseline methods optimizing for three different metrics. ORGAN outperforms SeqGAN and MLE in all of the three metrics. Naive RL achieves a higher score than ORGAN for Ratio of Steps metric, but it under-performs in terms of"
 Druglikeliness, Synthesizability, Solubility
  • 42. ORGANIC ■ ORGAN
 Figure 1: Usage of ORGANIC illustrated. In the training procedure we show the three fundamental components: a generator, a discriminator, and a reinforcement metric. Arrows indicate the flow of inputs and outputs between networks.
 B. Sanchez-Lengeling et al. (2017) https://chemrxiv.org/articles/ORGANIC_1_pdf/5309668 https://github.com/aspuru-guzik-group/ORGANIC
  • 43. MolGAN ■ ■ SMILES
 [Figure from "MolGAN: An implicit generative model for small molecular graphs" (De Cao and Kipf). Labels: z ~ p(z), generator, molecular graph, discriminator (0/1), reward network (0/1), x ~ p_data(x)]
 N. De Cao and T. Kipf (2018) https://arxiv.org/abs/1805.11973
  • 44. MolGAN ■ 43 MolGAN: An implicit generative model for small molecular graphs Generator Graph Molecule N N N N N N T T z ~ p(z) Adjacency tensor Sampled SampledAnnotation matrix ~ ~ GCN GCN 0/1 0/1 Discriminator Reward network A<latexit sha1_base64="EMPyu5ASlEpI1qvrJeu1mckhUAU=">AAAB8XicbVDLSsNAFL3xWeur6tLNYBFclUSEuqy4cVnBPrANZTKdtEMnkzBzI5TQv3DjQhG3/o07/8ZJm4W2Hhg4nHMvc+4JEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53nrg2IlYPOE24H9GREqFgFK302I8ojoMwu5kNKlW35s5BVolXkCoUaA4qX/1hzNKIK2SSGtPz3AT9jGoUTPJZuZ8anlA2oSPes1TRiBs/myeekXOrDEkYa/sUkrn6eyOjkTHTKLCTeUKz7OXif14vxfDaz4RKUuSKLT4KU0kwJvn5ZCg0ZyinllCmhc1K2JhqytCWVLYleMsnr5L2Zc1za979VbVRL+oowSmcwQV4UIcG3EETWsBAwTO8wptjnBfn3flYjK45xc4J/IHz+QOmV5Da</latexit><latexit sha1_base64="EMPyu5ASlEpI1qvrJeu1mckhUAU=">AAAB8XicbVDLSsNAFL3xWeur6tLNYBFclUSEuqy4cVnBPrANZTKdtEMnkzBzI5TQv3DjQhG3/o07/8ZJm4W2Hhg4nHMvc+4JEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53nrg2IlYPOE24H9GREqFgFK302I8ojoMwu5kNKlW35s5BVolXkCoUaA4qX/1hzNKIK2SSGtPz3AT9jGoUTPJZuZ8anlA2oSPes1TRiBs/myeekXOrDEkYa/sUkrn6eyOjkTHTKLCTeUKz7OXif14vxfDaz4RKUuSKLT4KU0kwJvn5ZCg0ZyinllCmhc1K2JhqytCWVLYleMsnr5L2Zc1za979VbVRL+oowSmcwQV4UIcG3EETWsBAwTO8wptjnBfn3flYjK45xc4J/IHz+QOmV5Da</latexit><latexit sha1_base64="EMPyu5ASlEpI1qvrJeu1mckhUAU=">AAAB8XicbVDLSsNAFL3xWeur6tLNYBFclUSEuqy4cVnBPrANZTKdtEMnkzBzI5TQv3DjQhG3/o07/8ZJm4W2Hhg4nHMvc+4JEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53nrg2IlYPOE24H9GREqFgFK302I8ojoMwu5kNKlW35s5BVolXkCoUaA4qX/1hzNKIK2SSGtPz3AT9jGoUTPJZuZ8anlA2oSPes1TRiBs/myeekXOrDEkYa/sUkrn6eyOjkTHTKLCTeUKz7OXif14vxfDaz4RKUuSKLT4KU0kwJvn5ZCg0ZyinllCmhc1K2JhqytCWVLYleMsnr5L2Zc1za979VbVRL+oowSmcwQV4UIcG3EETWsBAwTO8wptjnBfn3flYjK45xc4J/IHz+QOmV5Da</latexit><latexit 
sha1_base64="EMPyu5ASlEpI1qvrJeu1mckhUAU=">AAAB8XicbVDLSsNAFL3xWeur6tLNYBFclUSEuqy4cVnBPrANZTKdtEMnkzBzI5TQv3DjQhG3/o07/8ZJm4W2Hhg4nHMvc+4JEikMuu63s7a+sbm1Xdop7+7tHxxWjo7bJk414y0Wy1h3A2q4FIq3UKDk3URzGgWSd4LJbe53nrg2IlYPOE24H9GREqFgFK302I8ojoMwu5kNKlW35s5BVolXkCoUaA4qX/1hzNKIK2SSGtPz3AT9jGoUTPJZuZ8anlA2oSPes1TRiBs/myeekXOrDEkYa/sUkrn6eyOjkTHTKLCTeUKz7OXif14vxfDaz4RKUuSKLT4KU0kwJvn5ZCg0ZyinllCmhc1K2JhqytCWVLYleMsnr5L2Zc1za979VbVRL+oowSmcwQV4UIcG3EETWsBAwTO8wptjnBfn3flYjK45xc4J/IHz+QOmV5Da</latexit> X<latexit sha1_base64="k8fMTYMpbcAk1m6rTYMegJsdMOM=">AAAB8XicbVDLSsNAFL2pr1pfVZduBovgqiQi1GXBjcsK9oFtKJPppB06mYSZG6GE/oUbF4q49W/c+TdO2iy09cDA4Zx7mXNPkEhh0HW/ndLG5tb2Tnm3srd/cHhUPT7pmDjVjLdZLGPdC6jhUijeRoGS9xLNaRRI3g2mt7nffeLaiFg94CzhfkTHSoSCUbTS4yCiOAnCrDcfVmtu3V2ArBOvIDUo0BpWvwajmKURV8gkNabvuQn6GdUomOTzyiA1PKFsSse8b6miETd+tkg8JxdWGZEw1vYpJAv190ZGI2NmUWAn84Rm1cvF/7x+iuGNnwmVpMgVW34UppJgTPLzyUhozlDOLKFMC5uVsAnVlKEtqWJL8FZPXiedq7rn1r3761qzUdRRhjM4h0vwoAFNuIMWtIGBgmd4hTfHOC/Ou/OxHC05xc4p/IHz+QPJSpDx</latexit><latexit sha1_base64="k8fMTYMpbcAk1m6rTYMegJsdMOM=">AAAB8XicbVDLSsNAFL2pr1pfVZduBovgqiQi1GXBjcsK9oFtKJPppB06mYSZG6GE/oUbF4q49W/c+TdO2iy09cDA4Zx7mXNPkEhh0HW/ndLG5tb2Tnm3srd/cHhUPT7pmDjVjLdZLGPdC6jhUijeRoGS9xLNaRRI3g2mt7nffeLaiFg94CzhfkTHSoSCUbTS4yCiOAnCrDcfVmtu3V2ArBOvIDUo0BpWvwajmKURV8gkNabvuQn6GdUomOTzyiA1PKFsSse8b6miETd+tkg8JxdWGZEw1vYpJAv190ZGI2NmUWAn84Rm1cvF/7x+iuGNnwmVpMgVW34UppJgTPLzyUhozlDOLKFMC5uVsAnVlKEtqWJL8FZPXiedq7rn1r3761qzUdRRhjM4h0vwoAFNuIMWtIGBgmd4hTfHOC/Ou/OxHC05xc4p/IHz+QPJSpDx</latexit><latexit 
sha1_base64="k8fMTYMpbcAk1m6rTYMegJsdMOM=">AAAB8XicbVDLSsNAFL2pr1pfVZduBovgqiQi1GXBjcsK9oFtKJPppB06mYSZG6GE/oUbF4q49W/c+TdO2iy09cDA4Zx7mXNPkEhh0HW/ndLG5tb2Tnm3srd/cHhUPT7pmDjVjLdZLGPdC6jhUijeRoGS9xLNaRRI3g2mt7nffeLaiFg94CzhfkTHSoSCUbTS4yCiOAnCrDcfVmtu3V2ArBOvIDUo0BpWvwajmKURV8gkNabvuQn6GdUomOTzyiA1PKFsSse8b6miETd+tkg8JxdWGZEw1vYpJAv190ZGI2NmUWAn84Rm1cvF/7x+iuGNnwmVpMgVW34UppJgTPLzyUhozlDOLKFMC5uVsAnVlKEtqWJL8FZPXiedq7rn1r3761qzUdRRhjM4h0vwoAFNuIMWtIGBgmd4hTfHOC/Ou/OxHC05xc4p/IHz+QPJSpDx</latexit><latexit sha1_base64="k8fMTYMpbcAk1m6rTYMegJsdMOM=">AAAB8XicbVDLSsNAFL2pr1pfVZduBovgqiQi1GXBjcsK9oFtKJPppB06mYSZG6GE/oUbF4q49W/c+TdO2iy09cDA4Zx7mXNPkEhh0HW/ndLG5tb2Tnm3srd/cHhUPT7pmDjVjLdZLGPdC6jhUijeRoGS9xLNaRRI3g2mt7nffeLaiFg94CzhfkTHSoSCUbTS4yCiOAnCrDcfVmtu3V2ArBOvIDUo0BpWvwajmKURV8gkNabvuQn6GdUomOTzyiA1PKFsSse8b6miETd+tkg8JxdWGZEw1vYpJAv190ZGI2NmUWAn84Rm1cvF/7x+iuGNnwmVpMgVW34UppJgTPLzyUhozlDOLKFMC5uVsAnVlKEtqWJL8FZPXiedq7rn1r3761qzUdRRhjM4h0vwoAFNuIMWtIGBgmd4hTfHOC/Ou/OxHC05xc4p/IHz+QPJSpDx</latexit> ˜X<latexit sha1_base64="h5fkkvOPNqe9NI7w0SLn2N2FVmc=">AAAB+3icbVDLSsNAFL3xWesr1qWbwSK4KokIdVlw47KCfUATymQyaYdOJmFmIpaQX3HjQhG3/og7/8ZJm4W2Hhg4nHMv98wJUs6Udpxva2Nza3tnt7ZX3z84PDq2Txp9lWSS0B5JeCKHAVaUM0F7mmlOh6mkOA44HQSz29IfPFKpWCIe9DylfowngkWMYG2ksd3wYqynQZR7mvGQ5sOiGNtNp+UsgNaJW5EmVOiO7S8vTEgWU6EJx0qNXCfVfo6lZoTTou5liqaYzPCEjgwVOKbKzxfZC3RhlBBFiTRPaLRQf2/kOFZqHgdmskyqVr1S/M8bZTq68XMm0kxTQZaHoowjnaCyCBQySYnmc0MwkcxkRWSKJSba1FU3JbirX14n/auW67Tc++tmp13VUYMzOIdLcKENHbiDLvSAwBM8wyu8WYX1Yr1bH8vRDavaOYU/sD5/ALyelNg=</latexit><latexit 
Figure 2. Outline of MolGAN. From left: the generator takes a sample from a prior distribution and generates a dense adjacency tensor A and an annotation matrix X. Subsequently, sparse and discrete Ã and X̃ are obtained from A and X respectively via categorical sampling. The combination of Ã and X̃ represents an annotated molecular graph which corresponds to a specific chemical compound. Finally, the graph is processed by both the discriminator and reward networks that are invariant to node order permutations and based on Relational-GCN (Schlichtkrull et al., 2017) layers.
[...] loss and the RL loss: [...] passing them to D and R̂ in order to make the generation stochastic while still forwarding continuous ob[...]
N. De Cao and T. Kipf (2018) https://arxiv.org/abs/1805.11973
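The categorical sampling step in the caption can be illustrated with a small sketch. This is not the MolGAN implementation: the function names are hypothetical, the result is returned as category indices rather than one-hot tensors, bond-type index 0 is assumed to mean "no bond", and only the upper triangle is sampled and mirrored so that the graph stays undirected.

```python
import random


def sample_category(probs, rng):
    """Draw one index from a categorical distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_graph(A, X, seed=0):
    """Turn dense (probabilistic) A and X into discrete ones.

    A[i][j] is a distribution over bond types for the atom pair (i, j);
    index 0 is assumed to mean "no bond" (an assumption of this sketch).
    X[i] is a distribution over atom types for node i.
    """
    rng = random.Random(seed)
    n = len(X)
    x_tilde = [sample_category(X[i], rng) for i in range(n)]
    a_tilde = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bond = sample_category(A[i][j], rng)
            # Mirror the sampled bond so the adjacency stays symmetric.
            a_tilde[i][j] = a_tilde[j][i] = bond
    return a_tilde, x_tilde


# Two nodes with deterministic distributions: atom types 0 and 1, joined by bond type 1.
X = [[1.0, 0.0], [0.0, 1.0]]
A = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
a_tilde, x_tilde = sample_graph(A, X)
```

The pair (a_tilde, x_tilde) plays the role of the annotated molecular graph that the caption says is passed on to the discriminator and reward networks.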
  • 45. SMILES
■ SMILES
Grammar Variational Autoencoder
[Figure: encoder of the GVAE. Box 1: SMILES grammar; box 2: input SMILES 'c1ccccc1'; box 3: form parse tree; box 4: extract rules; box 5: convert to 1-hot vectors; box 6: map to latent space.]
Figure 1. The encoder of the GVAE. We denote the start rule in blue and all rules that decode to terminal in green. See text for details.
[...] We describe how the GVAE works using a simple example.
Encoding. Consider a subset of the SMILES grammar as shown in Figure 1, box 1. These are the possible production rules that can be used for constructing a molecule. Imagine we are given as input the SMILES string for benzene: 'c1ccccc1'. Figure 1, box 2 shows this molecule. To encode this molecule into a continuous latent representation we begin by using the SMILES grammar to parse this string into a parse tree (partially shown in box 3). This tree describes how 'c1ccccc1' is generated by the grammar. We decompose this tree into a sequence of production rules by performing a pre-order traversal on the branches of the parse tree going from left-to-right, shown in box 4. We convert these rules into 1-hot indicator vectors, where each dimension corresponds to a rule in the SMILES grammar, box 5. Letting K denote the total number of production [...]
[...] timesteps (production rules) allowed by the decoder. We will use these vectors in the rest of the decoder to select production rules.
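The encoding steps described above (parse tree, pre-order traversal into production rules, 1-hot vectors) can be sketched as follows. This is a minimal sketch under stated assumptions: the five-rule grammar is a hypothetical miniature and not the SMILES grammar used in the paper, the parse tree for the two-atom string 'cc' is built by hand because parsing itself is not shown, and the helper names are illustrative.

```python
# Hypothetical miniature grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("smiles", ("chain",)),
    ("chain", ("chain", "branched_atom")),
    ("chain", ("branched_atom",)),
    ("branched_atom", ("atom",)),
    ("atom", ("'c'",)),
]
RULE_INDEX = {rule: k for k, rule in enumerate(RULES)}


def preorder_rules(tree):
    """Return rule indices in pre-order, visiting children left to right.

    A tree is (symbol, [children]); terminal children are plain strings.
    """
    symbol, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    indices = [RULE_INDEX[(symbol, rhs)]]
    for child in children:
        if not isinstance(child, str):
            indices.extend(preorder_rules(child))
    return indices


def one_hot(indices, num_rules):
    """One row per production; each row has a single 1 at the rule's index."""
    return [[1 if k == idx else 0 for k in range(num_rules)] for idx in indices]


# Hand-built parse tree for 'cc'.
atom = ("atom", ["'c'"])
tree = ("smiles", [("chain", [("chain", [("branched_atom", [atom])]),
                              ("branched_atom", [atom])])])
rule_sequence = preorder_rules(tree)
vectors = one_hot(rule_sequence, len(RULES))
```

Each row of vectors has K = len(RULES) entries, matching the statement that each dimension corresponds to one rule of the grammar.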
To ensure that any sequence of production rules generated from the decoder is valid, we keep track of the state of the parsing using a last-in first-out (LIFO) stack. This is shown in Figure 2, box 3. At the beginning, every valid parse from the grammar must start with the start symbol: smiles, which is placed on the stack. Next we pop off whatever non-terminal symbol that was placed last on the stack (in this case smiles), and we use it to mask out the invalid dimensions of the logit vector. Formally, for every non-terminal $\alpha$ we define a fixed binary mask vector $m_\alpha \in [0, 1]^K$. This takes the value '1' for all indices in $1, \ldots, K$ corresponding to production rules that have $\alpha$ on their left-hand-side. In this case the only production rule in the grammar beginning with smiles is the first so we zero-out every dimension [...]
M.J. Kusner, et al. ICML 2017 http://proceedings.mlr.press/v70/kusner17a https://github.com/mkusner/grammarVAE
(variational autoencoder, VAE)
Grammar Variational Autoencoder
[Figure: decoder of the GVAE. Panel labels: map from latent space; convert to logits (up to max length); stack; pop first non-terminal; mask out invalid rules; sample rule & push non-terminals onto stack; concatenate terminals ('c1ccccc1'); translate molecule.]
Figure 2. The decoder of the GVAE. See text for details.
Algorithm 1 Sampling from the decoder
Input: Deterministic decoder output $F \in \mathbb{R}^{T_{max} \times K}$, masks $m_\alpha$ for each production rule $\alpha$
Output: Sampled productions X from p(X|z)
1: Initialize empty stack $\mathcal{S}$, and push the start symbol S onto the top; set t = 0
2: while $\mathcal{S}$ is nonempty do
3: Pop the last-pushed non-terminal $\alpha$ from the stack $\mathcal{S}$
4: Use Eq. (2) to sample a production rule R
5: Let $x_t$ be the 1-hot vector corresponding to R
[...]
[...] character-based VAE decoder is that at every point in the generated sequence, the character VAE can sample any possible character. There is no stack or masking operation. The grammar VAE however is constrained to select syntactically-valid sequences.
Syntactic vs. semantic validity. It is important to note that the grammar encodes syntactically valid molecules but not necessarily semantically valid molecules. This is mainly because of three reasons. First, certain molecules [...]
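The stack-and-mask procedure of Algorithm 1 can be sketched in plain Python. Several parts are assumptions of this sketch rather than the paper's code: the toy grammar is hypothetical, Eq. (2) is taken to be a softmax restricted to the rules allowed by the mask, the steps after step 5 (which are cut off in the slide) are assumed to push the non-terminals of the chosen rule onto the stack from right to left, and sampling stops when the logit rows run out.

```python
import math
import random

# Hypothetical toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("smiles", ("chain",)),
    ("chain", ("chain", "atom")),
    ("chain", ("atom",)),
    ("atom", ("'c'",)),
    ("atom", ("'C'",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def mask_for(alpha):
    """Binary mask: 1 for every rule that has alpha on its left-hand side."""
    return [1.0 if lhs == alpha else 0.0 for lhs, _ in RULES]


def sample_productions(logits, seed=0, start="smiles"):
    """logits has one row of K scores per timestep; returns sampled rule indices."""
    rng = random.Random(seed)
    stack, productions = [start], []
    t = 0
    while stack and t < len(logits):
        alpha = stack.pop()                      # last-pushed non-terminal
        mask = mask_for(alpha)
        # Masked softmax weights: invalid rules get weight 0.
        top = max(f for f, m in zip(logits[t], mask) if m)
        weights = [math.exp(f - top) if m else 0.0
                   for f, m in zip(logits[t], mask)]
        k = rng.choices(range(len(RULES)), weights=weights, k=1)[0]
        productions.append(k)
        # Push right-to-left so the leftmost non-terminal is expanded next.
        for symbol in reversed(RULES[k][1]):
            if symbol in NONTERMINALS:
                stack.append(symbol)
        t += 1
    return productions


logits = [[0.0] * len(RULES) for _ in range(10)]
productions = sample_productions(logits)
```

Because the weight of every masked-out rule is zero, each sampled production expands exactly the non-terminal that was popped, which is the syntactic-validity property the text contrasts with a character-based decoder.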
  • 46. SMILES
[Figure 1: RNN cells take the sequence inputs (x1) ... (x4), here the characters 'C', 'C', '1', and output y(x1|w), y(x2|x<2, w), ..., y(x5|x<5, w) over the SMILES alphabet B C N O S P F I H Cl Br 1 2 3 ( ) [ ]. The caption is clipped in the slide; it states that the network approximates the Q-function.]
[...] during decoding, but its performance achieved by this method leaves scope fo[r ...] method requires hand-crafted grammatical rules for each application domain [...]
In this paper, we propose a generative approach to modeling validity that [...] constraints of a given discrete space. We show how concepts from reinforce[ment ...] used to define a suitable generative model and how this model can be approx[...]
D. Janz, et al. ICLR 2018 https://arxiv.org/abs/1712.01664 https://github.com/DavidJanz/molecule_grammar_rnn
LSTM
  • 48. ■ AlphaGo
■
[...] and the first-degree neighbouring atoms. Only rules that occurred at least 50 times in reactions published before 2015 were kept. [...]
Prediction with the in-scope filter network. After the search space has been narrowed down by the expansion policy [...]
[Figure: a, chemical representation of the synthesis plan; b, search tree representation, with positions A = {1}, B = {2,6}, C = {3,6}, D = {4,5,6}, E = {8,9}, F = {6,7,8}, the root (target) and the terminal solved state.]
Figure 1 | Translation of the traditional chemists' retrosynthetic route representation to the search tree representation. a, The traditional chemists' retrosynthetic route representation (conditions omitted). b, The search tree representation. The nodes in the tree represent the synthetic position, and contain all precursors needed to make the molecules of the preceding positions all the way down to the tree's root, which contains the target. Branches in the search tree correspond to complete routes. Calculating the value of branches through task-dependent scoring functions allows us to compare and rank different routes. The target molecule can be solved if it can be deconstructed to a set of readily available building blocks (marked red). Ph, phenyl; Boc, tert-butyloxycarbonyl; TBS, tert-butyldimethylsilyl.
M.H.S. Segler, et al. Nature 555(7678), 604 (2018) https://www.nature.com/articles/nature25978
  • 49.
[Figure: a, Synthesis planning with Monte Carlo tree search. (1) Selection: pick most promising position. (2) Expansion: retroanalyse, add new nodes to tree by expansion procedure (see b). (3) Rollout: pick and evaluate new position. (4) Update: incorporate evaluation in the search tree. b, Expansion procedure. Target molecule A; invariant encoding (ECFP4); expansion policy, which prioritizes transformations T1 ... Tn; keep the k best transformations and apply them to the target, giving reactions R1 ... Rk; for each reaction use in-scope filter and keep likely reactions; ranked precursor molecule positions B and C. The steps are labelled Symbolic or Neural.]
Figure 2 | Schematic of MCTS methodology. a, MCTS searches by iterating over four phases. In the selection phase (1), the most urgent node for analysis is chosen on the basis of the current position values. In phase (2) this node may be expanded by processing the molecules of the position A with the expansion procedure (b), which leads to new positions B and C, which are added to the tree. Then, the most promising new position is chosen, and a rollout phase (3) is performed by randomly sampling transformations from the rollout policy until all molecules are solved or a certain depth is exceeded. In the update phase (4), the position values are updated in the current branch to reflect the result of the rollout. b, Expansion procedure. First, the molecule (A) to retroanalyse is converted to a fingerprint and fed into the policy network, which returns a probability distribution over all possible transformations (T1 to Tn). Then, only the k most probable transformations are applied to molecule A. This yields the reactants necessary to make A, and thus complete reactions R1 to Rk. For each reaction, the reaction prediction is performed using the in-scope filter, returning a probability score. Improbable reactions are then filtered out, which leads to the list of admissible actions and corresponding precursor positions B and C.
M.H.S. Segler, et al. Nature 555(7678), 604 (2018) https://www.nature.com/articles/nature25978
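The four MCTS phases in the caption can be sketched on a toy problem. This is a minimal sketch and not the method of Segler et al.: the rule table, the building blocks and the molecule labels are hypothetical, every applicable rule is expanded instead of the k best from a policy network, there is no in-scope filter, the rollout samples transformations uniformly, and selection uses a plain UCB1 score.

```python
import math
import random

# Hypothetical toy data: RULES maps a molecule to alternative precursor sets;
# BUILDING_BLOCKS are assumed to be readily available.
RULES = {"T": [("A", "B"), ("C",)], "A": [("D",)], "C": [("X",)]}
BUILDING_BLOCKS = {"B", "D"}


def unsolved(position):
    """Molecules of a position that are not building blocks yet."""
    return tuple(m for m in position if m not in BUILDING_BLOCKS)


def expand(position):
    """Apply every known rule to the first unsolved molecule."""
    todo = unsolved(position)
    if not todo:
        return []
    first, rest = todo[0], todo[1:]
    return [tuple(precursors) + rest for precursors in RULES.get(first, [])]


class Node:
    def __init__(self, position, parent=None):
        self.position, self.parent = position, parent
        self.children, self.visits, self.value = [], 0, 0.0


def ucb1(child, c=1.4):
    if child.visits == 0:
        return math.inf
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(child.parent.visits) / child.visits)


def rollout(position, rng, max_depth=10):
    """Random playout: 1.0 if all molecules get solved, else 0.0."""
    for _ in range(max_depth):
        if not unsolved(position):
            return 1.0
        options = expand(position)
        if not options:
            return 0.0  # dead end: a molecule with no rule and no supplier
        position = rng.choice(options)
    return 1.0 if not unsolved(position) else 0.0


def mcts(target, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node((target,))
    for _ in range(iterations):
        node = root
        while node.children:                                   # (1) selection
            node = max(node.children, key=ucb1)
        node.children = [Node(p, node) for p in expand(node.position)]  # (2) expansion
        if node.children:
            node = rng.choice(node.children)
        reward = rollout(node.position, rng)                   # (3) rollout
        while node is not None:                                # (4) update
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


root = mcts("T")
best = max(root.children, key=lambda n: n.visits)
```

In this toy table the route through ("A", "B") ends in building blocks while the route through ("C",) hits a dead end, so the search concentrates its visits on the first child of the root.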
  • 50. Sequence-to-Sequence
Seq2seq Model. Neural sequence-to-sequence (seq2seq) models map one sequence to another and have [...] shown state of the art performance in many tasks such as machine translation. It is based on an encoder-decoder architecture that consists of two recurrent neural networks [...]
[...] sequence log probability at each time step during decoding [are] retained, where N is the width of the beam. The decoding [is] stopped once the lengths of the candidate sequences reach [the] maximum decode length of 140 characters. The candidate sequences that contain an end of sequence character [are] considered to be complete. On average, about 97% of all [...]
Figure 3. Seq2seq model architecture.
SMILES SMILES(SMART)
B. Liu, et al. ACS Cent. Sci. 3(10), 1103 (2017) https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00303 https://github.com/pandegroup/reaction_prediction_seq2seq
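The beam search decoding described in the excerpt (keep the N best candidates by sequence log probability, treat candidates with an end-of-sequence character as complete, stop at a maximum length) can be sketched as follows. The next-character model here is a hypothetical stand-in for a trained decoder, the "$" end marker is an assumption, and the default beam width is chosen only for illustration; the 140-character limit is the value quoted in the excerpt.

```python
import math

EOS = "$"  # hypothetical end-of-sequence character


def toy_next_log_probs(prefix):
    """Hypothetical stand-in for a trained decoder: log-probabilities of the next character."""
    if len(prefix) >= 3:
        return {EOS: math.log(0.9), "C": math.log(0.1)}
    return {"C": math.log(0.6), "O": math.log(0.3), EOS: math.log(0.1)}


def beam_search(next_log_probs, beam_width=3, max_len=140):
    """Return complete sequences sorted by cumulative log probability."""
    beams = [("", 0.0)]  # (sequence so far, cumulative log probability)
    complete = []
    while beams:
        candidates = []
        for seq, score in beams:
            for ch, lp in next_log_probs(seq).items():
                candidates.append((seq + ch, score + lp))
        # Retain only the beam_width best candidates at this time step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq.endswith(EOS):
                complete.append((seq, score))   # finished candidate
            elif len(seq) < max_len:
                beams.append((seq, score))      # keep extending
    return sorted(complete, key=lambda c: c[1], reverse=True)


results = beam_search(toy_next_log_probs)
best_sequence, best_log_prob = results[0]
```

Candidates that reach max_len without an end marker are simply dropped in this sketch, which is one possible reading of "stopped once the lengths of the candidate sequences reach the maximum decode length".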
  • 51. ■ Coley et al. (2017)
[ex]tension of the one-step strategy to multistep pathway planning is [...]
[...] characters (i.e., a product SMILES string without atom [...]
C.W. Coley et al. ACS Cent. Sci. 3(12), 1237 (2017) https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00355 https://github.com/connorcoley/retrosim
  • 52. ■ Coley et al.
2018-09 http://www.molsci.jp/2018/pdf/4E13_w.pdf
Coming soon… 2018-10, (IBIS) 2018-11
  • 53.
1.
2. Materials Informatics
3. Python Materials Informatics
4.
5.
  • 55. ■ Materials Informatics
■ https://www.jstage.jst.go.jp/article/ciqs/2017/0/2017_PL/_pdf/-char/ja
■ https://www.jstage.jst.go.jp/article/cicsj/36/1/36_9/_pdf/-char/ja
■ https://www.ssken.gr.jp/MAINSITE/event/2017/20171026-sci/lecture-01/SSKEN_sci2017_YoshidaRyo_presentation.pdf
  • 56. ■ RDKit
■ https://www.rdkit.org/
■ RDKit https://future-chem.com/rdkit-intro/
■ https://github.com/chemo-wakate
■ RDKit http://rdkit-users.jp/