AIST AIRC machine learning team
Masashi Tsubaki
[Tsubaki, Tomii, and Sese, 2018 in Bioinformatics]
[Tsubaki and Mizoguchi, 2018 in Journal of physical chemistry letters]
https://github.com/masashitsubaki
GitHub: GNN for molecules (SMILES input)
https://github.com/masashitsubaki/molecularGNN_smiles
Input to the GNN: a SMILES string, e.g., CC(=O)OC1=CC=CC=C1C(=O)O
(※ SMILES strings are processed with RDKit)
Output: a molecular property (e.g., ... or not)
Dataset format (SMILES)
Classification: each line is (SMILES, 0 or 1)
Regression: each line is (SMILES, value)
(※ See the GitHub repository for details)
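A minimal sketch of reading such a dataset, assuming each line holds a SMILES string and a label separated by whitespace. The function name `parse_smiles_dataset` and the two example labels are hypothetical and are not taken from the repository.

```python
from typing import List, Tuple


def parse_smiles_dataset(text: str) -> List[Tuple[str, float]]:
    """Parse lines of the assumed form '<SMILES> <label>'."""
    samples = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        smiles, label = line.split()
        samples.append((smiles, float(label)))
    return samples


# Illustrative data only: the labels below are made up.
example = """\
CC(=O)OC1=CC=CC=C1C(=O)O 1
O 0
"""
samples = parse_smiles_dataset(example)
```

Storing the label as a float lets the same reader serve both the 0-or-1 case and the real-valued case.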
GitHub: GNN for molecules (3D structure input)
Atom x y z
O 0.03 0.98 0.008
H 0.06 0.02 0.002
H 0.87 1.30 0.0007
Water molecule
GNN
(e.g., )
https://github.com/masashitsubaki/molecularGNN_3Dstructure
Dataset format (3D structure), as described in the README:
First line: the molecular property types
For each molecule: a data index, then each atom and its 3D coordinates (e.g., for the molecule CH4), then the properties in the order of the above types
Every following entry is described in the same format
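The format above can be read with a few lines of Python. This is a sketch under assumptions: entries are taken to be separated by blank lines, and the property name `energy(eV)`, the index `water_0`, and the function name `parse_3d_dataset` are hypothetical. The water coordinates and the value -9.24 reuse numbers shown elsewhere in these slides.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]


def parse_3d_dataset(text: str) -> Tuple[List[str], List[Dict]]:
    """Parse a header of property types followed by one block per molecule."""
    blocks = text.strip().split("\n\n")
    property_types = blocks[0].split()
    molecules = []
    for block in blocks[1:]:
        lines = block.strip().splitlines()
        index = lines[0].strip()           # data index
        atoms: List[Atom] = []
        for row in lines[1:-1]:            # one atom per line: symbol x y z
            symbol, x, y, z = row.split()
            atoms.append((symbol, float(x), float(y), float(z)))
        values = [float(v) for v in lines[-1].split()]  # property values
        molecules.append({
            "index": index,
            "atoms": atoms,
            "properties": dict(zip(property_types, values)),
        })
    return property_types, molecules


example = """\
energy(eV)

water_0
O 0.03 0.98 0.008
H 0.06 0.02 0.002
H 0.87 1.30 0.0007
-9.24
"""
property_types, molecules = parse_3d_dataset(example)
```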
bash train.sh
train.sh
bash train.sh (Google Colab)
(QM9)
dataset: QM9_under14atoms
property: U0(kcal/mol)
dim: 200
layer_hidden: 6
layer_output: 6
batch_train: 32
batch_test: 32
learning_rate: 1e-3
decay of learning rate: 0.99
interval of decay: 10
iteration: 3000
1.9 kcal/mol
1.0 kcal/mol
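To make the learning-rate settings concrete, here is a small worked sketch. It assumes the decay is multiplicative and applied once every `interval` iterations, which is a common reading of these three settings but is not confirmed by the slides. The function name `learning_rate` is hypothetical.

```python
def learning_rate(iteration: int,
                  base_lr: float = 1e-3,
                  decay: float = 0.99,
                  interval: int = 10) -> float:
    """Learning rate after `iteration` steps under step-wise decay."""
    return base_lr * decay ** (iteration // interval)


lr_start = learning_rate(0)      # initial value, 1e-3
lr_first = learning_rate(10)     # after the first decay step
lr_final = learning_rate(3000)   # after 300 decay steps
```

Under this assumption the rate falls by a factor of 0.99^300, to about 4.9e-5 by iteration 3000.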
data_train public data_test in house
dataset 2
(e.g., ) (e.g., )
preprocess.py, train.py
DeepChem
SchNet
Chainer Chemistry
kGCN
tsubaki.masashi@aist.go.jp
GitHub 🙇
…
😅
🤔 COOH
( )
1
0
1 (e.g., )
( or not)
(e.g., )
🤖 ( or not)
( )
( ) (end-to-end)
( )
Graph neural network (GNN)
[Scarselli+ 09; Kearnes+ 16]
(1)
( )
(2) (or )
( )
(3)
( )
[Figure: molecular graph with atoms O, C, H, H]
: GNN
Update the atom vectors with a neural network (NN)
(also called transition, propagation, or message passing)
x_i^{(\ell+1)} = x_i^{(\ell)} + \sum_j f(x_j^{(\ell)})
where j runs over the neighbors of atom i and f is an NN, e.g., ReLU(Wx + b)
(4)
(5)
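The update rule above (each atom vector plus the sum of f over its neighbors' vectors, with f = ReLU(Wx + b)) can be sketched in plain Python. The bond list for the O, C, H, H graph, the initial vectors, and the identity weight matrix are assumptions chosen to keep the arithmetic easy to follow. The function names are hypothetical.

```python
from typing import List

Vector = List[float]


def relu_linear(W: List[Vector], b: Vector, x: Vector) -> Vector:
    """f(x) = ReLU(Wx + b) for a single vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def gnn_update(vectors: List[Vector], neighbors: List[List[int]],
               W: List[Vector], b: Vector) -> List[Vector]:
    """One layer: x_i <- x_i + sum over neighbors j of f(x_j)."""
    messages = [relu_linear(W, b, x) for x in vectors]
    updated = []
    for i, x in enumerate(vectors):
        total = list(x)
        for j in neighbors[i]:
            total = [t + m for t, m in zip(total, messages[j])]
        updated.append(total)
    return updated


# Atoms 0: O, 1: C, 2: H, 3: H, with assumed bonds C-O, C-H, C-H.
neighbors = [[1], [0, 2, 3], [1], [1]]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability only
b = [0.0, 0.0]
updated = gnn_update(vectors, neighbors, W, b)
```

Stacking this update several times lets each atom vector absorb information from atoms further away in the graph.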
Sum the atom vectors into a single molecular vector
Predict the label (... or not) from the molecular vector
All NN parameters are learned end-to-end
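A minimal sketch of the sum readout followed by a yes-or-no prediction. The sigmoid output layer and the weights `w` and `b` are assumptions for illustration. The slides only indicate that the atom vectors are summed and that a label is predicted.

```python
import math
from typing import List


def readout_sum(vectors: List[List[float]]) -> List[float]:
    """Sum the atom vectors element-wise into one molecular vector."""
    return [sum(column) for column in zip(*vectors)]


def predict_probability(molecular_vector: List[float],
                        w: List[float], b: float) -> float:
    """Assumed output layer: a linear score passed through a sigmoid."""
    score = sum(wi * xi for wi, xi in zip(w, molecular_vector)) + b
    return 1.0 / (1.0 + math.exp(-score))


atom_vectors = [[1.0, 1.0], [2.0, 2.0], [0.5, 1.5], [0.5, 1.5]]
molecular_vector = readout_sum(atom_vectors)
probability = predict_probability(molecular_vector, w=[0.5, -0.5], b=1.0)
```

Because the sum does not depend on the order of the atoms, the prediction is the same however the atoms are numbered.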
PyTorch-like code
C
O
H
fif 1 else 0
H
or not
GNN
GNN assumption
( )
β-lactam
👍
Fingerprints: subgraphs of radius r (e.g., r=2), each assigned a vector x_1, x_2, ..., x_M
Fingerprint-based GNN:
x_i^{(\ell+1)} = x_i^{(\ell)} + \sum_j f(x_j^{(\ell)})
where x_i^{(\ell)} is the i-th fingerprint vector and x_j^{(\ell)} is a neighboring fingerprint vector
[Figure: β-lactam and its fingerprint vectors]
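One way to assign radius-r subgraph fingerprints is to relabel each atom repeatedly from its own ID and the sorted IDs of its neighbors, in the style of the Weisfeiler-Lehman procedure. The sketch below is an illustration under assumptions, not the repository's code: it ignores bond types, and the function name `extract_fingerprints` and the O, C, H, H example graph are hypothetical.

```python
from typing import Dict, Hashable, List


def extract_fingerprints(atoms: List[str], neighbors: List[List[int]],
                         radius: int,
                         vocabulary: Dict[Hashable, int]) -> List[int]:
    """Return one fingerprint ID per atom for its radius-r subgraph."""
    # Radius 0: the fingerprint is just the atom type.
    ids = [vocabulary.setdefault(("atom", a), len(vocabulary)) for a in atoms]
    for _ in range(radius):
        new_ids = []
        for i, current in enumerate(ids):
            # Combine the atom's ID with its neighbors' IDs (order-independent).
            key = (current, tuple(sorted(ids[j] for j in neighbors[i])))
            new_ids.append(vocabulary.setdefault(key, len(vocabulary)))
        ids = new_ids
    return ids


# Atoms 0: O, 1: C, 2: H, 3: H, with assumed bonds C-O, C-H, C-H.
atoms = ["O", "C", "H", "H"]
neighbors = [[1], [0, 2, 3], [1], [1]]
vocabulary: Dict[Hashable, int] = {}
fingerprints = extract_fingerprints(atoms, neighbors, radius=2,
                                    vocabulary=vocabulary)
```

Each ID can then index a learnable embedding, giving the fingerprint vectors x_1, ..., x_M that the GNN updates. The two hydrogens share one ID because their radius-2 environments are identical.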
( GitHub)
https://github.com/masashitsubaki/CPI_prediction
Fingerprint-based GNN (compounds) + CNN (proteins)
Datasets: Human and C.elegans
Setting                              Human   C.elegans
Radius of subgraphs (fingerprints)   2       2
Amino acid n-gram                    3       3
Dimensionality                       10      10
Layer of GNN                         3       3
Window size of n-gram                5       5
Layer of CNN                         3       3
Learning rate                        1e-4    1e-4
Decay of learning rate               0.5     0.5
Interval of decay                    10      10
Liu et al., 2015 (highly credible negative samples)
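The "Amino acid n-gram: 3" setting can be illustrated by splitting a protein sequence into overlapping 3-grams and mapping each to an integer ID. This is a sketch under assumptions: the sequences below are made up, the function names are hypothetical, and any padding or boundary symbols an actual implementation may add are omitted.

```python
from typing import Dict, List


def split_ngrams(sequence: str, n: int = 3) -> List[str]:
    """Split a sequence into overlapping n-grams."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def encode_ngrams(sequence: str, vocabulary: Dict[str, int],
                  n: int = 3) -> List[int]:
    """Map each n-gram to an integer ID, growing the vocabulary as needed."""
    return [vocabulary.setdefault(gram, len(vocabulary))
            for gram in split_ngrams(sequence, n)]


ngrams = split_ngrams("MKVLA")                 # made-up sequence
vocabulary: Dict[str, int] = {}
encoded = encode_ngrams("MKVMKV", vocabulary)  # repeated n-grams share an ID
```

The resulting ID sequence is what an embedding layer followed by a CNN would take as input for the protein.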
... some reports even above 0.99 AUC on standard benchmarks…
Atom x y z
O 0.03 0.98 0.008
H 0.06 0.02 0.002
H 0.87 1.30 0.0007
-9.24 eV
Water molecule ( …)
Nature Communications, PRL ( …)
[Figure: molecular graph with atoms O, C, H, H]
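SchNet
Atom i is updated from the other atoms j, weighted by distance:
x_i^{(\ell+1)} = \sum_j e^{-\gamma_j d_{ij}^2} f(x_j^{(\ell)})
where d_{ij} is the distance between atoms i and j and \gamma_j is a coefficient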
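A sketch of a distance-weighted update of this kind, using the water coordinates from these slides. Several details are assumptions: the coefficient `gamma` is set to 1.0 for every atom, the sum includes j = i, and f is a plain ReLU. The function names are hypothetical and this is not SchNet's actual implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def squared_distance(p: Vector, q: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gaussian_weights(coords: List[Vector], gamma: Vector) -> List[Vector]:
    """w[i][j] = exp(-gamma[j] * d_ij^2) for every pair of atoms."""
    n = len(coords)
    return [[math.exp(-gamma[j] * squared_distance(coords[i], coords[j]))
             for j in range(n)] for i in range(n)]


def distance_weighted_update(vectors: List[Vector], coords: List[Vector],
                             gamma: Vector,
                             f: Callable[[Vector], Vector]) -> List[Vector]:
    """x_i <- sum_j w[i][j] * f(x_j), summing over all atoms j."""
    weights = gaussian_weights(coords, gamma)
    messages = [f(x) for x in vectors]
    dim = len(messages[0])
    return [[sum(weights[i][j] * messages[j][k] for j in range(len(vectors)))
             for k in range(dim)] for i in range(len(vectors))]


# Water molecule coordinates as listed in the slides (O, H, H).
coords = [[0.03, 0.98, 0.008], [0.06, 0.02, 0.002], [0.87, 1.30, 0.0007]]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
gamma = [1.0, 1.0, 1.0]  # assumed values
weights = gaussian_weights(coords, gamma)
updated = distance_weighted_update(
    vectors, coords, gamma, f=lambda x: [max(0.0, v) for v in x])
```

Nearby atoms get weights close to 1 and distant atoms get weights close to 0, so no explicit bond list is needed.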
Gaussian
[Figure: water molecule (O, H, H); equation residue: f ≡ exp( … )]
(※ QM9: up to 29 atoms per molecule, about 130k molecules)
QM9 subset with under 14 atoms (15k molecules)
QM9 subset with under 14 atoms (10k)
Chemical accuracy: 1.0 kcal/mol
1.90 kcal/mol
1 ( )
1.24 kcal/mol
GNN
( …)
1.90 kcal/mol
1.0
New!
GNN ( )
14 15
New!
Atom x y z
O 0.03 0.98 0.008
H 0.06 0.02 0.002
H 0.87 1.30 0.0007
( )
=
1
-9.24 eV
Water molecule
=
etc…
…
( )
🤭
SchNOrb extends the deep tensor neural network SchNet to represent electronic
wavefunctions, ...model uses about 93 million parameters to predict a large Hamiltonian…
https://github.com/masashitsubaki
tsubaki.masashi@aist.go.jp

Deep Learning for Graph-Structured Data: Applications to Drug Discovery and Materials Science and Their Problems (26th STAIR Lab AI Seminar)
