Frontiers of data-driven property prediction: molecular machine learning

Innovation Camp 2018 for Computational Materials Science (ICCMS2018)
January 23rd (Tue.) - 25th (Thu.), 2018
The Jozankei View Hotel, Sapporo, Hokkaido, Japan

http://ccms.issp.u-tokyo.ac.jp/events/eventsfolder/ICCMS2018

In materials science, data-centric science is becoming one of the major approaches, alongside theoretical, experimental, and computational science. The main purpose of this camp is to learn the basics of machine learning as a data-centric science and to apply it to problems in our own research through group work. There will also be lectures on advanced research in computational and data-centric science, with discussion of future perspectives. In addition, participants learn an innovation mindset from invited lecturers who work at the forefront, beyond the industry-government-academia framework.

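Computational Materials Science Innovation Camp 2018

In tackling problems in materials science, data science is increasingly used alongside theoretical, experimental, and computational science. The main aim of this camp is to learn machine learning methods as that data science, acquire them through hands-on team exercises, and apply them to problems in each participant's own research and projects. Invited lecturers will also speak on cutting-edge results in computational and data science and discuss possibilities for future development. In addition, participants will learn an innovation mindset through lectures by, and exchanges of views with, people who are active across industry, government, and academia and across disciplines.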


  1. 1. ✓
  2. 2. [Figure: structure diagram of an example molecule as the input $x$, mapped to a predicted value $\hat{y}$] $\hat{y} = f_\theta(x)$
  3. 3. [Figure: structure diagrams of many example molecules]
  4. 4.
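  5. 5. [Figure: structure diagram of an example molecule as the input $x$, mapped to a predicted value $\hat{y} = f_\theta(x)$] Learning a map $x \mapsto y$ from $n$ data pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$: choose the parameters $\theta$ of the model $f_\theta$ by solving $\min_\theta \sum_{i=1}^{n} \mathrm{error}(y_i, \hat{y}_i)$, where $\hat{y}_i = f_\theta(x_i)$.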
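
The minimisation on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the data, the one-parameter model `f`, the squared-error choice for `error`, and the grid of candidate values are all made up here and do not come from the slides.

```python
# Toy data (made up for illustration): y is roughly 2 * x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def f(theta, x):
    """Hypothetical one-parameter model: y_hat = theta * x."""
    return theta * x

def total_error(theta):
    """Sum over the data of error(y_i, y_hat_i), using squared error (an assumption)."""
    return sum((y - f(theta, x)) ** 2 for x, y in data)

# Crude minimisation over theta: try 0.00, 0.01, ..., 4.00 and keep the best.
candidates = [k / 100 for k in range(401)]
theta_best = min(candidates, key=total_error)
print(theta_best, total_error(theta_best))
```

A grid search is used only because it keeps the objective visible; for models with many parameters, a gradient-based method takes its place.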
  6. 6. 
 

  7. 7. SVM (support vector machine), LogReg (logistic regression), GPR (Gaussian process regression), RF (random forest), etc.
  8. 8. 
 

  9. 9. Text representations of one molecule (C6H8O6):
     MOL file (V2000):
       SciTegic12231509382D
       13 13 0 0 0 0 999 V2000
       -2.5458  -9.4750 0.0000 C 0 0
       -3.3708  -9.4750 0.0000 C 0 0
       -2.2875  -8.6917 0.0000 C 0 0
       -3.6208  -8.6917 0.0000 C 0 0 2 0 0 0
       -2.9583  -8.2042 0.0000 O 0 0
       -4.3583  -8.3125 0.0000 C 0 0 1 0 0 0
       -1.5000  -8.4375 0.0000 O 0 0
       -2.0583 -10.1417 0.0000 O 0 0
       -3.8500 -10.1417 0.0000 O 0 0
       -5.0500  -8.7542 0.0000 O 0 0
       -3.6958  -7.0417 0.0000 O 0 0
       -4.3958  -7.4875 0.0000 C 0 0
       -4.2083  -9.2667 0.0000 H 0 0
        2  1 2 0
        3  1 1 0
        4  2 1 0
        5  3 1 0
        6  4 1 0
        7  3 2 0
        8  1 1 0
        9  2 1 0
        6 10 1 1
       11 12 1 0
       12  6 1 0
        4 13 1 6
        5  4 1 0
       M END
     SMILES: OC[C@H](O)[C@H]1OC(=O)C(=C1O)O
     InChI: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
     InChIKey: CIWBSHSKHKDKBQ-JLAZNSOCSA-N

  10. 10. MOL file (V2000) of a 10-atom molecule:
       Mrv0541 04051115152D
       10 10 0 0 0 0 999 V2000
       -0.9408  0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
       -1.6552 -0.0418 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
       -0.9408  1.1957 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
       -0.2263 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        0.4882  0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        1.2027 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        0.4882  1.1957 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
        1.2027 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        0.4882 -1.2793 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
       -0.2263 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        1  2 2 0 0 0 0
        1  3 1 0 0 0 0
        1  4 1 0 0 0 0
        4  5 2 0 0 0 0
        4 10 1 0 0 0 0
        5  6 1 0 0 0 0
        5  7 1 0 0 0 0
        6  8 2 0 0 0 0
        8  9 1 0 0 0 0
        9 10 2 0 0 0 0
       M END
      [Figure: the same molecule as a matrix indexed by atoms 1 to 10, with atom labels (C, O, N) and bond orders (1, 2) as entries]
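
The step from the bond block of this MOL file to a matrix over atoms can be sketched as follows. The atom symbols and bonds are copied from the slide; the variable names and the choice to store the bond order as the matrix entry are assumptions made for this illustration.

```python
# Atom symbols and bond block (atom_i, atom_j, bond_order) copied from the MOL file above.
atoms = ["C", "O", "O", "C", "C", "C", "N", "C", "C", "C"]
bonds = [(1, 2, 2), (1, 3, 1), (1, 4, 1), (4, 5, 2), (4, 10, 1),
         (5, 6, 1), (5, 7, 1), (6, 8, 2), (8, 9, 1), (9, 10, 2)]

n = len(atoms)
adjacency = [[0] * n for _ in range(n)]
for i, j, order in bonds:
    # MOL files index atoms from 1; Python lists index from 0.
    adjacency[i - 1][j - 1] = order
    adjacency[j - 1][i - 1] = order  # bonds are undirected, so the matrix is symmetric

for symbol, row in zip(atoms, adjacency):
    print(symbol, row)

# Number of bonded neighbours per atom = number of non-zero entries in its row.
degrees = [sum(1 for value in row if value) for row in adjacency]
print(degrees)
```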
  11. 11. Tripos MOL2 file of the same 10-atom molecule:
       @<TRIPOS>MOLECULE
       *****
       10 10 0 0 0
       SMALL
       GASTEIGER

       @<TRIPOS>ATOM
        1 C -0.9408  0.3707 0.0000 C.2   1 UNL1  0.3891
        2 O -1.6552 -0.0418 0.0000 O.co2 1 UNL1 -0.2405
        3 O -0.9408  1.1957 0.0000 O.co2 1 UNL1 -0.2405
        4 C -0.2263 -0.0418 0.0000 C.ar  1 UNL1  0.0965
        5 C  0.4882  0.3707 0.0000 C.ar  1 UNL1  0.0954
        6 C  1.2027 -0.0418 0.0000 C.ar  1 UNL1  0.0183
        7 N  0.4882  1.1957 0.0000 N.pl3 1 UNL1 -0.1278
        8 C  1.2027 -0.8668 0.0000 C.ar  1 UNL1  0.0014
        9 C  0.4882 -1.2793 0.0000 C.ar  1 UNL1  0.0003
       10 C -0.2263 -0.8668 0.0000 C.ar  1 UNL1  0.0079
       @<TRIPOS>BOND
        1 1  2 ar
        2 1  3 ar
        3 1  4 1
        4 4  5 ar
        5 4 10 ar
        6 5  6 ar
        7 5  7 1
        8 6  8 ar
        9 8  9 ar
       10 9 10 ar
      [Figure: the same molecule as a labeled graph over atoms 1 to 10, with atom types (C.2, O.co2, C.ar, N.pl3) on the nodes and bond types (ar, 1) on the edges]
  12. 12. [Figure: six graph representations of one molecule: structure diagram; skeletal topology; atom/bond labeled graph; KEGG atom labeled graph (KCF); pharmacophore type labeled graph (ChemAxon Screen); reduced graph]
  13. 13. [Figure: matrix layout; the only surviving labels are $y$ and $g$]
 
 

  14. 14.
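  15. 15. [Figure: structure diagram of an example molecule as the input $x$, mapped to a predicted value $\hat{y} = f_\theta(x)$] Learning a map $x \mapsto y$ from $n$ data pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$: choose the parameters $\theta$ of the model $f_\theta$ by solving $\min_\theta \sum_{i=1}^{n} \mathrm{error}(y_i, \hat{y}_i)$, where $\hat{y}_i = f_\theta(x_i)$.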
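  16. 16. Input $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, output $y$. [Figure: activation curve over $z$ between $-1$ and $+1$; $\tanh(z) \in (-1, 1)$] [Figure: network with inputs $x_1, x_2$, hidden units $h_1, h_2, h_3$ and $h'_1, h'_2$, output $y$, weights $w_{kj}$, $w'_{ji}$, $w''_i$, and indices $i, j, k$] The activation and threshold glyphs are missing from the transcript; they are written here as $\sigma$ and $b$. Parameters: $\theta = (w_{kj}, w'_{ji}, w''_i, b'_j, b_i)$. Output: $y = \sum_{i}^{2} w''_i \, \sigma(h'_i - b_i) = \sum_{i}^{2} w''_i \, \sigma\!\left( \sum_{j}^{3} w'_{ji} \, \sigma(h_j - b'_j) - b_i \right) = \sum_{i}^{2} w''_i \, \sigma\!\left( \sum_{j}^{3} w'_{ji} \, \sigma\!\left( \sum_{k=1}^{2} w_{kj} x_k - b'_j \right) - b_i \right)$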
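
The nested sums on this slide amount to a forward pass through a network with 2 inputs, 3 first-layer units, 2 second-layer units, and one output. The sketch below assumes that the activation is `tanh` and that the thresholds are subtracted before the activation; the transcript has lost those glyphs, so both are assumptions, and all names and numbers are hypothetical.

```python
import math

def forward(x, w, w1, w2, b1, b2, act=math.tanh):
    """Forward pass; w[k][j], w1[j][i], w2[i] are weights, b1[j], b2[i] thresholds."""
    # First layer: h_j = sum_k w[k][j] * x[k]
    h = [sum(w[k][j] * x[k] for k in range(len(x))) for j in range(len(b1))]
    # Second layer: h'_i = sum_j w1[j][i] * act(h_j - b1[j])
    h_prime = [sum(w1[j][i] * act(h[j] - b1[j]) for j in range(len(h)))
               for i in range(len(b2))]
    # Output: y = sum_i w2[i] * act(h'_i - b2[i])
    return sum(w2[i] * act(h_prime[i] - b2[i]) for i in range(len(b2)))

# Made-up parameter values, chosen so the result is easy to check by hand.
x = [1.0, 0.0]
w = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # 2 inputs x 3 first-layer units
w1 = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 first-layer x 2 second-layer units
w2 = [1.0, 0.0]
y = forward(x, w, w1, w2, b1=[0.0, 0.0, 0.0], b2=[0.0, 0.0])
print(y)  # equals tanh(tanh(1)) for these values
```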
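  17. 17. Gradient descent for $\min_\theta L(\theta)$, where $L(\theta) = \sum_{i=1}^{n} \mathrm{error}(y_i, f_\theta(x_i))$. Gradient at $\theta_t$: $\nabla_\theta L(\theta_t) = \begin{bmatrix} \partial L(\theta)/\partial\theta_1 \,|_{\theta=\theta_t} \\ \partial L(\theta)/\partial\theta_2 \,|_{\theta=\theta_t} \\ \vdots \end{bmatrix}$. Update: $\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L(\theta_t)$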
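
As a minimal sketch of this update rule, the loop below repeatedly subtracts the learning rate times the gradient from the parameters. The two-parameter quadratic loss, its hand-written gradient, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def loss(theta):
    """Hypothetical loss with its minimum at theta = (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def gradient(theta):
    """Gradient of the loss above: one partial derivative per parameter."""
    return [2.0 * (theta[0] - 1.0), 2.0 * (theta[1] + 2.0)]

eta = 0.1            # learning rate (assumed value)
theta = [0.0, 0.0]   # starting point theta_0
for t in range(100):
    g = gradient(theta)
    # theta_{t+1} <- theta_t - eta * grad L(theta_t)
    theta = [p - eta * gp for p, gp in zip(theta, g)]

print(theta, loss(theta))
```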
  18. 18.
  19. 19.
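  20. 20. All data: $L(\theta) = \sum_{i=1}^{n} \mathrm{error}(y_i, f_\theta(x_i))$, with update $\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L(\theta_t)$. Single example: $L_i(\theta) = \mathrm{error}(y_i, f_\theta(x_i))$, with update $\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L_i(\theta_t)$. Minibatch: $L_i^m(\theta) = \sum_{k=i}^{i+m} \mathrm{error}(y_k, f_\theta(x_k))$, with update $\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L_i^m(\theta_t)$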
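
The three updates on this slide differ only in which examples enter the gradient. The sketch below makes that explicit for a made-up one-parameter regression. Assumptions: squared error, noise-free toy data, a starting index `i` drawn at random at each step (the slide does not say how `i` is chosen), and a minibatch that is simply cut short near the end of the data.

```python
import random

# Made-up, noise-free data: y = 3 * x for x = 1..10.
data = [(float(x), 3.0 * x) for x in range(1, 11)]

def grad(theta, batch):
    """d/dtheta of sum over the batch of (y - theta * x)^2."""
    return sum(-2.0 * x * (y - theta * x) for x, y in batch)

def train(batch_of, steps=2000, eta=0.001, seed=0):
    """Run the update theta <- theta - eta * grad, with batch_of(i) picking the examples."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        i = rng.randrange(len(data))   # assumed: random starting index
        theta -= eta * grad(theta, batch_of(i))
    return theta

full = train(lambda i: data)            # L: all n examples
sgd = train(lambda i: data[i:i + 1])    # L_i: the single example i
mini = train(lambda i: data[i:i + 4])   # L_i^m with m = 3: examples i..i+3
print(full, sgd, mini)
```

Because the toy data contain no noise, all three variants approach the same value; with noisy data the single-example and minibatch runs would fluctuate around it.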
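  21. 21. $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$, $y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$, $x \mapsto y$, $y = f_\theta(x)$. [Figure: network with inputs $x_1, \ldots, x_5$ and outputs $y_1, y_2, y_3$] A change in one weight, $w_{ij} \to w_{ij} + \Delta w$, produces a change in an output, $y_k \to y_k + \Delta y$: $\frac{\partial f_\theta(x)}{\partial w_{ij}} = \frac{\partial y_k}{\partial w_{ij}}$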
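
The perturbation picture on this slide (nudge one weight, watch one output) is a finite-difference estimate of the partial derivative. The sketch below assumes a single linear layer with 5 inputs and 3 outputs as a stand-in for the network; the layer, the index convention `w[i][j]`, and the step size are hypothetical.

```python
def layer(w, x):
    """Hypothetical linear layer: y_k = sum_i w[i][k] * x[i]."""
    n_out = len(w[0])
    return [sum(w[i][k] * x[i] for i in range(len(x))) for k in range(n_out)]

def numerical_derivative(w, x, i, j, k, delta=1e-6):
    """Estimate d y_k / d w_ij by perturbing w_ij -> w_ij + delta."""
    y = layer(w, x)
    w_shift = [row[:] for row in w]   # copy so the original weights stay untouched
    w_shift[i][j] += delta
    y_shift = layer(w_shift, x)
    return (y_shift[k] - y[k]) / delta

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [[0.1] * 3 for _ in range(5)]   # 5 inputs x 3 outputs, made-up values

same_output = numerical_derivative(w, x, i=2, j=1, k=1)    # weight feeds y_1: close to x[2]
other_output = numerical_derivative(w, x, i=2, j=1, k=0)   # weight does not feed y_0: 0
print(same_output, other_output)
```

Repeating this for every weight costs one extra forward pass per weight, which is the motivation for computing all derivatives in a single backward sweep instead.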
  22. 22. [Figure: computational graph with inputs $a$, $b$ and the constant 1; nodes $c$ (add), $d$ (add), $e$ (mult)] $c = a + b$, $d = b + 1$, $e = c * d$. With $a = 2$, $b = 1$: $c = 3$, $d = 2$, $e = 6$.
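  23. 23. [Figure: the same computational graph, $c = a + b$, $d = b + 1$, $e = c * d$, evaluated at $a = 2$, $b = 1$ ($c = 3$, $d = 2$, $e = 6$), shown three times with derivatives on the edges] Edge derivatives: $\frac{\partial e}{\partial c} = 2$, $\frac{\partial e}{\partial d} = 3$, $\frac{\partial c}{\partial a} = 1$, $\frac{\partial c}{\partial b} = 1$, $\frac{\partial d}{\partial b} = 1$. Forward pass of derivatives with respect to $b$: $\frac{\partial a}{\partial b} = 0$, $\frac{\partial b}{\partial b} = 1$, $\frac{\partial c}{\partial b} = 1$, $\frac{\partial d}{\partial b} = 1$, $\frac{\partial e}{\partial b} = 5$. Backward pass of derivatives of $e$: $\frac{\partial e}{\partial e} = 1$, $\frac{\partial e}{\partial c} = 2$, $\frac{\partial e}{\partial d} = 3$, $\frac{\partial e}{\partial b} = 5$, $\frac{\partial e}{\partial a} = 2$. Sum over paths: $\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c} \frac{\partial c}{\partial b} + \frac{\partial e}{\partial d} \frac{\partial d}{\partial b}$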
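
The numbers on this slide can be reproduced by hand-coding the chain rule for the graph $c = a + b$, $d = b + 1$, $e = c * d$. This is a sketch written out node by node for this one graph, not a general automatic differentiation routine; the variable names are chosen here for illustration.

```python
a, b = 2.0, 1.0

# Forward evaluation of the graph.
c = a + b      # 3
d = b + 1.0    # 2
e = c * d      # 6

# Local derivative on each edge of the graph.
de_dc, de_dd = d, c                     # 2 and 3
dc_da, dc_db, dd_db = 1.0, 1.0, 1.0

# Forward pass of derivatives: carry d(node)/db from the input b to the output.
da_db, db_db = 0.0, 1.0
dc_db_total = dc_da * da_db + dc_db * db_db                 # 1
dd_db_total = dd_db * db_db                                 # 1
de_db_forward = de_dc * dc_db_total + de_dd * dd_db_total   # 5

# Backward pass of derivatives: carry de/d(node) from the output to every input.
de_de = 1.0
grad_c = de_de * de_dc                     # 2
grad_d = de_de * de_dd                     # 3
grad_a = grad_c * dc_da                    # 2
grad_b = grad_c * dc_db + grad_d * dd_db   # 5, summing over both paths from b to e

print(de_db_forward, grad_a, grad_b)
```

The forward pass yields derivatives with respect to one input per sweep, while the backward pass yields the derivatives of the output with respect to all inputs in one sweep.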
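  24. 24. Sequence map $x^{(1)}, x^{(2)}, \ldots, x^{(t)} \mapsto y^{(1)}, y^{(2)}, \ldots, y^{(t)}$. [Figure: recurrent unit with input $x^{(t)}$, hidden state $h^{(t)}$, and output $y^{(t)}$, together with its unrolled form: states $h^{(0)}, h^{(1)}, h^{(2)}$, inputs $x^{(1)}, x^{(2)}$, outputs $y^{(1)}, y^{(2)}$] [Figure: three recurrent cells taking $x^{(t)}$ and $h^{(t-1)}$ to $h^{(t)}$ and $y^{(t)}$; the gated variants combine $\times$, $+$, $\tanh(\cdot)$, and $1 - \cdot$ blocks]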
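
The unrolled picture on this slide, where each step combines the current input with the previous hidden state, can be sketched with scalar states. The specific recurrence used here, $h^{(t)} = \tanh(w_x x^{(t)} + w_h h^{(t-1)})$ with output $y^{(t)} = w_y h^{(t)}$, is an assumption (a plain recurrent cell without gates), and the weight values are made up.

```python
import math

def rnn(xs, w_x=0.5, w_h=0.8, w_y=1.0, h0=0.0):
    """Map x(1..t) to y(1..t), carrying a scalar hidden state h between steps."""
    h = h0
    ys = []
    for x in xs:
        # New state from the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h)
        ys.append(w_y * h)
    return ys

ys = rnn([1.0, 0.0, 0.0])
print(ys)
```

The second and third outputs are non-zero even though their inputs are zero, because the hidden state carries information forward from the first step.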
  25. 25. [Figure: weights $w_1, w_2, w_3, w_4$ and indices $i$, $j$]

  26. 26. $f : \mathbb{R}^n \to \mathbb{R}^m$ [Figure: a model $f_\theta$ with parameters $\theta$ mapping $x$ to $y$]
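  27. 27. [Figure: structure diagram of an example molecule as the input $x$, mapped to a predicted value $\hat{y} = f_\theta(x)$] Learning a map $x \mapsto y$ from $n$ data pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$: choose the parameters $\theta$ of the model $f_\theta$ by solving $\min_\theta \sum_{i=1}^{n} \mathrm{error}(y_i, \hat{y}_i)$, where $\hat{y}_i = f_\theta(x_i)$.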
  28. 28. Vector Annotations for Atoms (RDKit defaults)
  29. 29. [Figure: a 10-atom molecule (atoms 0 to 9) turned into a bit-vector fingerprint]
       Bit vector: 00010000101000100000001000010010010100001001000101001000
       Substructure layers: Layer-0 (diameter 0), Layer-1 (diameter 2), Layer-2 (diameter 4)
       Hashed substructure identifiers shown in the figure:
         847957139 3217380708 3218693969 3218693969 3218693969 3218693969 864942730 2246699815 864662311 3217380708
         1510328189 2784506312 1533864325 4158944142 2309124039 951226070 951226070 98513984 98513984 1083852209
         2784506312 132611095 2784506312 916604632 3450167988 2987120039 1171638766 3999906991 3999906991 4158944142
       The identifiers are then folded into a fixed length.
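
The last step on this slide, folding the hashed identifiers into a fixed length, can be sketched as follows. The identifiers are the first ten listed on the slide; mapping each identifier to the bit at position `identifier % n_bits` is an assumption made for this illustration, since the slide does not state the folding rule, and `n_bits = 64` is an arbitrary choice.

```python
# First ten hashed identifiers listed on the slide.
identifiers = [847957139, 3217380708, 3218693969, 3218693969, 3218693969,
               3218693969, 864942730, 2246699815, 864662311, 3217380708]

def fold(identifiers, n_bits):
    """Set one bit per identifier, at position identifier mod n_bits (assumed rule)."""
    bits = [0] * n_bits
    for identifier in identifiers:
        bits[identifier % n_bits] = 1
    return bits

fingerprint = fold(identifiers, n_bits=64)
print("".join(str(bit) for bit in fingerprint))
print(sum(fingerprint), "bits set from", len(set(identifiers)), "distinct identifiers")
```

Two distinct identifiers can land on the same position, so the number of set bits can be smaller than the number of distinct substructures; shorter vectors make such collisions more likely.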
  30. 30. 
 

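  31. 31. [Figure: a molecule with atoms $a_1, a_2, a_3, a_4, a_5$ and atom pairs $(a_1, a_2), (a_1, a_3), \ldots, (a_4, a_5)$; atom features $A$ are attached to each atom $a_i$ and pair features $P$ to each pair $(a_i, a_j)$] Feature transformations $f_A$ and $f_P$ built from the operations $(A \to A)_0$, $(A \to A)_1$, $(A \to P)_0$, $(P \to P)_0$, $(P \to P)_1$, $(P \to A)_0$, giving updated $A$ and $P$.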
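  32. 32. [Figure: node $v$ with state $h_v$; the previous state $h_v^{(t-1)}$ and the incoming activation $a_v^{(t)}$ are combined into the new state $h_v^{(t)}$] Graph-level output over all nodes: $\tanh\!\left( \sum_{v \in V} \sigma(y_v) \odot \tanh(z_v) \right)$, where $y_v$ and $z_v$ are computed from $h_v^{T}$ and $x_v$. The gate and product glyphs are missing from the transcript; they are written here as $\sigma$ and $\odot$.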
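  33. 33. [Figure: message passing on a graph: node $v$ with state $h_v$ and neighbors $N(v)$; blocks $M$ produce the message $m$, blocks $U$ update the node states, and block $R$ reads out the result]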
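  34. 34. Graph $G = (V, E, W)$ with Laplacian $L = \mathrm{diag}(d) - W$, $d_i = \sum_j W_{ij}$. For a signal $x \in \mathbb{R}^{|V|}$: $\hat{x} = U^\top x$, where $L = U \Lambda U^\top$. Filtering: $g_\theta * x = U g_\theta(\Lambda) U^\top x$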
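
The definitions on this slide can be tried out without an eigendecomposition if the filter is assumed to be a polynomial, $g_\theta(\Lambda) = \sum_k \theta_k \Lambda^k$. Because $L = U \Lambda U^\top$ with orthonormal $U$, the filtered signal $U g_\theta(\Lambda) U^\top x$ then equals $\sum_k \theta_k L^k x$. The polynomial form, the three-node path graph, and the coefficient values below are assumptions for illustration; the slide leaves $g_\theta$ general.

```python
def laplacian(W):
    """L = diag(d) - W with d_i = sum_j W_ij."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[(d[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def poly_filter(L, x, theta):
    """Return sum_k theta[k] * L^k x, a polynomial spectral filter applied to x."""
    out = [0.0] * len(x)
    power = list(x)                  # L^0 x
    for coefficient in theta:
        out = [o + coefficient * p for o, p in zip(out, power)]
        power = matvec(L, power)     # next power of L applied to x
    return out

# Path graph on three nodes: 0 - 1 - 2, unit edge weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(W)
filtered = poly_filter(L, [1.0, 0.0, 0.0], theta=[0.5, 0.25])
constant = poly_filter(L, [1.0, 1.0, 1.0], theta=[2.0, 5.0])
print(L, filtered, constant)
```

A constant signal is only scaled by the zeroth coefficient, because every row of $L$ sums to zero and all higher powers of $L$ annihilate it.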
