
Garrett Goh, Scientist, Pacific Northwest National Lab

Garrett Goh is a Scientist at the Pacific Northwest National Lab (PNNL), in the Advanced Computing, Mathematics & Data Division. He was previously awarded the Howard Hughes Medical Institute fellowship, which supported his PhD in Computational Chemistry at the University of Michigan. At PNNL, he was awarded the Pauling Fellowship, which supports his research initiative of combining deep learning and artificial intelligence with traditional chemistry applications. His current interest is in AI-assisted computational chemistry: the application of deep learning to predict chemical properties and discover new chemical insights while using minimal expert knowledge.

Abstract summary

A Deep Learning Computational Chemistry AI: Making chemical predictions with minimal expert knowledge:
Using deep learning and virtually no expert knowledge, we construct computational chemistry models that compare favorably with existing state-of-the-art models developed by expert practitioners, models that rely on knowledge gained from decades of academic research. Our findings suggest the potential impact of AI assistance in accelerating the scientific discovery process, and we envision future applications not just in chemistry but in affiliated fields such as biotechnology, pharmaceuticals, and consumer goods, and perhaps other domains as well.

  1. 1. A Deep Learning Computational Chemistry “AI” Making chemical predictions with minimal expert knowledge GARRETT GOH (@garrettbgoh) May 17, 2017 1 High Performance Computing Group, Advanced Computing, Mathematics & Data Division, Pacific Northwest National Laboratory
  2. 2. CHEMical InCEPTION May 17, 2017 2
  3. 3. @garrettbgoh Recent Trends in Deep Learning May 17, 2017 3
  4. 4. @garrettbgoh Recent Trends in Deep Learning May 17, 2017 4
  5. 5. @garrettbgoh What is Deep Learning? Deep Learning = Multi-layer artificial neural network May 17, 2017 5
  6. 6. @garrettbgoh What is Deep Learning? Deep Learning = Multi-layer artificial neural network May 17, 2017 6 Input (Features) Output (Prediction)
  7. 7. @garrettbgoh What is Deep Learning? Deep Learning = Multi-layer artificial neural network May 17, 2017 7 Input (Features) Output (Prediction) Many Hidden Layers (see the first code sketch after the slide list)
  8. 8. @garrettbgoh Why Deep Learning today? What has changed from the past? Substantial increase of data (particularly from the internet) Improved algorithms for training deep neural networks GPU-accelerated deep learning at reasonable cost May 17, 2017 8 Glorot, X.; Bordes, A.; Bengio, Y. Proc. of the 14th Int. Conf. on Artificial Intelligence and Statistics 2011 Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. J. Mach. Learn Res. 2014, 15, 1929
  9. 9. @garrettbgoh What makes Deep Learning better than traditional/shallow Machine Learning? Representation Learning → Automated Feature Engineering May 17, 2017 9 http://www.nature.com/news/computer-science-the-learning-machines-1.14481
  10. 10. @garrettbgoh A Case Study of Deep Learning Success in Computer Vision Human-level performance in image classification within 3 years Manual feature engineering has been mostly replaced by deep neural networks May 17, 2017 10 Goh, G.B.; Hodas, N.O.; Vishnu, A. J. Comp. Chem., 2017, 38, 1291
  11. 11. Deep Learning for Chemistry May 17, 2017 11
  12. 12. @garrettbgoh A Short History on Feature Engineering in Chemistry 1880s: First concepts of “molecular structure” emerged 1940s: First modern molecular descriptors (i.e. engineered features of molecules/chemicals) emerged 1960s: First modern QSAR/QSPR models developed (i.e. simple regression models that predict a chemical’s activity or property) 1980s: Modern machine learning algorithms adopted (linear regression → SVMs → RF) 2010s: First deep learning models using molecular descriptors for chemistry developed May 17, 2017 12 “Feature engineering in chemistry has been going on for a while….”
  13. 13. @garrettbgoh A Short History on Feature Engineering in Chemistry 1880s: First concepts of “molecular structure” emerged 1940s: First modern molecular descriptors (i.e. engineered features of molecules/chemicals) emerged 1960s: First modern QSAR/QSPR models developed (i.e. simple regression models that predict a chemical’s activity or property) 1980s: Modern machine learning algorithms adopted (linear regression → SVMs → RF) 2010s: First deep learning models using molecular descriptors for chemistry developed Today: First deep learning models using “raw image data” for chemistry developed May 17, 2017 13 “How much chemistry do you need to know to predict chemistry?”
  14. 14. @garrettbgoh Deep Learning for Computational Chemistry Deep Learning trained on molecular descriptors outperformed traditional ML in the Merck Kaggle challenge in 2012 (activity prediction) and Tox21 challenge (toxicity prediction) in 2014 May 17, 2017 14 Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. Front. Env. Sci. 2016, 3, 1. Ramsundar, B.; Kearnes, S.; Riley, P.; Webster, D.; Konerding, D.; Pande, V. 2015 https://arxiv.org/abs/1502.02072 Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. 2014 https://arxiv.org/abs/1406.1231
  15. 15. @garrettbgoh Deep Learning as a Machine Learning Tool in Scientific (Chemistry) Research May 17, 2017 15 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation
  16. 16. @garrettbgoh Deep Learning as “Machine Intelligence” in Scientific (Chemistry) Research May 17, 2017 16 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation aka…“Siri for chemists”
  17. 17. @garrettbgoh Designing a Deep Learning Framework with Minimal Chemistry Knowledge May 17, 2017 17 High School Students Draw Molecules (see the molecule-to-image sketch after the slide list) Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation
  18. 18. @garrettbgoh Deep Learning predicts Physiological, Biochemical & Physical Properties May 17, 2017 18 Physiological (e.g. Toxicity): binary classification, 10,000 images. Biochemical (e.g. Activity): binary classification, 40,000 images. Physical (e.g. Solvation): regression, 500 images.
  19. 19. @garrettbgoh Experiments with Different Deep Neural Network Architectures AlexNet: Linear topology ResNet: Linear topology with residual links GoogleNet: Branched topology May 17, 2017 19 Krizhevsky, A.; Sutskever, I.; Hinton, G. E. Advances in Neural Information Processing Systems 2012. He, K.; Zhang, X.; Ren, S.; Sun, J. 2015 https://arxiv.org/abs/1512.03385 Szegedy, C.; et al. 2014 https://arxiv.org/abs/1409.4842
  20. 20. @garrettbgoh Experiments with Different Deep Neural Network Architectures In the regime of limited data, there are limits to the size (depth & breadth) of deep neural networks May 17, 2017 20
  21. 21. @garrettbgoh Chemception Deep Neural Network Based on the Inception-ResNet v2 architectural template (see the CNN sketch after the slide list) May 17, 2017 21 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation
  22. 22. @garrettbgoh Tweaking Chemception (Depth & Width) Chemception T3_F16 (~150,000 parameters, 45 layers) was empirically determined to be the optimal neural network architecture Tested depth from 21 to 69 layers Tested width from 16 to 64 convolutional filters/layer No. of parameters varied from ~70,000 to 2.4 million A deep & skinny neural network seems to work best for small datasets of chemical images May 17, 2017 22 n=3 n=3 n=3 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation
  23. 23. @garrettbgoh Benchmarking Chemception Performance (Chemception vs. models using engineered features) May 17, 2017 23 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  24. 24. @garrettbgoh Chemception + Raw Images Activity Prediction Results Slightly outperforms traditional ML using engineered features Outperforms DL (MLP) using engineered features May 17, 2017 24 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  25. 25. @garrettbgoh Chemception + Raw Images Toxicity Prediction Results Outperforms traditional ML using engineered features Slightly underperforms DL (MLP) using engineered features May 17, 2017 25 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  26. 26. @garrettbgoh Chemception + Raw Images Solvation Prediction Results Outperforms DL (MLP) using engineered features Slightly underperforms physics-based models May 17, 2017 26 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  27. 27. @garrettbgoh Improving Chemception Performance DATA: High-quality labeled data is expensive and limited in the technical sciences From Greyscale to Color Augmented Images: Encoded domain-specific information into the image channels (see the channel-encoding sketch after the slide list) May 17, 2017 27 Chemistry property #1 Chemistry property #2 Chemistry property #3
  28. 28. @garrettbgoh Designing a Deep Learning Framework with Minimal Chemistry Knowledge May 17, 2017 28 High School Students Draw Molecules, then Annotate Drawings with Basic Chemistry Knowledge
  29. 29. @garrettbgoh Chemception + Augmented Images Activity Prediction Results Outperforms traditional ML using engineered features Outperforms DL (MLP) using engineered features May 17, 2017 29 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  30. 30. @garrettbgoh Chemception + Augmented Images Toxicity Prediction Results Outperforms traditional ML using engineered features Outperforms DL (MLP) using engineered features May 17, 2017 30 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  31. 31. @garrettbgoh Chemception + Augmented Images Solvation Prediction Results Outperforms DL (MLP) using engineered features Outperforms physics-based models! May 17, 2017 31 Goh, G.B.; Siegel, C.; Vishnu, A.; Hodas, N.O.; Baker, N.A. 2017, in preparation Non-Chemception data from Wu, Z., et al. 2017, https://arxiv.org/abs/1703.00564
  32. 32. @garrettbgoh Conclusion Chemception: A deep neural network that predicts chemical properties just as well as expert-developed models, but with minimal chemical knowledge When trained with augmented images, Chemception outperforms both ML & DL models that use engineered features A general (i.e. not domain specific) framework that represents a “proof of concept” for using a deep learning machine intelligence in research May 17, 2017 32
  33. 33. @garrettbgoh Conclusion Q: How much chemistry do you need to know to predict chemistry? A: Not a lot… May 17, 2017 33
  34. 34. @garrettbgoh Conclusion Q: How much chemistry <insert your interest here> do you need to know to predict chemistry <insert your interest here>? A: (Probably) Not a lot… Caveat for using CNNs: As long as there is a systematic image representation of your data from which the property to predict can be inferred May 17, 2017 34
  35. 35. @garrettbgoh Conclusion Q: How much chemistry <insert your interest here> do you need to know to predict chemistry <insert your interest here>? A: (Probably) Not a lot… Caveat for using CNNs: As long as there is a systematic image representation of your data from which the property to predict can be inferred May 17, 2017 35 Weather prediction? Traffic prediction?
  36. 36. @garrettbgoh How do we deal with the “small labeled data” problem? Will an “expert chemist” neural network do better? How do we train one? Future Challenges May 17, 2017 36 ?
  37. 37. @garrettbgoh How do we start using “machine intelligence” with human intelligence to tackle previously “unexplainable/unsolvable” problems in science? Future Challenges May 17, 2017 37 “Creativity” “Imagination” “Stamina” “Logical”
  38. 38. @garrettbgoh Acknowledgements Deep Learning for Computational Chemistry Team Funding / Resources May 17, 2017 38 Nathan Hodas, Abhinav Vishnu, Nathan Baker, Charles Siegel
  39. 39. Questions? (@garrettbgoh) May 17, 2017 39
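
The code sketches below are editorial illustrations of ideas described in the slides; they are not code from the talk or the paper. First, slides 5-7 define deep learning as a multi-layer artificial neural network mapping input features to an output prediction through many hidden layers. A minimal sketch, assuming Keras; the 100-feature input, layer widths, and binary output are arbitrary choices for illustration:

```python
from tensorflow.keras import layers, models

# Input features -> several hidden layers -> output prediction.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(100,)),  # hidden layer 1 (100 input features, arbitrary)
    layers.Dense(64, activation="relu"),                      # hidden layer 2
    layers.Dense(64, activation="relu"),                      # hidden layer 3 ("many hidden layers")
    layers.Dense(1, activation="sigmoid"),                    # output: a single prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```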
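
Slide 17 frames the input as molecule drawings a high school student could produce. One plausible way to turn a SMILES string into a small greyscale image, assuming RDKit is available; the actual image-generation procedure used for Chemception may differ, so treat this purely as a stand-in for the "molecule in, picture out" idea:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw

smiles = "CC(=O)Oc1ccccc1C(=O)O"            # aspirin, used here only as an example
mol = Chem.MolFromSmiles(smiles)

img = Draw.MolToImage(mol, size=(80, 80))    # small 2D depiction as a PIL image
pixels = np.asarray(img.convert("L"), dtype=np.float32) / 255.0  # greyscale, scaled to 0-1
print(pixels.shape)                          # (80, 80): one channel of "raw image" input
```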
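
Slides 21-22 describe Chemception as a deep, skinny convolutional network built on the Inception-ResNet v2 template, with around 16 filters per convolution. The sketch below is not the published T3_F16 architecture; it is a much-simplified residual CNN, assuming Keras, meant only to show the shape of the idea: many thin convolutional layers with skip connections feeding a single prediction head.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.metrics import AUC

def residual_block(x, filters=16):
    """Two 3x3 convolutions whose output is added back to the block input."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

inputs = layers.Input(shape=(80, 80, 1))             # 80x80 single-channel molecule image
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)  # width ~ the F parameter
for _ in range(6):                                   # depth ~ the T parameter (number of blocks)
    x = residual_block(x, filters=16)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # binary label, e.g. toxic / non-toxic

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[AUC()])
model.summary()
```

For a regression target such as the solvation task on slides 18 and 26, the sigmoid output and cross-entropy loss would be swapped for a single linear unit and a mean-squared-error loss.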
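
Slide 27 improves performance by moving from greyscale to "color" augmented images, encoding basic chemistry into the image channels. The sketch below, assuming RDKit, places atoms and bonds onto a coarse pixel grid and fills three channels with illustrative properties (atomic number, bond order, Gasteiger partial charge); these stand in for the slide's "chemistry property #1/#2/#3" and are not necessarily the channels used in the paper.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def molecule_to_channels(smiles, dim=80, res=0.5):
    """Rasterize a molecule onto a dim x dim grid with 3 annotation channels."""
    mol = Chem.MolFromSmiles(smiles)
    AllChem.Compute2DCoords(mol)              # 2D layout for pixel placement
    AllChem.ComputeGasteigerCharges(mol)      # per-atom partial charges
    img = np.zeros((dim, dim, 3), dtype=np.float32)
    conf = mol.GetConformer()
    for atom in mol.GetAtoms():
        pos = conf.GetAtomPosition(atom.GetIdx())
        i = int(dim / 2 + pos.x / res)        # map coordinates to pixel indices,
        j = int(dim / 2 + pos.y / res)        # centred, `res` units per pixel
        if 0 <= i < dim and 0 <= j < dim:
            img[i, j, 0] = atom.GetAtomicNum()                      # property #1
            img[i, j, 2] = float(atom.GetProp("_GasteigerCharge"))  # property #3
    for bond in mol.GetBonds():
        p1 = conf.GetAtomPosition(bond.GetBeginAtomIdx())
        p2 = conf.GetAtomPosition(bond.GetEndAtomIdx())
        i = int(dim / 2 + (p1.x + p2.x) / 2 / res)   # bond midpoint pixel
        j = int(dim / 2 + (p1.y + p2.y) / 2 / res)
        if 0 <= i < dim and 0 <= j < dim:
            img[i, j, 1] = bond.GetBondTypeAsDouble()               # property #2
    return img

channels = molecule_to_channels("c1ccccc1O")   # phenol, as an example
print(channels.shape)                          # (80, 80, 3)
```

The resulting (80, 80, 3) array can be fed to a CNN like the sketch above in place of the single-channel image, which is the greyscale-to-augmented-image step the slide describes.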
