Machine Learning for Chemistry: Representing and Intervening

1.
Machine Learning for Chemistry: Representing and Intervening Ichigaku Takigawa takigawa@icredd.hokudai.ac.jp Apr 26, 2021 @ Hokkaido University Joint Symposium of Engineering & Information Science & WPI-ICReDD

2.
I am a graduate of the School of Engineering and IST! 1995-2005 (10 years) Hokkaido Univ School of Engineering Grad School of Engineering Grad School of Info Sci & Tech 2012-2019 (7 years) Hokkaido Univ B.Eng (1999) M.Eng (2001), PhD (2004) Postdoc (2004-2005) Grad School of Info Sci & Tech Tenure Track (2012-2014) Assoc Prof (2014-2019) KUDO Mineichi TANAKA Yuzuru SHIMBO Masaru MINATO Shinichi TANAKA Yuzuru IMAI Hideyuki

3.
2005-2011 (7 years) Kyoto Univ 2019-present (2 years) The “Cross-Appointment System” But when I stepped outside Physically I’m at Kyoto

4.
Things go interdisciplinary… • Bioinformatics Center, Institute for Chemical Research • Grad School of Pharmaceutical Sci • Medical-risk Avoidance based on iPS Cells Team • Institute for Chemical Reaction Design and Discovery Assist Prof (2005-2011) 2005-2011 (7 years) Kyoto Univ 2019-present (2 years) The “Cross-Appointment System”

5.
This talk: Machine Learning (ML) for Chemistry • Why is it needed? • What is exciting for computer scientists?

6.
It’s a hot topic in Chemistry

7.
But also in Machine Learning! NeurIPS 2020, ICML 2020, ICLR 2020
• Self-Supervised Graph Transformer on Large-Scale Molecular Data
• RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist
• Reinforced Molecular Optimization with Neighborhood-Controlled Grammars
• Autofocused Oracles for Model-based Design
• Barking Up the Right Tree: an Approach to Search over Molecule Synthesis DAGs
• On the Equivalence of Molecular Graph Convolution and Molecular Wave Function with Poor Basis Set
• CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
• A Graph to Graphs Framework for Retrosynthesis Prediction
• Hierarchical Generation of Molecular Graphs using Structural Motifs
• Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning
• Reinforcement Learning for Molecular Design Guided by Quantum Mechanics
• Multi-Objective Molecule Generation using Interpretable Substructures
• Improving Molecular Design by Stochastic Iterative Target Augmentation
• A Generative Model for Molecular Distance Geometry
• Directional Message Passing for Molecular Graphs
• GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
• Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space
• A Fair Comparison of Graph Neural Networks for Graph Classification

8.
Mixed feelings of curiosity, optimism, skepticism?

9.
Inseparably linked to automation “These illustrate how rapid advancements in hardware automation and machine learning continue to transform the nature of experimentation and modeling.” Automation is the use of technology to perform tasks with reduced human involvement or human labor.

10.
Towards machine autonomy in discovery • Organic synthesis in a modular robotic system. Science 363 (2019) • A mobile robotic chemist. Nature 583 (2020) • Automating drug discovery. Nature Reviews Drug Discovery 17 (2018) Automation has been profoundly changing our daily life and society, as well as scientific experiments and computations.

11.
This talk: Machine Learning (ML) for Chemistry
• Why is it needed?
• What is exciting for computer scientists?
I’ll briefly cover these from two aspects:
1. Representation
• What are good ML-readable representations for chemistry?
• What information should be recorded and given to ML?
2. (Experimental) Intervention
• What is essential to make real chemical discoveries?
• Are there any principled ways for data acquisition and experimental design?

12.
Two pillars for scientific discovery? In essence, ML for chemistry is metascience (the science of how to do science), unexpectedly hitting age-old unsolved questions in the philosophy of natural science.

13.
Machine Learning (ML) The AI frenzy: hope & hype … “Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” https://www.forbes.com/sites/forbestechcouncil/2020/02/19/in-praise-of-boring-ai-a-k-a-machine-learning/

14.
Machine Learning (ML) The AI frenzy: hope & hype From AAAI-20 Oxford-Style Debate … “Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” https://www.forbes.com/sites/forbestechcouncil/2020/02/19/in-praise-of-boring-ai-a-k-a-machine-learning/

15.
Machine Learning (ML) All about statistical and algorithmic techniques for fitting a surface model to data points by adjusting model parameters. Examples: Random Forest, Neural Networks, SVR, Kernel Ridge. “Predictive Modeling”: the fitted surface is used for making predictions on unseen data points. [Figure: fitted surface over Variable 1 ($x_1$) and Variable 2 ($x_2$)]
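The idea of "fitting a surface to data points by adjusting model parameters" can be sketched with the simplest possible surface, a plane over two variables. This is only an illustrative sketch: the ground-truth coefficients, the noise level, and the helper names `fit_plane` and `predict` are assumptions made for the example, and plain gradient descent stands in for the richer models named on the slide.

```python
import random

def fit_plane(points, targets, lr=0.1, steps=2000):
    """Fit y ~ w0 + w1*x1 + w2*x2 by gradient descent on mean squared error."""
    w = [0.0, 0.0, 0.0]
    n = len(points)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(points, targets):
            err = (w[0] + w[1] * x1 + w[2] * x2) - y
            grad[0] += 2 * err / n
            grad[1] += 2 * err * x1 / n
            grad[2] += 2 * err * x2 / n
        # Adjust the model parameters against the gradient
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(w, x1, x2):
    """Evaluate the fitted surface at an unseen point."""
    return w[0] + w[1] * x1 + w[2] * x2

random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
# Hypothetical ground truth for the demo: y = 1 + 2*x1 - 3*x2 plus small noise
targets = [1.0 + 2.0 * x1 - 3.0 * x2 + random.gauss(0, 0.05) for x1, x2 in points]

w = fit_plane(points, targets)
print("fitted parameters:", [round(wi, 2) for wi in w])
print("prediction at (0.5, -0.5):", round(predict(w, 0.5, -0.5), 2))
```

Random forests, neural networks, SVR, and kernel ridge differ in the family of surfaces they can express, but the workflow (fit parameters to data, then predict on unseen points) has the same shape.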

16.
Modern aspects of ML 1. High dimensionality: Data can have many input variables. A 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
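As a small worked example of the pixel count above, the sketch below flattens a 100x100 grayscale image into a single input vector. The pixel values are synthetic placeholders chosen only for illustration.

```python
height, width = 100, 100

# Synthetic grayscale image: one intensity value (0-255) per pixel
image = [[(r * width + c) % 256 for c in range(width)] for r in range(height)]

# Flatten row by row: every pixel becomes one input variable
x = [pixel for row in image for pixel in row]

print(len(x))  # 10000
```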

17.
Modern aspects of ML
1. High dimensionality: Data can have many input variables. A 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
2. Multiformity and multimodality: Data take many forms and modes. Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.

18.
Modern aspects of ML
1. High dimensionality: Data can have many input variables. A 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
2. Multiformity and multimodality: Data take many forms and modes. Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params; ResNet101: 45 million params; EfficientNet-B7: 66 million params; VGG19: 144 million params; 12-layer, 12-heads BERT: 110 million params; 24-layer, 16-heads BERT: 336 million params; GPT-2 XL: 1558 million params; GPT-3: 175 billion params

19.
Modern aspects of ML
1. High dimensionality: Data can have many input variables. A 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
2. Multiformity and multimodality: Data take many forms and modes. Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params; ResNet101: 45 million params; EfficientNet-B7: 66 million params; VGG19: 144 million params; 12-layer, 12-heads BERT: 110 million params; 24-layer, 16-heads BERT: 336 million params; GPT-2 XL: 1558 million params; GPT-3: 175 billion params
Can you imagine what would happen if we tried to fit a surface model having 175 billion parameters to 100 million data points in 10 thousand dimensions?

20.
Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Diagram: Input variables $x_1, x_2, x_3, \ldots$ -> Surface model (Classifier or Regressor) -> Prediction]

21.
Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Diagram: Input variables $x_1, x_2, x_3, \ldots$ -> Variable transformation (Feature learning) -> Latent variables -> Classifier or Regressor -> Prediction]

27.
Modern aspects of ML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets. [Diagram: Input variables $x_1, x_2, x_3, \ldots$ -> Variable transformation (Feature learning) -> Latent variables -> Linear Classifier or Regressor -> Prediction]
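The pipeline in this diagram (input variables, a variable transformation into latent variables, then a linear classifier) can be illustrated with a toy sketch. Everything here is an assumption made for the example: the ring-shaped dataset, the hand-written `feature_block` that stands in for a learned or pre-trained feature learning block, and the logistic-regression head trained by gradient descent. The point is only that a linear head can succeed on latent variables where it fails on the raw inputs.

```python
import math
import random

def make_data(n, rng):
    """Points in [-1, 1]^2, labeled 1 if outside the circle x1^2 + x2^2 = 0.5."""
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    ys = [1 if x1 * x1 + x2 * x2 > 0.5 else 0 for x1, x2 in xs]
    return xs, ys

def feature_block(x):
    """Hypothetical stand-in for a feature learning block: inputs -> latent variables."""
    x1, x2 = x
    return (x1 * x1, x2 * x2)

def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def train_linear_head(feats, ys, lr=1.0, steps=1000):
    """Logistic regression (a linear classifier) trained by gradient descent."""
    w, b, n = [0.0, 0.0], 0.0, len(feats)
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        for (f1, f2), y in zip(feats, ys):
            err = sigmoid(w[0] * f1 + w[1] * f2 + b) - y
            gw[0] += err * f1 / n
            gw[1] += err * f2 / n
            gb += err / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

def accuracy(w, b, feats, ys):
    hits = sum(1 for (f1, f2), y in zip(feats, ys)
               if (w[0] * f1 + w[1] * f2 + b > 0) == (y == 1))
    return hits / len(ys)

rng = random.Random(0)
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(300, rng)

# Linear head directly on the raw input variables
w_raw, b_raw = train_linear_head(train_x, train_y)
acc_raw = accuracy(w_raw, b_raw, test_x, test_y)

# Same linear head, but on latent variables from the feature block
w_lat, b_lat = train_linear_head([feature_block(x) for x in train_x], train_y)
acc_latent = accuracy(w_lat, b_lat, [feature_block(x) for x in test_x], test_y)

print("linear head on raw inputs:", round(acc_raw, 2))
print("linear head on latent variables:", round(acc_latent, 2))
```

In real representation learning the transformation is itself learned (and often pre-trained on other data) rather than written by hand as it is here.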

28.
Needs and excitement around ML for Chemistry [Diagram: Prior Info (Observational data, Reported facts, Textbook knowledge), Representation, Model (Belief), Hypothesis, Intervention, New Info, Discovery] • Identify relevant variables • Set design choices • Set experiments • Interpret results Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries?

30.
Representation Reactions, Materials, Molecules ? ML computer programs • Observational data • Reported facts • Textbook knowledge Identifying relevant factors and establishing any necessary and sufficient computer-readable representations are inevitable preconditions, but this is far from trivial and quite paradoxical since we haven’t understood the target. Any rationalized “real” discovery only comes from understanding and discovery of the causal relations between relevant factors.

31.
Representation [Figure: molecular structure drawing with angles $\theta_i$ and $\theta_j$] Levels of Theory/Model Abstraction. First Principle and Simulation (Quantum Chemistry). Spatio-Temporal Flexibility, Variations, Dynamics, and Interactions.

32.
Representation Reactions, Materials, Molecules. Graphs (of different size), Node features, Edge features. CC1CCNO1 NCc1ccoc1.S=(Cl)Cl>>[RX_5]S=C=NCc1ccoc1 … Graph Neural Networks (GNNs), Representation learning, Latent variables, Classifier or Regressor, Diverse Downstream Tasks. Modular Hierarchy. Amide, Proline, Oxazoline. Compositionality. Substituents: Phenyl, Carboxyl, Methyl, Ethyl, Tert-butyl, Isopropyl, Trifluoromethyl, Benzyl. Graph Coarsening. Combinatorial aspects.
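To make "graphs of different size with node and edge features" concrete, here is a minimal sketch that encodes two small molecules by hand and runs a single neighborhood-aggregation step followed by a sum readout. It is not a real GNN: there are no learned weights, the element vocabulary and the two hand-written molecules are assumptions for the example, and a cheminformatics toolkit would normally build the graph from a SMILES string.

```python
ELEMENTS = ["C", "N", "O"]  # illustrative vocabulary of heavy atoms

def one_hot(symbol):
    """Node feature: one-hot encoding of the element."""
    return [1 if symbol == e else 0 for e in ELEMENTS]

def message_passing_step(atoms, bonds):
    """One aggregation step: each atom adds its neighbors' features,
    weighted by the bond order (the edge feature).
    atoms: list of element symbols; bonds: list of (i, j, bond_order)."""
    h = [one_hot(a) for a in atoms]
    new_h = [list(v) for v in h]
    for i, j, order in bonds:
        for k in range(len(ELEMENTS)):
            new_h[i][k] += order * h[j][k]
            new_h[j][k] += order * h[i][k]
    return new_h

def readout(node_states):
    """Sum over atoms: a fixed-size vector regardless of graph size."""
    return [sum(col) for col in zip(*node_states)]

# Heavy-atom graphs written by hand (hydrogens omitted)
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])   # C-C-O
methylamine = (["C", "N"], [(0, 1, 1)])               # C-N

emb_ethanol = readout(message_passing_step(*ethanol))
emb_methylamine = readout(message_passing_step(*methylamine))
print(emb_ethanol)      # [5, 0, 2]
print(emb_methylamine)  # [2, 2, 0]
```

Both molecules end up as vectors of the same length even though their graphs have different numbers of atoms, which is what lets one downstream classifier or regressor handle them all.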

33.
Representation Transformer Core: (Multihead) Self-attention, Add + LayerNorm, Feed-forward NN, Add + LayerNorm. $o = \sum_i \alpha_i(x) f_i(x; \theta)$ NB: Transformers can be considered a special case of GNNs, and many Transformer-type GNNs have also been developed. Effective pretraining is a crucial open problem because, in practice, we can only access limited data for each specific problem. Pretraining with self-supervised pretext tasks has transformed NLP.
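The weighted-sum form on this slide, an output $o$ built as $\sum_i \alpha_i(x) f_i(x; \theta)$ with input-dependent weights $\alpha_i$, can be read as the core of attention. The sketch below is one such reading under stated assumptions: single-query scaled dot-product attention, where the softmax weights play the role of $\alpha_i$ and the value vectors play the role of $f_i$. The function names and toy numbers are illustrative, not taken from the talk.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention.
    Returns the weights alpha_i and the output o = sum_i alpha_i * values[i]."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    alphas = softmax(scores)
    out = [sum(a * v[j] for a, v in zip(alphas, values))
           for j in range(len(values[0]))]
    return alphas, out

# Two items with identical keys receive equal weight,
# so the output is the plain average of their values.
alphas, out = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(alphas)  # [0.5, 0.5]
print(out)     # [2.0, 4.0]
```

Restricting which items each query may attend to (for example, only graph neighbors) is one way to see the connection between Transformers and GNNs mentioned above.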

34.
Needs and excitement around ML for Chemistry [Diagram: Prior Info (Observational data, Reported facts, Textbook knowledge), Representation, Model (Belief), Hypothesis, Intervention, New Info, Discovery] • Identify relevant variables • Set design choices • Set experiments • Interpret results Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries?

36.
(Experimental) Intervention New Info, Hypothesis ? Automation. Reactions, Materials, Molecules. ML computer programs • Observational data • Reported facts • Textbook knowledge Any rationalized “real” discovery only comes from understanding and discovery of the causal relations between relevant factors. Information about causal relations can be acquired by passive observation and active intervention. Correlation does not imply causation.

37.
(Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets.

38.
(Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets.
• Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality.

39.
(Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets.
• Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality.
• Garbage In, Garbage Out (GIGO): ML models are just representative of the given data. If the data have any bias, ML predictions can be miserably misleading.

40.
(Experimental) Intervention We need to carefully rethink how an experiment should be performed to be informative about the causal structure of targets.
• Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables, being totally ignorant of the underlying causality.
• Garbage In, Garbage Out (GIGO): ML models are just representative of the given data. If the data have any bias, ML predictions can be miserably misleading.
• Unavoidable Human-Caused Biases: Always remember that “most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases, heuristics and social influences.” *
* Jia, X., Lynch, A., Huang, Y. et al. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 573, 251–255 (2019).
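The gap between passive observation and active intervention can be illustrated with a toy simulation. In this sketch, a hidden factor `z` drives both `x` and `y`, while `x` has no causal effect on `y` at all. The variable names, noise levels, and sample size are assumptions made for the example and do not come from any chemical dataset.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
n = 5000

# Hidden common cause (e.g. some unrecorded experimental condition)
z = [rng.gauss(0, 1) for _ in range(n)]

# Passive observation: x and y both follow z, so they look strongly related
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Active intervention: the experimenter sets x independently of z
x_int = [rng.gauss(0, 1) for _ in range(n)]
y_int = [zi + rng.gauss(0, 0.5) for zi in z]

corr_obs = pearson(x_obs, y_obs)
corr_int = pearson(x_int, y_int)
print("observational correlation:", round(corr_obs, 2))
print("interventional correlation:", round(corr_int, 2))
```

A model fitted to the observational data would happily predict `y` from `x`, yet changing `x` in the lab would do nothing to `y`. Only the intervened data reveal that.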

41.
(Experimental) Intervention https://www.chemistryworld.com/news/dispute-over-reaction-prediction-puts-machine-learnings-pitfalls-in-spotlight/3009912.article • Main paper https://doi.org/10.1126/science.aar5169 • Erratum https://doi.org/10.1126/science.aat7648 • Negative comment paper https://doi.org/10.1126/science.aat8603 • Author's response https://doi.org/10.1126/science.aat8763

42.
(Experimental) Intervention
Current ML is too data-hungry and vulnerable to any data bias, but acquisition of clean representative data is often quite impractical.
• Deep learning techniques thus far have proven to be data hungry, shallow, brittle, and limited in their ability to generalize (Marcus, 2018)
• Current machine learning techniques are data-hungry and brittle; they can only make sense of patterns they've seen before. (Chollet, 2020)
• A growing body of evidence shows that state-of-the-art models learn to exploit spurious statistical patterns in datasets... instead of learning meaning in the flexible and generalizable way that humans do. (Nie et al., 2019)
• Current machine learning methods seem weak when they are required to generalize beyond the training distribution, which is what is often needed in practice. (Bengio et al., 2019)
Keys: fusing modern ML with first principles, simulations, and domain knowledge, and working collaboratively with experimental experts.

43.
(Experimental) Intervention AlphaGo (Nature, 2016), AlphaGo Zero (Nature, 2017), AlphaZero (Science, 2018), MuZero (Nature, 2020). This has reignited the old war between induction and deduction, and we’re re-encountering the long-standing problems in AI.
• Knowledge acquisition / Principled data acquisition: Experimental design, Model-based optimization, Evolutionary computation
• Reconciliation between inductive and deductive ML: Hybrid models of causal/logical/algorithmic ML and deep learning
• Balancing exploitation and exploration: Model-based reinforcement learning or search in a combinatorial space
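The exploitation versus exploration trade-off can be sketched with the simplest textbook setting, an epsilon-greedy multi-armed bandit. This is a deliberately simple stand-in, not the model-based reinforcement learning or combinatorial search named above. The three "arms" can be read as hypothetical candidate experimental conditions with unknown success rates; all numbers are made up for illustration.

```python
import random

def epsilon_greedy(true_success, rounds=2000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit.
    true_success: hidden success probability of each candidate (arm).
    Returns how often each arm was tried and its estimated success rate."""
    rng = random.Random(seed)
    k = len(true_success)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore: try anything
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit: current best
        reward = 1.0 if rng.random() < true_success[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Hypothetical success rates of three candidate conditions
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(3), key=lambda i: estimates[i])
print("trials per candidate:", counts)
print("estimated success rates:", [round(e, 2) for e in estimates])
print("best candidate so far:", best)
```

With `epsilon=0` the loop can lock onto whichever candidate happened to succeed first. A small amount of exploration is what lets it find the better option, at the cost of some "wasted" trials.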

44.
ML for Chemistry to me (an ML researcher) An exciting “real” test bench for the long-standing unsolved but attractive fundamental problems in “AI for automating discovery”, involving many fascinating technical topics of modern ML. [Diagram: Prior Info (Observational data, Reported facts, Textbook knowledge), Representation, Model (Belief), Hypothesis, Intervention, New Info, Discovery] • Identify relevant variables • Set design choices • Set experiments • Interpret results

45.
Summary: Machine Learning (ML) for Chemistry
• Why is it needed?
• What is exciting for computer scientists?
Two aspects:
1. Representation
• What are good ML-readable representations for chemistry?
• What information should be recorded and given to ML?
2. (Experimental) Intervention
• What is essential to make real chemical discoveries?
• Are there any principled ways for data acquisition and experimental design?
