Machine Learning for Chemistry:
Representing and Intervening
Ichigaku Takigawa
takigawa@icredd.hokudai.ac.jp
Apr 26, 2021 @ Hokkaido University
Joint Symposium of Engineering & Information Science & WPI-ICReDD
I am a graduate of the School of Engineering and IST!
1995-2005 (10 years) Hokkaido Univ
• School of Engineering: B.Eng (1999)
• Grad School of Engineering: M.Eng (2001), PhD (2004)
• Grad School of Info Sci & Tech: Postdoc (2004-2005)
2012-2019 (7 years) Hokkaido Univ
• Grad School of Info Sci & Tech: Tenure Track (2012-2014), Assoc Prof (2014-2019)
KUDO Mineichi, TANAKA Yuzuru, SHIMBO Masaru
MINATO Shinichi, TANAKA Yuzuru, IMAI Hideyuki
But when I stepped outside
Things go interdisciplinary…
2005-2011 (7 years) Kyoto Univ: Assist Prof (2005-2011)
• Bioinformatics Center, Institute for Chemical Research
• Grad School of Pharmaceutical Sci
2019-present (2 years) The “Cross-Appointment System” (physically I’m at Kyoto)
• Medical-risk Avoidance based on iPS Cells Team
• Institute for Chemical Reaction Design and Discovery
This talk
• Why is it needed?
• What is exciting for computer scientists?
Machine Learning (ML) for Chemistry
It’s a hot topic in Chemistry
But also in Machine Learning!
NeurIPS 2020
• Self-Supervised Graph Transformer on Large-Scale Molecular Data
• RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist
• Reinforced Molecular Optimization with Neighborhood-Controlled Grammars
• Autofocused Oracles for Model-based Design
• Barking Up the Right Tree: an Approach to Search over Molecule Synthesis DAGs
• On the Equivalence of Molecular Graph Convolution and Molecular Wave Function with Poor Basis Set
• CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
ICML 2020
• A Graph to Graphs Framework for Retrosynthesis Prediction
• Hierarchical Generation of Molecular Graphs using Structural Motifs
• Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning
• Reinforcement Learning for Molecular Design Guided by Quantum Mechanics
• Multi-Objective Molecule Generation using Interpretable Substructures
• Improving Molecular Design by Stochastic Iterative Target Augmentation
• A Generative Model for Molecular Distance Geometry
ICLR 2020
• Directional Message Passing for Molecular Graphs
• GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
• Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space
• A Fair Comparison of Graph Neural Networks for Graph Classification
Mixed feelings of curiosity, optimism, skepticism?
Inseparably linked to automation
“These illustrate how rapid advancements in hardware automation and machine
learning continue to transform the nature of experimentation and modeling.”
Automation is the use of technology to perform tasks with reduced
human involvement or human labor.
Towards machine autonomy in discovery
• Organic synthesis in a modular robotic system. Science 363 (2019)
• A mobile robotic chemist. Nature 583 (2020)
• Automating drug discovery. Nature Reviews Drug Discovery 17 (2018)
Automation has been profoundly changing our daily lives and society,
as well as scientific experiments and computations.
This talk
Machine Learning (ML) for Chemistry
• Why is it needed?
• What is exciting for computer scientists?
I’ll briefly cover these from two aspects:
1. Representation
• What are good ML-readable representations for chemistry?
• What information should be recorded and given to ML?
2. (Experimental) Intervention
• What is essential to make real chemical discoveries?
• Are there principled ways for data acquisition and experimental design?
Two pillars for scientific discovery?
In essence, ML for chemistry is metascience (the science of how
to do science), unexpectedly hitting age-old unsolved questions in
the philosophy of natural science.
Machine Learning (ML)
https://www.forbes.com/sites/forbestechcouncil/2020/02/19/in-praise-of-boring-ai-a-k-a-machine-learning/
…
“Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.”
The AI frenzy: hope & hype
From AAAI-20 Oxford-Style Debate
Machine Learning (ML)
All about statistical and algorithmic techniques for
surface-model fitting to data points by adjusting model parameters.
“Predictive Modeling”: the fitted surface is used for making predictions on unseen data points.
[Figure: example surfaces fitted by Random Forest, Neural Networks, SVR, and Kernel Ridge over two input variables, x1 (Variable 1) and x2 (Variable 2)]
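To make "surface-model fitting by adjusting model parameters" concrete, here is a minimal sketch of kernel ridge regression on two input variables, written with NumPy only. The toy target function, the kernel length scale, and the ridge strength are assumptions chosen for illustration; they do not come from the talk.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def fit_kernel_ridge(X, y, ridge=1e-3, length_scale=0.5):
    """Adjust the model parameters (one dual weight per data point) to fit the data."""
    K = rbf_kernel(X, X, length_scale)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def predict(X_train, weights, X_new, length_scale=0.5):
    """Evaluate the fitted surface at new points."""
    return rbf_kernel(X_new, X_train, length_scale) @ weights

def toy_surface(X):
    """Assumed ground-truth surface over (x1, x2), for illustration only."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 2))   # observed (x1, x2) data points
y_train = toy_surface(X_train)

weights = fit_kernel_ridge(X_train, y_train)

# "Predictive modeling": use the fitted surface on unseen data points.
X_test = rng.uniform(-1.0, 1.0, size=(50, 2))
test_rmse = float(np.sqrt(np.mean((predict(X_train, weights, X_test) - toy_surface(X_test)) ** 2)))
print(f"RMSE on unseen points: {test_rmse:.4f}")
```

Random forests, neural networks, and SVR differ in the family of surfaces and in how the parameters are adjusted, but the fit-then-predict pattern is the same.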
Modern aspects of ML
1. High dimensionality: Data can have many input variables.
a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
2. Multiformity and multimodality: Data take many forms + modes
Numerical values, discrete structures, networks, variable-length sequences, etc.
Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
3. Overrepresentation (overparameterization): ML models can have many parameters.
ResNet50: 26 million params
ResNet101: 45 million params
EfficientNet-B7: 66 million params
VGG19: 144 million params
12-layer, 12-heads BERT: 110 million params
24-layer, 16-heads BERT: 336 million params
GPT-2 XL: 1558 million params
GPT-3: 175 billion params
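As a sanity check on where such counts come from, the sketch below tallies the weights and biases of VGG19 layer by layer. It assumes the standard ImageNet configuration (3x3 convolutions, 224x224 RGB input, three fully connected layers, 1000 classes); the function name is made up for this example.

```python
# VGG19 layout (configuration "E"): integers are the output channels of
# 3x3 convolutions, "M" is a 2x2 max-pooling layer (which has no parameters).
VGG19_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]
VGG19_FC = [4096, 4096, 1000]  # fully connected layers; 1000 ImageNet classes

def count_vgg19_params(in_channels=3, image_size=224):
    """Count trainable parameters (weights + biases) of VGG19."""
    total = 0
    channels, size = in_channels, image_size
    for layer in VGG19_CONV:
        if layer == "M":
            size //= 2                                    # pooling halves the resolution
        else:
            total += 3 * 3 * channels * layer + layer     # kernel weights + biases
            channels = layer
    features = channels * size * size                     # flattened feature map
    for width in VGG19_FC:
        total += features * width + width                 # dense weights + biases
        features = width
    return total

n_params = count_vgg19_params()
print(f"VGG19: {n_params:,} parameters (about {n_params / 1e6:.0f} million)")
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.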
Can you imagine what would happen if we tried to
fit a surface model with 175 billion parameters
to 100 million data points in 10 thousand dimensions?
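A tiny-scale hint at the answer, as a sketch only: when a linear-in-parameters model has more parameters than data points, it can reproduce any targets exactly, and infinitely many parameter settings do so. The sizes and the random feature matrix below are assumptions for illustration; nothing here says how models at the scale quoted above actually behave.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_params = 20, 200                 # far more parameters than data points

Phi = rng.normal(size=(n_points, n_params))  # assumed random feature matrix
y = rng.normal(size=n_points)                # arbitrary targets, here pure noise

# For an underdetermined system, lstsq returns the minimum-norm exact solution.
w_min, *_ = np.linalg.lstsq(Phi, y, rcond=None)
train_error = float(np.max(np.abs(Phi @ w_min - y)))

# Adding any direction that Phi maps to zero gives another exact fit.
v = rng.normal(size=n_params)
null_part = v - Phi.T @ np.linalg.solve(Phi @ Phi.T, Phi @ v)
w_other = w_min + null_part
other_error = float(np.max(np.abs(Phi @ w_other - y)))

print(f"max training error, minimum-norm fit: {train_error:.2e}")
print(f"max training error, alternative fit:  {other_error:.2e}")
print(f"parameter norms: {np.linalg.norm(w_min):.3f} vs {np.linalg.norm(w_other):.3f}")
```

Both fits match the training data, so which one the fitting procedure selects matters for predictions on unseen points.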
Modern aspects of ML
4. Representation learning: Models can have “feature learning”
blocks, and they can be “pre-trained” by different large datasets.
[Diagram: input variables (x1, x2, x3, …) → surface model (classifier or regressor) → prediction]
[Diagram: input variables (x1, x2, x3, …) → feature learning (variable transformation) → latent variables → classifier or regressor → prediction]
[Diagram: input variables (x1, x2, x3, …) → feature learning (variable transformation) → latent variables → classifier or regressor, labeled “Linear” → prediction]
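The two-stage picture (variable transformation into latent variables, then a linear classifier or regressor) can be sketched as follows. This is a toy under stated assumptions: the "feature learning" block is a fixed random tanh layer standing in for a pre-trained block, and the data and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_block(X, W, b):
    """Variable transformation: input variables -> latent variables."""
    return np.tanh(X @ W + b)

def fit_linear_head(Z, y):
    """Least-squares linear regressor (with bias) on the given variables."""
    Z1 = np.hstack([Z, np.ones((len(Z), 1))])
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return coef

def predict_linear(Z, coef):
    return np.hstack([Z, np.ones((len(Z), 1))]) @ coef

# Toy data: a nonlinear target of two input variables (assumption).
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])

# Stand-in for a pre-trained block: fixed random weights, never updated here.
W = rng.normal(scale=2.0, size=(2, 50))
b = rng.normal(size=50)

Z = feature_block(X, W, b)                       # latent variables

coef_raw = fit_linear_head(X, y)                 # linear model on raw inputs
coef_latent = fit_linear_head(Z, y)              # linear model on latent variables

mse_raw = float(np.mean((predict_linear(X, coef_raw) - y) ** 2))
mse_latent = float(np.mean((predict_linear(Z, coef_latent) - y) ** 2))
print(f"training MSE, linear on inputs:  {mse_raw:.4f}")
print(f"training MSE, linear on latents: {mse_latent:.4f}")
```

In real representation learning the transformation is itself trained, possibly on a different large dataset, and only the final head needs to stay linear.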
Needs and excitement around ML for Chemistry
[Diagram of the discovery cycle, with labels: Prior Info (observational data, reported facts, textbook knowledge), Representation, Model (Belief), Hypothesis, Intervention, New Info, Discovery]
• Identify relevant variables
• Set design choices
• Set experiments
• Interpret results
Can we somehow externalize “experience and intuition” of
experienced chemists to rationalize and accelerate discoveries?
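One way to read the cycle of model (belief), hypothesis, intervention, and new info is as a closed loop that chooses the next experiment from the current belief. The sketch below uses a Gaussian-process surrogate with an upper-confidence-bound rule. That is a swapped-in, generic technique and not a method described in the talk, and the "experiment" is a hypothetical one-variable response with its optimum at x = 0.7.

```python
import numpy as np

def rbf(a, b, length_scale=0.2):
    """RBF kernel between two 1-D arrays of experimental conditions."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * length_scale ** 2))

def posterior(x_obs, y_obs, x_all, jitter=1e-6):
    """Model (belief): GP posterior mean and std, zero prior mean, unit prior variance."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = rbf(x_all, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_experiment(x):
    """Hypothetical experiment: hidden response surface, best at x = 0.7."""
    return -(x - 0.7) ** 2

candidates = np.linspace(0.0, 1.0, 101)           # possible experimental conditions
observed = {0: run_experiment(candidates[0]),     # prior info: two known results
            100: run_experiment(candidates[100])}

for _ in range(10):
    keys = sorted(observed)
    x_obs = candidates[np.array(keys)]
    y_obs = np.array([observed[k] for k in keys])
    mean, std = posterior(x_obs, y_obs, candidates)
    score = mean + 2.0 * std                      # hypothesis: promising or uncertain
    score[np.array(keys)] = -np.inf               # never repeat an experiment
    nxt = int(np.argmax(score))
    observed[nxt] = run_experiment(candidates[nxt])   # intervention -> new info

best_idx = max(observed, key=observed.get)
best_x = float(candidates[best_idx])
print(f"{len(observed)} experiments, best condition found: x = {best_x:.2f}")
```

The human tasks listed above (identifying variables, setting design choices) correspond here to choosing `candidates`, the kernel, and the scoring rule, all of which this toy simply hard-codes.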
Representation
[Diagram: Molecules, Materials, Reactions (observational data, reported facts, textbook knowledge) → ? → ML computer programs]
Identifying the relevant factors and establishing necessary and sufficient
computer-readable representations are unavoidable preconditions, but this is
far from trivial and quite paradoxical, since we do not yet understand the target.
Any rationalized “real” discovery comes only from understanding and discovering
the causal relations between the relevant factors.
Representation
[Figure: molecular structure drawings (atom labels O, N, H, NH, CH3) annotated with angles θ_i and θ_j]
Levels of Theory/Model Abstraction
First Principle and Simulation (Quantum Chemistry)
Spatio-Temporal Flexibility, Variations, Dynamics, and Interactions
Representation
Latent
variables
Representation
learning
Reactions
Materials
Molecules
Graphs (of different size)
Node
features
Edge
features
CC1CCNO1
Graph Neural
Networks (GNNs)
NCc1ccoc1.S=(Cl)Cl>>[RX_5]S=C=NCc1ccoc1
…
Classifier or
Regressor
Diverse
Downstream
Tasks
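A minimal sketch of the pipeline above (molecule, graph, node features, GNN, fixed-size vector for a downstream model), using the SMILES string CC1CCNO1 from this slide. The graph is encoded by hand rather than parsed, the one-hot element vocabulary is an illustrative assumption, and edge features are omitted because every bond here is single. Real GNN layers add learned weights and nonlinearities; this only shows the neighbor-aggregation and readout steps.

```python
# Hand-encoded molecular graph for the SMILES string CC1CCNO1
# (atom order follows the string; hydrogens are implicit and omitted).
atoms = ["C", "C", "C", "C", "N", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]  # (5, 1) closes the ring

ELEMENTS = ["C", "N", "O"]  # illustrative vocabulary for one-hot node features


def one_hot(symbol):
    """Encode an element symbol as a one-hot node feature vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def build_adjacency(n_atoms, bond_list):
    """Turn an undirected bond list into a neighbor list per atom."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bond_list:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def message_passing_step(features, neighbors):
    """Each node adds the sum of its neighbors' features to its own."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbors[i]:
            agg = [x + y for x, y in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum-pool node features into one fixed-size vector per molecule."""
    return [sum(col) for col in zip(*features)]


node_features = [one_hot(s) for s in atoms]
neighbors = build_adjacency(len(atoms), bonds)
hidden = message_passing_step(node_features, neighbors)
graph_vector = readout(hidden)  # same length for any molecule size
```

Because the readout sums over nodes, graphs of different size map to vectors of the same length, which is what lets a single classifier or regressor sit on top.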
Modular Hierarchy
Amide
Proline
Oxazoline
Compositionality
Phenyl
Carboxyl Methyl Ethyl Tert-butyl
Isopropyl
Trifluoromethyl
Benzyl
Substituents
Graph

Coarsening
Combinatorial aspects
Representation
NB: Transformers can be considered a special case of GNNs,
and many Transformer-type GNNs have also been developed.
Transformer Core
(Multihead)
Self-attention
Feed-forward NN
Add + LayerNorm
Add + LayerNorm
$o = \sum_i \alpha_i(x)\, f_i(x; \theta)$
Effective pretraining is a crucial open problem because in practice,
we only have access to limited data for each specific problem.
Pretraining with self-supervised pretext tasks has transformed NLP.
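The weighted sum in the Transformer core on this slide, an output formed as input-dependent weights times per-element terms, can be sketched as below. Reading the weights as a softmax over scaled dot-product scores and the summed terms as value vectors follows standard self-attention and is an assumption about what the slide intends; the query, keys, and values are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_output(query, keys, values):
    """Return o = sum_i alpha_i * values[i] together with the weights alpha."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    alphas = softmax(scores)  # non-negative, sums to 1
    dim = len(values[0])
    output = [sum(a * v[d] for a, v in zip(alphas, values)) for d in range(dim)]
    return output, alphas


# Toy inputs (illustrative only): one query attending over three elements.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, alphas = attention_output(query, keys, values)
```

If the elements are atoms and the sum runs only over bonded neighbors, the same computation becomes an attention-style GNN layer, which is the sense of the NB above.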
(Experimental) Intervention
New Info
Hypothesis
?
Automation
Reactions
Materials
Molecules
Any rationalized “real” discovery only comes from understanding and discovery
of the causal relations between relevant factors.
Information about causal relations can be acquired by passive observation and
active intervention. Correlation does not imply causation.
ML
computer
programs
• Observational data
• Reported facts
• Textbook knowledge
(Experimental) Intervention
We need to carefully rethink how an experiment should be
performed to be informative about causal structure of targets.
• Correlation vs Causation
ML models trained over passive observational data can be trapped by
spurious correlations between variables, being totally ignorant of the
underlying causality.
• Garbage In, Garbage Out (GIGO)
ML models are just representative of the given data. If the data have any bias, ML
predictions can be miserably misleading.
• Unavoidable Human-Caused Biases
Always remember that “most chemical experiments are planned by human
scientists and therefore are subject to a variety of human cognitive biases,
heuristics and social influences.”
* Jia, X., Lynch, A., Huang, Y. et al. Anthropogenic biases in chemical reaction data hinder
exploratory inorganic synthesis. Nature 573, 251–255 (2019).
https://www.chemistryworld.com/news/dispute-over-reaction-prediction-puts-machine-learnings-pitfalls-in-spotlight/3009912.article
• Main paper https://doi.org/10.1126/science.aar5169
• Erratum https://doi.org/10.1126/science.aat7648
• Negative comment paper https://doi.org/10.1126/science.aat8603
• Author's response https://doi.org/10.1126/science.aat8763
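The "Correlation vs Causation" point on this slide can be illustrated with a small simulation. This is a toy sketch, not a chemical model: a hidden factor z (hypothetical) drives the outcome y, while x has no causal effect on y at all. In passively observed data x happens to track z, so x and y correlate strongly; when the experimenter sets x independently (an intervention), the correlation vanishes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, intervene, rng):
    """Hidden factor z drives y; x has no causal effect on y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        if intervene:
            x = rng.gauss(0.0, 1.0)      # experimenter sets x independently of z
        else:
            x = z + rng.gauss(0.0, 0.5)  # passively observed x tracks z
        y = z + rng.gauss(0.0, 0.5)      # y depends on z only
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(0)
r_observed = pearson(*simulate(20000, intervene=False, rng=rng))
r_intervened = pearson(*simulate(20000, intervene=True, rng=rng))
```

A model fitted only to the observational sample would happily use x to predict y, and would then fail as soon as someone changes x on purpose.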
(Experimental) Intervention
Keys: fusing modern ML with first-principles, simulations, domain
knowledge, and collaboratively working with experimental experts.
Current ML is too data-hungry and vulnerable to any data bias, but
acquisition of clean representative data is often quite impractical.
(Experimental) Intervention
• Deep learning techniques thus far have proven to be data hungry, shallow, brittle, and
limited in their ability to generalize (Marcus, 2018)
• Current machine learning techniques are data-hungry and brittle—they can only make
sense of patterns they've seen before. (Chollet, 2020)
• A growing body of evidence shows that state-of-the-art models learn to exploit spurious
statistical patterns in datasets... instead of learning meaning in the flexible and
generalizable way that humans do. (Nie et al., 2019)
• Current machine learning methods seem weak when they are required to generalize
beyond the training distribution, which is what is often needed in practice. (Bengio et al.,
2019)
(Experimental) Intervention
AlphaGo
(Nature, 2016)
AlphaGo Zero
(Nature, 2017)
AlphaZero
(Science, 2018)
MuZero
(Nature, 2020)
This has reignited the old war between induction and deduction,
and we’re re-encountering the long-standing problems in AI.
• Knowledge acquisition / Principled data acquisition
Experimental design, Model-based optimization, Evolutionary computation
• Reconciliation between inductive and deductive ML
Hybrid models of causal/logical/algorithmic ML and deep learning
• Balancing exploitation and exploration
Model-based reinforcement learning or search in a combinatorial space
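As one concrete instance of balancing exploitation and exploration, here is a minimal sketch of the UCB1 bandit rule choosing among a few candidate experimental conditions. UCB1 is picked here purely for illustration and is not a method named in the talk; the success probabilities are hypothetical and would be unknown in a real campaign.

```python
import math
import random


def ucb1(success_probs, n_rounds, rng):
    """Run UCB1 on Bernoulli arms; return the number of pulls per arm."""
    n_arms = len(success_probs)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # try every option once before scoring
        else:
            # Observed mean (exploitation) plus an uncertainty bonus (exploration).
            scores = [
                totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = scores.index(max(scores))
        # Simulated experiment outcome: success with the arm's hidden probability.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Hypothetical success rates of three reaction conditions.
hypothetical_yields = [0.2, 0.5, 0.8]
counts = ucb1(hypothetical_yields, n_rounds=5000, rng=random.Random(0))
```

The bonus term shrinks as an option is tried more often, so rarely tested conditions keep getting occasional trials while most of the budget goes to the best-looking one.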
ML for Chemistry to me (an ML researcher)
An exciting “real” test bench for the long-standing unsolved but
attractive fundamental problems in “AI for automating discovery”,
involving many fascinating technical topics of modern ML.
Summary
Machine Learning (ML) for Chemistry
• Why is it needed?
• What is exciting for computer scientists?
Two aspects:
1. Representation
• What are good ML-readable representations for chemistry?
• What information should be recorded and given to ML?
2. (Experimental) Intervention
• What is essential to make real chemical discoveries?
• Are there principled ways for data acquisition and experimental design?

Machine Learning for Chemistry: Representing and Intervening

  • 1.
    Machine Learning forChemistry: Representing and Intervening Ichigaku Takigawa takigawa@icredd.hokudai.ac.jp Apr 26, 2021 @ Hokkaido University Joint Symposium of Engineering & Information Science & WPI-ICReDD
  • 2.
    I am agraduate of School of Engineering and IST! 1995-2005 (10 years) Hokkaido Univ School of Engineering Grad School of Engineering Grad School of Info Sci & Tech 2012-2019 (7 years) Hokkaido Univ B.Eng (1999) M.Eng (2001), PhD (2004) Postdoc (2004-2005) Grad School of Info Sci & Tech Tenure Track (2012-2014) Assoc Prof (2014-2019) KUDO Mineichi TANAKA Yuzuru SHIMBO Masaru MINATO Shinichi TANAKA Yuzuru IMAI Hideyuki
  • 3.
    2005-2011 (7 years)Kyoto Univ 2019-present (2 years) The “Cross-Appointment System” But when I stepped outside Physically I’m at Kyoto
  • 4.
    Things go interdisciplinary… •Bioinformatics Center Institute for Chemical Research • Grad School of Pharmaceutical Sci • Medical-risk Avoidance based on iPS Cells Team • Institute for Chemical Reaction Design and Discovery Assist Prof (2005-2011) 2005-2011 (7 years) Kyoto Univ 2019-present (2 years) The “Cross-Appointment System”
  • 5.
    This talk • Whyit is needed? • What are exciting for computer scientists? Machine Learning (ML) for Chemistry
  • 6.
    It’s a hottopic in Chemistry
  • 7.
    But also inMachine Learning! NeurIPS 2020 ICML 2020 ICLR 2020 • Self-Supervised Graph Transformer on Large-Scale Molecular Data • RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist • Reinforced Molecular Optimization with Neighborhood-Controlled Grammars • Autofocused Oracles for Model-based Design • Barking Up the Right Tree: an Approach to Search over Molecule Synthesis DAGs • On the Equivalence of Molecular Graph Convolution and Molecular Wave Function with Poor Basis Set • CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models • A Graph to Graphs Framework for Retrosynthesis Prediction • Hierarchical Generation of Molecular Graphs using Structural Motifs • Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning • Reinforcement Learning for Molecular Design Guided by Quantum Mechanics • Multi-Objective Molecule Generation using Interpretable Substructures • Improving Molecular Design by Stochastic Iterative Target Augmentation • A Generative Model for Molecular Distance Geometry • Directional Message Passing for Molecular Graphs • GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation • Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space • A Fair Comparison of Graph Neural Networks for Graph Classification
  • 8.
    Mixed feelings ofcuriosity, optimism, skepticism?
  • 9.
    Inseparably linked toautomation “These illustrate how rapid advancements in hardware automation and machine learning continue to transform the nature of experimentation and modeling.” Automation is the use of technology to perform tasks with reduced human involvement or human labor.
  • 10.
    Towards machine autonomyin discovery Organic synthesis in a modular robotic system. Science 363 (2019) A mobile robotic chemist. Nature 583 (2020) Automating drug discovery. Nature Reviews Drug Discovery 17 (2018) Automation has been impactfully changing our daily life, society, as well as scientific experiments and computations.
  • 11.
    This talk • Whyit is needed? • What are exciting for computer scientists? I’ll briefly cover these from two aspects: 2. (Experimental) Intervention Machine Learning (ML) for Chemistry • What are good ML-readable representations for chemistry? • What information should be recorded and given to ML? 1. Representation • What are essential to make real chemical discoveries? • Any principled ways for data acquisition and experimental design?
  • 12.
    Two pillars forscientific discovery? In essence, ML for chemistry is metascience (the science on how to do science) unexpectedly hitting age-old unsolved questions in the philosophy of natural science.
  • 13.
    Machine Learning (ML) https://www.forbes.com/sites/forbestechcouncil/2020/02/19/ in-praise-of-boring-ai-a-k-a-machine-learning/ … “Let’sface it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” The AI frenzy: hope & hype
  • 14.
    Machine Learning (ML) FromAAAI-20 Oxford-Style Debate https://www.forbes.com/sites/forbestechcouncil/2020/02/19/ in-praise-of-boring-ai-a-k-a-machine-learning/ … “Let’s face it: So far, the artificial intelligence plastered all over PowerPoint slides hasn’t lived up to its hype.” The AI frenzy: hope & hype
  • 15.
    Machine Learning (ML) Allabout statistical and algorithmic techniques for surface-model fitting to data points by adjusting model parameters. Random Forest Neural Networks SVR Kernel Ridge “Predictive Modeling” Fitted surface used for making predictions on unseen data points Variable 1 Variable 2 <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit 
sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2
  • 16.
    Modern aspects ofML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array)
  • 17.
    Modern aspects ofML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
  • 18.
    Modern aspects ofML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params ResNet101: 45 million params EfficientNet-B7: 66 million params VGG19: 144 million params 12-layer, 12-heads BERT: 110 million params 24-layer, 16-heads BERT: 336 million params GPT-2 XL: 1558 million params GPT-3: 175 billion params 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
  • 19.
    Modern aspects ofML 1. High dimensionality: Data can have many input variables. a 100x100 pixel grayscale image = 10000 input variables (a 10000-dimensional array) 3. Overrepresentation: ML models can have many parameters. ResNet50: 26 million params ResNet101: 45 million params EfficientNet-B7: 66 million params VGG19: 144 million params 12-layer, 12-heads BERT: 110 million params 24-layer, 16-heads BERT: 336 million params GPT-2 XL: 1558 million params GPT-3: 175 billion params Can you imagine what would happen if we try to fit a surface model having 175 billion parameters to 100 million data points in 10 thousand dimension?? 2. Multiformity and multimodality: Data take many forms + modes Numerical values, discrete structures, networks, variable-length sequences, etc. Images, volumes, videos, audios, texts, point clouds, geometries, sensor signals, etc.
  • 20.
    Modern aspects ofML 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” by different large datasets. Prediction Input variables Surface model Classifier or Regressor <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit 
sha1_base64="lFhRrRrVTrFR31ebbMgRp5myJpc=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCIDPjCuiG5c8pBHgoS0dcCG0jZtISLxB0zcysKVJi6MH+AHuPEHXPAJxiUmblx4KU2MEvE20zlz5p47Z+ZKhqpYNmNdjzA2PjE55Z32zczOzfsDC4s5S2+YMs/KuqqbBUm0uKpoPGsrtsoLhsnFuqTyvFTb7+/nm9y0FF07tFsGL9XFqqZUFFm0icqcljfKgRCLMCeCwyDqghDcSOqBRxzhGDpkNFAHhwabsAoRFn1FRMFgEFdCmziTkOLsc5zDR9oGZXHKEImt0b9Kq6LLarTu17QctUynqDRMUgYRZi/snvXYM3tgr+zzz1ptp0bfS4tmaaDlRtl/sZz5+FdVp9nGybdqpGcbFew4XhXybjhM/xbyQN886/Qyu+lwe43dsjfyf8O67IluoDXf5bsUT1+P8CORF3oxalD0dzuGQS4WiW5HYqnNUGLPbZUXK1jFOvUjjgQOkESW6ldxiSt0BK8QEbaE+CBV8LiaJfwIIfEFWE6QnA==</latexit> x3 <latexit sha1_base64="0IPXcU0UIDvzZlYURjV2A/THv9U=">AAACiXichVG7SgNBFD2ur/hM1EawEYNiFWZFNKQKprGMj0TBBNndTHR0X+xOFmLwB6zsRK0ULMQP8ANs/AELP0EsFWwsvNksiAbjXWbnzJl77pyZq7um8CVjz11Kd09vX39sYHBoeGQ0nhgbL/pOzTN4wXBMx9vWNZ+bwuYFKaTJt12Pa5Zu8i39MNfc3wq45wvH3pR1l5ctbc8WVWFokqhiKag40t9NJFmKhTHdDtQIJBFF3knco4QKHBiowQKHDUnYhAafvh2oYHCJK6NBnEdIhPscxxgkbY2yOGVoxB7Sf49WOxFr07pZ0w/VBp1i0vBIOY1Z9sRu2Rt7ZHfshX3+WasR1mh6qdOst7Tc3Y2fTG58/KuyaJbY/1Z19CxRRTr0Ksi7GzLNWxgtfXB09raRWZ9tzLFr9kr+r9gze6Ab2MG7cbPG1y87+NHJC70YNUj93Y52UFxIqUuphbXFZHYlalUMU5jBPPVjGVmsIo8C1T/AKc5xoQwpqpJWMq1UpSvSTOBHKLkvAi+SPA==</latexit> . . .
  • 21.
    Modern aspects ofML Prediction Input variables Surface model <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit sha1_base64="lFhRrRrVTrFR31ebbMgRp5myJpc=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCIDPjCuiG5c8pBHgoS0dcCG0jZtISLxB0zcysKVJi6MH+AHuPEHXPAJxiUmblx4KU2MEvE20zlz5p47Z+ZKhqpYNmNdjzA2PjE55Z32zczOzfsDC4s5S2+YMs/KuqqbBUm0uKpoPGsrtsoLhsnFuqTyvFTb7+/nm9y0FF07tFsGL9XFqqZUFFm0icqcljfKgRCLMCeCwyDqghDcSOqBRxzhGDpkNFAHhwabsAoRFn1FRMFgEFdCmziTkOLsc5zDR9oGZXHKEImt0b9Kq6LLarTu17QctUynqDRMUgYRZi/snvXYM3tgr+zzz1ptp0bfS4tmaaDlRtl/sZz5+FdVp9nGybdqpGcbFew4XhXybjhM/xbyQN886/Qyu+lwe43dsjfyf8O67IluoDXf5bsUT1+P8CORF3oxalD0dzuGQS4WiW5HYqnNUGLPbZUXK1jFOvUjjgQOkESW6ldxiSt0BK8QEbaE+CBV8LiaJfwIIfEFWE6QnA==</latexit> x3 <latexit 
sha1_base64="0IPXcU0UIDvzZlYURjV2A/THv9U=">AAACiXichVG7SgNBFD2ur/hM1EawEYNiFWZFNKQKprGMj0TBBNndTHR0X+xOFmLwB6zsRK0ULMQP8ANs/AELP0EsFWwsvNksiAbjXWbnzJl77pyZq7um8CVjz11Kd09vX39sYHBoeGQ0nhgbL/pOzTN4wXBMx9vWNZ+bwuYFKaTJt12Pa5Zu8i39MNfc3wq45wvH3pR1l5ctbc8WVWFokqhiKag40t9NJFmKhTHdDtQIJBFF3knco4QKHBiowQKHDUnYhAafvh2oYHCJK6NBnEdIhPscxxgkbY2yOGVoxB7Sf49WOxFr07pZ0w/VBp1i0vBIOY1Z9sRu2Rt7ZHfshX3+WasR1mh6qdOst7Tc3Y2fTG58/KuyaJbY/1Z19CxRRTr0Ksi7GzLNWxgtfXB09raRWZ9tzLFr9kr+r9gze6Ab2MG7cbPG1y87+NHJC70YNUj93Y52UFxIqUuphbXFZHYlalUMU5jBPPVjGVmsIo8C1T/AKc5xoQwpqpJWMq1UpSvSTOBHKLkvAi+SPA==</latexit> . . . Latent variables Variable transformation Feature learning Classifier or Regressor 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” by different large datasets.
  • 22.
    Modern aspects ofML Prediction Input variables Surface model <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit sha1_base64="lFhRrRrVTrFR31ebbMgRp5myJpc=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCIDPjCuiG5c8pBHgoS0dcCG0jZtISLxB0zcysKVJi6MH+AHuPEHXPAJxiUmblx4KU2MEvE20zlz5p47Z+ZKhqpYNmNdjzA2PjE55Z32zczOzfsDC4s5S2+YMs/KuqqbBUm0uKpoPGsrtsoLhsnFuqTyvFTb7+/nm9y0FF07tFsGL9XFqqZUFFm0icqcljfKgRCLMCeCwyDqghDcSOqBRxzhGDpkNFAHhwabsAoRFn1FRMFgEFdCmziTkOLsc5zDR9oGZXHKEImt0b9Kq6LLarTu17QctUynqDRMUgYRZi/snvXYM3tgr+zzz1ptp0bfS4tmaaDlRtl/sZz5+FdVp9nGybdqpGcbFew4XhXybjhM/xbyQN886/Qyu+lwe43dsjfyf8O67IluoDXf5bsUT1+P8CORF3oxalD0dzuGQS4WiW5HYqnNUGLPbZUXK1jFOvUjjgQOkESW6ldxiSt0BK8QEbaE+CBV8LiaJfwIIfEFWE6QnA==</latexit> x3 <latexit 
sha1_base64="0IPXcU0UIDvzZlYURjV2A/THv9U=">AAACiXichVG7SgNBFD2ur/hM1EawEYNiFWZFNKQKprGMj0TBBNndTHR0X+xOFmLwB6zsRK0ULMQP8ANs/AELP0EsFWwsvNksiAbjXWbnzJl77pyZq7um8CVjz11Kd09vX39sYHBoeGQ0nhgbL/pOzTN4wXBMx9vWNZ+bwuYFKaTJt12Pa5Zu8i39MNfc3wq45wvH3pR1l5ctbc8WVWFokqhiKag40t9NJFmKhTHdDtQIJBFF3knco4QKHBiowQKHDUnYhAafvh2oYHCJK6NBnEdIhPscxxgkbY2yOGVoxB7Sf49WOxFr07pZ0w/VBp1i0vBIOY1Z9sRu2Rt7ZHfshX3+WasR1mh6qdOst7Tc3Y2fTG58/KuyaJbY/1Z19CxRRTr0Ksi7GzLNWxgtfXB09raRWZ9tzLFr9kr+r9gze6Ab2MG7cbPG1y87+NHJC70YNUj93Y52UFxIqUuphbXFZHYlalUMU5jBPPVjGVmsIo8C1T/AKc5xoQwpqpJWMq1UpSvSTOBHKLkvAi+SPA==</latexit> . . . Latent variables Variable transformation Feature learning Classifier or Regressor 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” by different large datasets.
  • 23.
    Modern aspects ofML Prediction Input variables Surface model <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit sha1_base64="lFhRrRrVTrFR31ebbMgRp5myJpc=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCIDPjCuiG5c8pBHgoS0dcCG0jZtISLxB0zcysKVJi6MH+AHuPEHXPAJxiUmblx4KU2MEvE20zlz5p47Z+ZKhqpYNmNdjzA2PjE55Z32zczOzfsDC4s5S2+YMs/KuqqbBUm0uKpoPGsrtsoLhsnFuqTyvFTb7+/nm9y0FF07tFsGL9XFqqZUFFm0icqcljfKgRCLMCeCwyDqghDcSOqBRxzhGDpkNFAHhwabsAoRFn1FRMFgEFdCmziTkOLsc5zDR9oGZXHKEImt0b9Kq6LLarTu17QctUynqDRMUgYRZi/snvXYM3tgr+zzz1ptp0bfS4tmaaDlRtl/sZz5+FdVp9nGybdqpGcbFew4XhXybjhM/xbyQN886/Qyu+lwe43dsjfyf8O67IluoDXf5bsUT1+P8CORF3oxalD0dzuGQS4WiW5HYqnNUGLPbZUXK1jFOvUjjgQOkESW6ldxiSt0BK8QEbaE+CBV8LiaJfwIIfEFWE6QnA==</latexit> x3 <latexit 
sha1_base64="0IPXcU0UIDvzZlYURjV2A/THv9U=">AAACiXichVG7SgNBFD2ur/hM1EawEYNiFWZFNKQKprGMj0TBBNndTHR0X+xOFmLwB6zsRK0ULMQP8ANs/AELP0EsFWwsvNksiAbjXWbnzJl77pyZq7um8CVjz11Kd09vX39sYHBoeGQ0nhgbL/pOzTN4wXBMx9vWNZ+bwuYFKaTJt12Pa5Zu8i39MNfc3wq45wvH3pR1l5ctbc8WVWFokqhiKag40t9NJFmKhTHdDtQIJBFF3knco4QKHBiowQKHDUnYhAafvh2oYHCJK6NBnEdIhPscxxgkbY2yOGVoxB7Sf49WOxFr07pZ0w/VBp1i0vBIOY1Z9sRu2Rt7ZHfshX3+WasR1mh6qdOst7Tc3Y2fTG58/KuyaJbY/1Z19CxRRTr0Ksi7GzLNWxgtfXB09raRWZ9tzLFr9kr+r9gze6Ab2MG7cbPG1y87+NHJC70YNUj93Y52UFxIqUuphbXFZHYlalUMU5jBPPVjGVmsIo8C1T/AKc5xoQwpqpJWMq1UpSvSTOBHKLkvAi+SPA==</latexit> . . . Latent variables Variable transformation Feature learning Classifier or Regressor 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” by different large datasets.
  • 24.
    Modern aspects ofML Prediction Input variables Surface model <latexit sha1_base64="Ill3Als4zZd947f5Xm9sW99d0QA=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCJTomJcEd245CGPBAlp64gNpW3aQkTiD5i4lYUrTVwYP8APcOMPuOATjEtM3LjwUpoYJeJtpnPmzD13zsyVTU21Hca6PmFsfGJyyj8dmJmdmw+GFhbzttGwFJ5TDM2wirJkc03Vec5RHY0XTYtLdVnjBbm2198vNLllq4Z+4LRMXq5LVV09VhXJISp7WhEroQiLMTfCw0D0QARepIzQIw5xBAMKGqiDQ4dDWIMEm74SRDCYxJXRJs4ipLr7HOcIkLZBWZwyJGJr9K/SquSxOq37NW1XrdApGg2LlGFE2Qu7Zz32zB7YK/v8s1bbrdH30qJZHmi5WQleLGc//lXVaXZw8q0a6dnBMbZdryp5N12mfwtloG+edXrZnUy0vcZu2Rv5v2Fd9kQ30Jvvyl2aZ65H+JHJC70YNUj83Y5hkI/HxK1YPL0RSe56rfJjBatYp34kkMQ+UshR/SoucYWO4BdiwqaQGKQKPk+zhB8hJL8AVA6Qmg==</latexit> x1 <latexit sha1_base64="QFtMwnKe2I12XGZu0bNJbdnDaaE=">AAAChnichVG7TgJBFD2sL8QHqI2JDZFgrMhAVIwV0caShzwSJGR3HXHDvrK7EJH4Aya2UlhpYmH8AD/Axh+w4BOMJSY2Fl6WTYwS8W5m58yZe+6cmSuZqmI7jHV9wtj4xOSUfzowMzs3HwwtLBZso2HJPC8bqmGVJNHmqqLzvKM4Ki+ZFhc1SeVFqb7X3y82uWUrhn7gtExe0cSarhwrsugQlTutJqqhCIsxN8LDIO6BCLxIG6FHHOIIBmQ0oIFDh0NYhQibvjLiYDCJq6BNnEVIcfc5zhEgbYOyOGWIxNbpX6NV2WN1Wvdr2q5aplNUGhYpw4iyF3bPeuyZPbBX9vlnrbZbo++lRbM00HKzGrxYzn38q9JodnDyrRrp2cExtl2vCnk3XaZ/C3mgb551ermdbLS9xm7ZG/m/YV32RDfQm+/yXYZnr0f4kcgLvRg1KP67HcOgkIjFt2KJzEYkteu1yo8VrGKd+pFECvtII0/1a7jEFTqCX4gJm0JykCr4PM0SfoSQ+gJWLpCb</latexit> x2 <latexit sha1_base64="lFhRrRrVTrFR31ebbMgRp5myJpc=">AAAChnichVHLTsJAFD3UF+ID1I2JGyLBuCIDPjCuiG5c8pBHgoS0dcCG0jZtISLxB0zcysKVJi6MH+AHuPEHXPAJxiUmblx4KU2MEvE20zlz5p47Z+ZKhqpYNmNdjzA2PjE55Z32zczOzfsDC4s5S2+YMs/KuqqbBUm0uKpoPGsrtsoLhsnFuqTyvFTb7+/nm9y0FF07tFsGL9XFqqZUFFm0icqcljfKgRCLMCeCwyDqghDcSOqBRxzhGDpkNFAHhwabsAoRFn1FRMFgEFdCmziTkOLsc5zDR9oGZXHKEImt0b9Kq6LLarTu17QctUynqDRMUgYRZi/snvXYM3tgr+zzz1ptp0bfS4tmaaDlRtl/sZz5+FdVp9nGybdqpGcbFew4XhXybjhM/xbyQN886/Qyu+lwe43dsjfyf8O67IluoDXf5bsUT1+P8CORF3oxalD0dzuGQS4WiW5HYqnNUGLPbZUXK1jFOvUjjgQOkESW6ldxiSt0BK8QEbaE+CBV8LiaJfwIIfEFWE6QnA==</latexit> x3 <latexit 
sha1_base64="0IPXcU0UIDvzZlYURjV2A/THv9U=">AAACiXichVG7SgNBFD2ur/hM1EawEYNiFWZFNKQKprGMj0TBBNndTHR0X+xOFmLwB6zsRK0ULMQP8ANs/AELP0EsFWwsvNksiAbjXWbnzJl77pyZq7um8CVjz11Kd09vX39sYHBoeGQ0nhgbL/pOzTN4wXBMx9vWNZ+bwuYFKaTJt12Pa5Zu8i39MNfc3wq45wvH3pR1l5ctbc8WVWFokqhiKag40t9NJFmKhTHdDtQIJBFF3knco4QKHBiowQKHDUnYhAafvh2oYHCJK6NBnEdIhPscxxgkbY2yOGVoxB7Sf49WOxFr07pZ0w/VBp1i0vBIOY1Z9sRu2Rt7ZHfshX3+WasR1mh6qdOst7Tc3Y2fTG58/KuyaJbY/1Z19CxRRTr0Ksi7GzLNWxgtfXB09raRWZ9tzLFr9kr+r9gze6Ab2MG7cbPG1y87+NHJC70YNUj93Y52UFxIqUuphbXFZHYlalUMU5jBPPVjGVmsIo8C1T/AKc5xoQwpqpJWMq1UpSvSTOBHKLkvAi+SPA==</latexit> . . . Latent variables Variable transformation Feature learning Classifier or Regressor 4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” by different large datasets.
  • 27.
    Modern aspects of ML
    [Figure labels: Prediction; Input variables $x_1, x_2, x_3, \dots$; Surface model; Latent variables; Variable transformation; Feature learning; Classifier or Regressor; Linear]
    4. Representation learning: Models can have “feature learning” blocks, and they can be “pre-trained” on different large datasets.
  • 28.
    Needs and excitement around ML for Chemistry
    [Diagram labels: Prior Info (Observational data, Reported facts, Textbook knowledge); Discovery; Representation; Model (Belief); Intervention; Hypothesis; New Info]
    [Diagram labels: Prior Info; Model (Belief); Hypothesis]
      • Identify relevant variables
      • Set design choices
      • Set experiments
      • Interpret results
    Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries?
  • 30.
    Representation
    [Diagram labels: Reactions, Materials, Molecules; ML computer programs; ?]
      • Observational data
      • Reported facts
      • Textbook knowledge
    Identifying the relevant factors and establishing necessary and sufficient computer-readable representations are unavoidable preconditions. But this is far from trivial, and quite paradoxical, since we do not yet understand the target.
    Any rationalized “real” discovery comes only from understanding and discovering the causal relations between relevant factors.
  • 31.
    Representation
    [Figure: molecular structure drawing (atoms O, N, H, CH3) annotated with $\theta_i$ and $\theta_j$]
    Levels of Theory/Model Abstraction
    First Principle and Simulation (Quantum Chemistry)
    Spatio-Temporal Flexibility, Variations, Dynamics, and Interactions
  • 32.
    Representation
    [Figure labels: Reactions, Materials, Molecules; Graphs (of different size); Node features; Edge features; Graph Neural Networks (GNNs); Representation learning; Latent variables; Classifier or Regressor; Diverse Downstream Tasks]
    SMILES examples: CC1CCNO1 and NCc1ccoc1.S=(Cl)Cl>>[RX_5]S=C=NCc1ccoc1 …
    Combinatorial aspects: Modular Hierarchy; Compositionality; Graph Coarsening
    Examples: Amide, Proline, Oxazoline
    Substituents: Phenyl, Carboxyl, Methyl, Ethyl, Tert-butyl, Isopropyl, Trifluoromethyl, Benzyl
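    To make the "graph with node features in, latent vector out" idea concrete, here is a minimal sketch in plain Python. It is not the speaker's model: the graph for CC1CCNO1 is written out by hand (heavy atoms only), all bonds are treated alike so there are no edge features, and there are no learned weights or nonlinearities. The names `embed`, `message_passing_step`, and `readout` are made up for this illustration.

```python
# Hand-built molecular graph for the SMILES string CC1CCNO1 (heavy atoms only).
ATOMS = ["C", "C", "C", "C", "N", "O"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]  # (5, 1) closes the ring
ELEMENTS = ["C", "N", "O"]  # assumed vocabulary for the one-hot node features


def one_hot(symbol):
    """Node feature: one-hot encoding of the element symbol."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors(n_atoms, bonds):
    """Adjacency lists of an undirected graph."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing_step(feats, adj):
    """Update each node as h_v + sum of its neighbors' features (no weights)."""
    updated = []
    for v, h in enumerate(feats):
        agg = list(h)
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, feats[u])]
        updated.append(agg)
    return updated


def readout(feats):
    """Sum over nodes, giving one fixed-size vector per graph."""
    return [sum(column) for column in zip(*feats)]


def embed(atoms, bonds, steps=2):
    """Graph of any size in, fixed-length vector out."""
    feats = [one_hot(a) for a in atoms]
    adj = neighbors(len(atoms), bonds)
    for _ in range(steps):
        feats = message_passing_step(feats, adj)
    return readout(feats)


graph_vector = embed(ATOMS, BONDS)
```

    Because both the neighbor aggregation and the readout are sums, the resulting vector does not depend on how the atoms are numbered, which is the property that lets one model handle graphs of different size and ordering.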
  • 33.
    Representation
    NB: Transformers can be considered a special case of GNNs, and many Transformer-type GNNs have also been developed.
    Transformer Core: (Multihead) Self-attention and Feed-forward NN, each followed by Add + LayerNorm
    $o = \sum_i \alpha_i(x) \, f_i(x; \theta)$
    Effective pretraining is a crucial open problem because, in practice, we can access only limited data for each specific problem.
    Pretraining with self-supervised pretext tasks has transformed NLP.
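    The weighted-sum form $o = \sum_i \alpha_i(x) f_i(x; \theta)$ can be illustrated with a bare-bones single-head self-attention in plain Python. This is a sketch under simplifying assumptions, not the Transformer as used in practice: the query, key, and value projections are taken to be the identity (so there is no learned $\theta$), and the function name `self_attention` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; the outputs are the weights alpha_i."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head self-attention with identity projections (an assumption).

    For each token q, the output is sum_i alpha_i * tokens[i], where alpha
    is the softmax of the scaled dot products between q and every token.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        alpha = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out = [sum(a * v[c] for a, v in zip(alpha, tokens)) for c in range(d)]
        weights.append(alpha)
        outputs.append(out)
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

    Every token attends to every other token here, which can be read as message passing on a fully connected graph with data-dependent edge weights.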
  • 34.
    Needs and excitement around ML for Chemistry
    [Diagram labels: Prior Info (Observational data, Reported facts, Textbook knowledge); Discovery; Representation; Model (Belief); Intervention; Hypothesis; New Info]
    [Diagram labels: Prior Info; Model (Belief); Hypothesis; New Info]
      • Identify relevant variables
      • Set design choices
      • Set experiments
      • Interpret results
    Can we somehow externalize the “experience and intuition” of experienced chemists to rationalize and accelerate discoveries?
  • 36.
    (Experimental) Intervention
    [Diagram labels: New Info; Hypothesis; ?; Automation; Reactions, Materials, Molecules; ML computer programs]
      • Observational data
      • Reported facts
      • Textbook knowledge
    Any rationalized “real” discovery comes only from understanding and discovering the causal relations between relevant factors.
    Information about causal relations can be acquired by passive observation and active intervention.
    Correlation does not imply causation.
  • 40.
    (Experimental) Intervention
    We need to carefully rethink how an experiment should be performed so that it is informative about the causal structure of targets.
      • Correlation vs Causation: ML models trained on passive observational data can be trapped by spurious correlations between variables while remaining totally ignorant of the underlying causality.
      • Garbage In, Garbage Out (GIGO): ML models merely reflect the given data. If the data has any bias, ML predictions can be badly misleading.
      • Unavoidable Human-Caused Biases: Always remember that “most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases, heuristics and social influences.” *
    * Jia, X., Lynch, A., Huang, Y. et al. Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis. Nature 573, 251–255 (2019).
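    The difference between passive observation and active intervention can be shown with a toy simulation. Everything here is a made-up illustration rather than chemical data: a hidden factor `z` drives both `x` and `y`, and `y` never depends on `x`. In observational samples `x` and `y` look strongly related; once `x` is set by the experimenter, independently of `z`, the relation disappears. The names `sample` and `pearson` are assumptions of this sketch.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample(n, rng, intervene=False):
    """Draw n (x, y) pairs from a toy system with a hidden confounder z."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden confounder, never recorded
        if intervene:
            x = rng.gauss(0.0, 1.0)  # experimenter sets x independently of z
        else:
            x = z + 0.3 * rng.gauss(0.0, 1.0)  # x passively follows z
        y = z + 0.3 * rng.gauss(0.0, 1.0)  # y depends on z only, never on x
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(0)
r_obs = pearson(*sample(5000, rng))                  # passive observation
r_int = pearson(*sample(5000, rng, intervene=True))  # active intervention
```

    A model fitted only to the observational samples would happily predict `y` from `x`, yet changing `x` in the lab would not move `y` at all.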
  • 41.
    (Experimental) Intervention
    https://www.chemistryworld.com/news/dispute-over-reaction-prediction-puts-machine-learnings-pitfalls-in-spotlight/3009912.article
      • Main paper: https://doi.org/10.1126/science.aar5169
      • Erratum: https://doi.org/10.1126/science.aat7648
      • Negative comment paper: https://doi.org/10.1126/science.aat8603
      • Author's response: https://doi.org/10.1126/science.aat8763
  • 42.
    (Experimental) Intervention
    Current ML is too data-hungry and vulnerable to any data bias, but acquiring clean, representative data is often quite impractical.
      • Deep learning techniques thus far have proven to be data hungry, shallow, brittle, and limited in their ability to generalize (Marcus, 2018)
      • Current machine learning techniques are data-hungry and brittle—they can only make sense of patterns they've seen before. (Chollet, 2020)
      • A growing body of evidence shows that state-of-the-art models learn to exploit spurious statistical patterns in datasets... instead of learning meaning in the flexible and generalizable way that humans do. (Nie et al., 2019)
      • Current machine learning methods seem weak when they are required to generalize beyond the training distribution, which is what is often needed in practice. (Bengio et al., 2019)
    Keys: fusing modern ML with first principles, simulations, and domain knowledge, and working collaboratively with experimental experts.
  • 43.
    (Experimental) Intervention
    AlphaGo (Nature, 2016), AlphaGo Zero (Nature, 2017), AlphaZero (Science, 2018), MuZero (Nature, 2020)
    This has reignited the old war between induction and deduction, and we’re re-encountering the long-standing problems in AI.
      • Knowledge acquisition / Principled data acquisition: Experimental design, Model-based optimization, Evolutionary computation
      • Reconciliation between inductive and deductive ML: Hybrid models of causal/logical/algorithmic ML and deep learning
      • Balancing exploitation and exploration: Model-based reinforcement learning or search in a combinatorial space
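    As one small, standard instance of balancing exploitation and exploration, the sketch below runs the UCB1 bandit rule on a hypothetical choice among three candidate experimental conditions. The success probabilities are invented for illustration, and the talk itself does not prescribe this algorithm; it only names the trade-off.

```python
import math
import random


def ucb1(success_probs, n_rounds, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was tried.

    success_probs are hypothetical, unknown-to-the-agent success rates of
    candidate conditions; each round simulates one experiment.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k    # number of experiments run per condition
    totals = [0.0] * k  # number of successes observed per condition
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1  # try every condition once before comparing
        else:
            # mean reward (exploitation) plus an uncertainty bonus (exploration)
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], n_rounds=2000)
```

    The bonus term shrinks as a condition is tried more often, so poorly explored conditions keep getting occasional trials while most of the budget goes to the one that looks best.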
  • 44.
    ML for Chemistry to me (an ML researcher)
    An exciting “real” test bench for the long-standing, unsolved but attractive fundamental problems in “AI for automating discovery”, involving many fascinating technical topics of modern ML.
    [Diagram labels: Prior Info (Observational data, Reported facts, Textbook knowledge); Discovery; Representation; Model (Belief); Intervention; Hypothesis; New Info]
    [Diagram labels: Prior Info; Model (Belief); Hypothesis]
      • Identify relevant variables
      • Set design choices
      • Set experiments
      • Interpret results
  • 45.
    Summary
    Machine Learning (ML) for Chemistry
      • Why is it needed?
      • What is exciting for computer scientists?
    Two aspects:
    1. Representation
      • What are good ML-readable representations for chemistry?
      • What information should be recorded and given to ML?
    2. (Experimental) Intervention
      • What is essential to make real chemical discoveries?
      • Are there principled ways for data acquisition and experimental design?