Deep Implicit Layers
Learning Structured Problems with Neural Networks
2021.09.06.
KAIST ALIN-LAB
Sangwoo Mo
1
Can deep learning solve structured problems?
2
• Deep learning has shown remarkable success on perception (system 1) tasks
= “3”
Can deep learning solve structured problems?
3
• Deep learning has shown remarkable success on perception (system 1) tasks
• Can deep learning also solve complex reasoning (system 2) problems?
= “3”
Solve Sudoku
Can deep learning solve structured problems?
4
• Structured reasoning problems require algorithmic thinking
Can deep learning solve structured problems?
5
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
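To make the forward/backward split concrete, below is a minimal self-contained sketch (not from the slides; all names are illustrative) of a 1-D implicit layer 𝑧∗ solving 𝑧 = tanh(𝑤𝑧 + 𝑥): fixed-point iteration for the forward pass, and the implicit function theorem for ∂𝑧∗/∂𝑤 and ∂𝑧∗/∂𝑥, checked against finite differences.

```python
import numpy as np

def forward(w, x, n_iter=100):
    """Forward: solve z = tanh(w*z + x) by fixed-point iteration."""
    z = 0.0
    for _ in range(n_iter):
        z = np.tanh(w * z + x)
    return z

def backward(w, x, z_star):
    """Backward: implicit function theorem on g(z) = tanh(w*z + x) - z = 0,
    so dz*/dw = -(dg/dw)/(dg/dz) and dz*/dx = -(dg/dx)/(dg/dz)."""
    s = 1.0 - np.tanh(w * z_star + x) ** 2   # tanh'(w*z + x)
    dg_dz = s * w - 1.0
    dz_dw = -(s * z_star) / dg_dz
    dz_dx = -s / dg_dz
    return dz_dw, dz_dx

w, x = 0.5, 1.0
z_star = forward(w, x)
dz_dw, dz_dx = backward(w, x, z_star)

# sanity check against finite differences
eps = 1e-6
print(dz_dw, (forward(w + eps, x) - forward(w - eps, x)) / (2 * eps))
print(dz_dx, (forward(w, x + eps) - forward(w, x - eps)) / (2 * eps))
```

The key point is that the backward pass never differentiates through the solver's iterations; it only uses the optimality condition at 𝑧∗.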
Can deep learning solve structured problems?
6
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Why do we need implicit layers?
• Reliable and generalizable prediction from interpretable rules [1,2]
• It is important to choose an architecture that matches the problem's structure
[1] Chen et al. Understanding Deep Architectures with Reasoning Layer. NeurIPS 2020.
[2] Xu et al. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. ICLR 2021.
Can deep learning solve structured problems?
7
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Examples of implicit layers
• (Convex) optimization (application: meta-learning, structured prediction)
• Discrete optimization (application: abstract reasoning)
• Differential equation (application: sequential modeling, density estimation)
• Fixed-point iteration (application: memory-efficient architectures)
• Planning & control (application: model-based RL)
• …and so on (e.g., ranking & sorting)
Implicit layer – (Convex) optimization
8
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Technical detail: OptNet considers a quadratic program min_𝑧 ½ 𝑧ᵀ𝑄𝑧 + 𝑞ᵀ𝑧 s.t. 𝐴𝑧 = 𝑏, 𝐺𝑧 ≤ ℎ;
the gradients w.r.t. the parameters 𝑄, 𝑞, 𝐴, 𝑏, 𝐺, ℎ are then obtained by differentiating the KKT conditions at the optimum
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Agrawal et al. Differentiable Convex Optimization Layers. NeurIPS 2019.
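As a concrete (hedged) illustration of [2], the cvxpylayers library wraps a parameterized cvxpy problem as a differentiable PyTorch layer. The sketch below builds a small inequality-constrained QP; the sizes and data are arbitrary, and the quadratic term is passed via its square root to satisfy the library's DPP rules.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 4, 2
z = cp.Variable(n)
Q_sqrt = cp.Parameter((n, n))
q = cp.Parameter(n)
G = cp.Parameter((m, n))
h = cp.Parameter(m)
# QP: minimize 1/2 ||Q_sqrt z||^2 + q^T z  s.t.  G z <= h
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Q_sqrt @ z) + q @ z),
                     [G @ z <= h])
layer = CvxpyLayer(problem, parameters=[Q_sqrt, q, G, h], variables=[z])

Q_sqrt_t = torch.randn(n, n, requires_grad=True)
q_t = torch.randn(n, requires_grad=True)
G_t = torch.randn(m, n, requires_grad=True)
h_t = torch.ones(m, requires_grad=True)   # z = 0 is always feasible here
z_star, = layer(Q_sqrt_t, q_t, G_t, h_t)  # forward: solve the QP
z_star.sum().backward()                   # backward: through the KKT conditions
```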
Implicit layer – (Convex) optimization
9
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Ridge/SVM classifier on top of deep features
• Train a classifier on 𝑘-shot features (ProtoNet uses a nearest class-mean classifier)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Lee et al. Meta-Learning with Differentiable Convex Optimization. CVPR 2019.
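For intuition, a ridge-regression head already has a closed-form, fully differentiable solution; below is a minimal sketch (illustrative only, not MetaOptNet's actual SVM/QP solver).

```python
import torch

def ridge_head(feats, targets, lam=1.0):
    """Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y.
    Differentiable w.r.t. the features, hence w.r.t. the backbone."""
    n, d = feats.shape
    A = feats.t() @ feats + lam * torch.eye(d)
    return torch.linalg.solve(A, feats.t() @ targets)

# toy 5-way 1-shot episode with 16-dim backbone features
feats = torch.randn(5, 16, requires_grad=True)  # support features
targets = torch.eye(5)                          # one-hot labels
W = ridge_head(feats, targets)
query = torch.randn(3, 16)
logits = query @ W                              # classify query examples
logits.sum().backward()                         # gradient flows into feats
```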
Implicit layer – (Convex) optimization
10
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• No early-stopping heuristic as in the original MAML (the number of inner-loop steps can vary)
• Does not keep the intermediate trajectory (uses only properties of the optimum)
→ Can apply an arbitrary number of inner-loop steps
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
Implicit layer – (Convex) optimization
11
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• Does not keep the intermediate trajectory (uses only properties of the optimum)
• Efficient computation when the number of inner-loop steps is large
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
Implicit layer – (Convex) optimization
12
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• Does not keep the intermediate trajectory (uses only properties of the optimum)
• Technical detail: the meta-gradient is d𝐿/d𝜃 = (d𝜙∗/d𝜃)ᵀ ∇_𝜙 𝐿(𝜙∗),
where the Jacobian is d𝜙∗/d𝜃 = (𝐼 + (1/𝜆) ∇²_𝜙 𝐿̂(𝜙∗))⁻¹
(𝐿 is the test loss and 𝐿̂ the inner-loop training loss)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
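A sketch of how this implicit meta-gradient can be computed in practice, in the spirit of [2] (names illustrative): conjugate gradient approximately solves (𝐼 + 𝐻/𝜆)𝑥 = 𝑔 using only Hessian-vector products, so the Hessian is never formed or inverted.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product via double backward (loss must carry a graph)."""
    g = torch.autograd.grad(loss, params, create_graph=True)[0]
    return torch.autograd.grad((g * v).sum(), params, retain_graph=True)[0]

def implicit_meta_grad(train_loss, test_grad, phi, lam=1.0, cg_steps=10):
    """Conjugate gradient for (I + H/lam) x = test_grad, H = Hessian at phi."""
    A = lambda v: v + hvp(train_loss, phi, v) / lam
    x = torch.zeros_like(test_grad)
    r, p = test_grad.clone(), test_grad.clone()
    for _ in range(cg_steps):
        Ap = A(p)
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if r_new.norm() < 1e-8:   # converged
            break
        beta = (r_new @ r_new) / (r @ r)
        p, r = r_new + beta * p, r_new
    return x  # approximates (I + H/lam)^{-1} test_grad

phi = torch.randn(10, requires_grad=True)
train_loss = (phi ** 2).sum()      # stand-in inner-loop loss
test_grad = torch.randn(10)
mg = implicit_meta_grad(train_loss, test_grad, phi)
```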
Implicit layer – (Convex) optimization
13
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Structured prediction by minimizing an energy function
• Solve an energy-based model (EBM) on deep features 𝑓(𝑥)
• Output is 𝑦∗ = argmin_𝑦 𝐸_𝜃(𝑦; 𝑓(𝑥))
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Belanger et al. End-to-End Learning for Structured Prediction Energy Networks. ICML 2017.
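One simple (hedged) way to realize such a layer is to unroll a few gradient steps on the energy over 𝑦 and backpropagate through them; the quadratic energy below is only a stand-in for a learned deep energy network.

```python
import torch

def energy_min_layer(f_x, theta, n_steps=20, lr=0.1):
    """Unrolled inner gradient descent on E_theta(y; f(x)).
    Illustrative energy: E(y) = ||y - theta * f_x||^2."""
    y = torch.zeros_like(f_x).requires_grad_()
    for _ in range(n_steps):
        E = ((y - theta * f_x) ** 2).sum()
        (grad_y,) = torch.autograd.grad(E, y, create_graph=True)
        y = y - lr * grad_y
    return y  # approximate argmin_y E_theta(y; f(x))

f_x = torch.randn(8, requires_grad=True)       # deep features f(x)
theta = torch.tensor(2.0, requires_grad=True)  # energy parameters
y_star = energy_min_layer(f_x, theta)
y_star.sum().backward()                        # gradients reach theta and f(x)
```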
Implicit layer – Discrete optimization
14
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
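SATNet's actual forward pass is a low-rank SDP solved by block coordinate descent. As a simpler stand-in for the same idea (relax discrete variables, solve, differentiate through the solver), the sketch below relaxes binary variables to probabilities and unrolls a gradient optimizer; this is illustrative only, not SATNet's algorithm.

```python
import torch

def relaxed_solver(W, n_steps=50, lr=0.5):
    """Relax x in {0,1}^n to p = sigmoid(u) in [0,1]^n and minimize the
    smooth surrogate p^T W p (W encodes soft pairwise clauses).
    Backprop through the unrolled optimizer makes the solver differentiable."""
    u = torch.zeros(W.shape[0], requires_grad=True)
    for _ in range(n_steps):
        p = torch.sigmoid(u)
        loss = p @ W @ p
        (g,) = torch.autograd.grad(loss, u, create_graph=True)
        u = u - lr * g
    return torch.sigmoid(u)  # soft assignment; round for a discrete solution

W = torch.randn(6, 6, requires_grad=True)
p = relaxed_solver(W)
p.sum().backward()  # gradient w.r.t. the (learned) clause weights W
```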
Implicit layer – Discrete optimization
15
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
• Application: Solve abstract reasoning problems
• Extract discrete latent codes with a VQ-VAE and apply SATNet
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
[2] Yu et al. Abstract Reasoning via Logic-guided Generation. ICML Workshop 2021.
Implicit layer – Differential equation
16
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
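The reference implementation of [1] is available as the torchdiffeq package; a minimal sketch follows (sizes illustrative), where odeint_adjoint provides the O(1)-memory adjoint backward pass.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Parameterizes the dynamics dz/dt = f_theta(z(t), t)."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))
    def forward(self, t, z):
        return self.net(z)

func = ODEFunc()
z0 = torch.randn(8, 4)        # batch of initial states z(t0)
t = torch.tensor([0.0, 1.0])  # integrate from t0 = 0 to t1 = 1
z1 = odeint(func, z0, t)[-1]  # forward: ODE solve; backward: adjoint ODE
z1.sum().backward()           # gradients w.r.t. theta via the adjoint method
```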
Implicit layer – Differential equation
17
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Handle arbitrary time inputs by continuous modeling (an RNN needs discrete time steps)
• Hidden state evolves continuously over time, following the ODE between observations
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
Implicit layer – Differential equation
18
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Handle arbitrary time inputs by continuous modeling (an RNN needs discrete time steps)
• Better extrapolation beyond the observed time range
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
Implicit layer – Differential equation
19
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Can also be applied to continuous video modeling
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Toth et al. Hamiltonian Generative Networks. ICLR 2020.
Implicit layer – Differential equation
20
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
Implicit layer – Differential equation
21
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
• They need specialized architectures to efficiently compute the Jacobian term det(∂𝑓/∂𝑧)
• Example: planar flow
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
Implicit layer – Differential equation
22
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
• Neural ODE can compute the Jacobian term efficiently
• Only a trace is computed, instead of the determinant
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
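Concretely, FFJORD estimates the instantaneous change of log-density d log 𝑝(𝑧(𝑡))/d𝑡 = −Tr(∂𝑓/∂𝑧) with Hutchinson's trick, Tr(𝐽) = E_𝜀[𝜀ᵀ𝐽𝜀], which needs only one vector-Jacobian product; a minimal sketch (names illustrative):

```python
import torch

def trace_df_dz(f_z, z):
    """Hutchinson estimator of Tr(df/dz): E_eps[eps^T (df/dz) eps].
    One vjp instead of d separate derivatives for the exact determinant."""
    eps = torch.randn_like(z)  # Rademacher noise also works
    (vjp,) = torch.autograd.grad(f_z, z, grad_outputs=eps, create_graph=True)
    return (vjp * eps).sum(dim=1)  # per-sample trace estimate

z = torch.randn(16, 4, requires_grad=True)
net = torch.nn.Linear(4, 4)
f_z = net(z)
est = trace_df_dz(f_z, z)  # here the true value is Tr(W)
print(est.mean().item(), torch.trace(net.weight).item())  # roughly agree
```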
Implicit layer – Differential equation
23
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Furthermore, neural SDEs are the current state of the art for image generation
• Caveat: this differs from prior continuous flows and is more closely related to diffusion models
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Song et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR 2021.
Implicit layer – Fixed-point iteration
24
• How does it work?
• Forward: apply the layer 𝑧_{𝑘+1} = 𝑓_𝜃(𝑧_𝑘; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Technical detail: the gradient requires an inverse Jacobian,
which can be approximated by the solution of a linear system
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
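A minimal DEQ-style sketch following the pattern popularized by [1] (names illustrative): iterate to the fixed point without building a graph, re-attach one step for the direct gradients, and solve the linear system (𝐼 − 𝐽ᵀ)𝑔 = 𝑣 in a backward hook by a truncated fixed-point series.

```python
import torch
import torch.nn as nn

class DEQFixedPoint(nn.Module):
    """z* = f(z*, x): fixed-point forward without a graph, implicit backward."""
    def __init__(self, f, n_iter=50):
        super().__init__()
        self.f, self.n_iter = f, n_iter

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():                     # forward solve: no graph kept
            for _ in range(self.n_iter):
                z = self.f(z, x)
        z = self.f(z, x)                          # re-attach one step for autograd
        z0 = z.clone().detach().requires_grad_()  # for J^T-vector products
        f0 = self.f(z0, x)
        def backward_hook(grad):                  # solve g = grad + J^T g
            g = grad
            for _ in range(self.n_iter):
                g = grad + torch.autograd.grad(f0, z0, g, retain_graph=True)[0]
            return g
        z.register_hook(backward_hook)
        return z

f = lambda z, x: torch.tanh(0.5 * z + x)  # toy contraction map
deq = DEQFixedPoint(f)
x = torch.randn(8, 4, requires_grad=True)
deq(x).sum().backward()                   # implicit gradient w.r.t. x
```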
Implicit layer – Fixed-point iteration
25
• How does it work?
• Forward: apply the layer 𝑧_{𝑘+1} = 𝑓_𝜃(𝑧_𝑘; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Application: Infinite-depth network with a single layer
• Does not keep the intermediate activations → memory efficient
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
Implicit layer – Planning & control
26
• How does it work?
• Forward: choose an action via differentiable planning (e.g., value iteration, MCTS, MPC)
• Backward: backpropagate gradients through the planning rollout
• Application: Implicit planning on an MDP (better action prediction)
• Evaluate actions by running simulations (instead of directly using a Q-function)
• Needs a transition model (𝑠, 𝑎) → 𝑠′, i.e., model-based RL
[1] Tamar et al. Value Iteration Networks. NeurIPS 2016.
[2] Amos et al. Differentiable MPC for End-to-end Planning and Control. NeurIPS 2018.
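The heart of a Value Iteration Network is just K Bellman backups written with differentiable ops; a minimal tabular sketch (illustrative, not the convolutional VIN architecture):

```python
import torch

def value_iteration(R, P, gamma=0.9, K=30):
    """Differentiable value iteration on a tabular MDP.
    R: (S, A) rewards, P: (A, S, S) transition probabilities.
    Every op is differentiable, so gradients flow back into R and P."""
    S, A = R.shape
    V = torch.zeros(S)
    for _ in range(K):
        Q = R + gamma * torch.einsum('ast,t->sa', P, V)  # Bellman backup
        V = Q.max(dim=1).values                          # max over actions
    return Q  # greedy policy = Q.argmax(1); train with a loss on Q

S, A = 5, 3
R = torch.randn(S, A, requires_grad=True)
P_logits = torch.randn(A, S, S, requires_grad=True)
P = torch.softmax(P_logits, dim=-1)  # learned transition model
Q = value_iteration(R, P)
Q.sum().backward()                   # "rollout gradient through planning"
```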
Take-home message
27
• Deep implicit layers are an interesting combination of algorithms and deep learning
• Lots of attention from the ML community
• Value Iteration Networks: NeurIPS 2016 best paper
• Neural ODE: NeurIPS 2018 best paper
• Score-based SDE: ICLR 2021 outstanding paper
• SATNet: ICML 2019 honorable mention
• …and many orals & spotlights
• Many opportunities to utilize these ideas
• MetaOptNet: applies OptNet to few-shot learning
• Logic-guided generation (LoGe): applies SATNet to abstract reasoning

Thank you for listening! 😀