Deep Implicit Layers
Learning Structured Problems with Neural Networks
2021.09.06.
KAIST ALIN-LAB
Sangwoo Mo
1
Can deep learning solve structured problems?
2
• Deep learning has shown remarkable success on perception (system 1) tasks
= “3”
Can deep learning solve structured problems?
3
• Deep learning has shown remarkable success on perception (system 1) tasks
• Can deep learning also solve complex reasoning (system 2) problems?
= “3”
Solve Sudoku
Can deep learning solve structured problems?
4
• Structured reasoning problems require algorithmic thinking
Can deep learning solve structured problems?
5
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
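To make the forward/backward split concrete, below is a minimal self-contained sketch (not from the slides; all names are illustrative) of a 1-D implicit layer 𝑧∗ solving 𝑧 = tanh(𝑤𝑧 + 𝑥): fixed-point iteration for the forward pass, and the implicit function theorem for ∂𝑧∗/∂𝑤 and ∂𝑧∗/∂𝑥, checked against finite differences.

```python
import numpy as np

def forward(w, x, n_iter=100):
    """Forward: solve z = tanh(w*z + x) by fixed-point iteration."""
    z = 0.0
    for _ in range(n_iter):
        z = np.tanh(w * z + x)
    return z

def backward(w, x, z_star):
    """Backward: implicit function theorem on g(z) = tanh(w*z + x) - z = 0,
    so dz*/dw = -(dg/dw)/(dg/dz) and dz*/dx = -(dg/dx)/(dg/dz)."""
    s = 1.0 - np.tanh(w * z_star + x) ** 2   # tanh'(w*z + x)
    dg_dz = s * w - 1.0
    dz_dw = -(s * z_star) / dg_dz
    dz_dx = -s / dg_dz
    return dz_dw, dz_dx

w, x = 0.5, 1.0
z_star = forward(w, x)
dz_dw, dz_dx = backward(w, x, z_star)

# sanity check against finite differences
eps = 1e-6
print(dz_dw, (forward(w + eps, x) - forward(w - eps, x)) / (2 * eps))
print(dz_dx, (forward(w, x + eps) - forward(w, x - eps)) / (2 * eps))
```

The key point is that the backward pass never differentiates through the solver's iterations; it only uses the optimality condition at 𝑧∗.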
Can deep learning solve structured problems?
6
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Why do we need implicit layers?
• Reliable and generalizable prediction from interpretable rules [1,2]
• It is important to choose an architecture that matches the problem's structure
[1] Chen et al. Understanding Deep Architectures with Reasoning Layer. NeurIPS 2020.
[2] Xu et al. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. ICLR 2021.
Can deep learning solve structured problems?
7
• Deep implicit layers: design a layer that follows an algorithmic rule
• Output of the layer is the solution of an algorithm, not a simple computation like 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Examples of implicit layers
• (Convex) optimization (application: meta-learning, structured prediction)
• Discrete optimization (application: abstract reasoning)
• Differential equation (application: sequential modeling, density estimation)
• Fixed-point iteration (application: memory-efficient architectures)
• Planning & control (application: model-based RL)
• …and so on (e.g., ranking & sorting)
Implicit layer – (Convex) optimization
8
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Technical detail: OptNet considers a quadratic program min_𝑧 ½ 𝑧ᵀ𝑄𝑧 + 𝑞ᵀ𝑧 s.t. 𝐴𝑧 = 𝑏, 𝐺𝑧 ≤ ℎ;
the gradients w.r.t. the parameters 𝑄, 𝑞, 𝐴, 𝑏, 𝐺, ℎ are then obtained by differentiating the KKT conditions at the optimum
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Agrawal et al. Differentiable Convex Optimization Layers. NeurIPS 2019.
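As a concrete (hedged) illustration of [2], the cvxpylayers library wraps a parameterized cvxpy problem as a differentiable PyTorch layer. The sketch below builds a small inequality-constrained QP; the sizes and data are arbitrary, and the quadratic term is passed via its square root to satisfy the library's DPP rules.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 4, 2
z = cp.Variable(n)
Q_sqrt = cp.Parameter((n, n))
q = cp.Parameter(n)
G = cp.Parameter((m, n))
h = cp.Parameter(m)
# QP: minimize 1/2 ||Q_sqrt z||^2 + q^T z  s.t.  G z <= h
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Q_sqrt @ z) + q @ z),
                     [G @ z <= h])
layer = CvxpyLayer(problem, parameters=[Q_sqrt, q, G, h], variables=[z])

Q_sqrt_t = torch.randn(n, n, requires_grad=True)
q_t = torch.randn(n, requires_grad=True)
G_t = torch.randn(m, n, requires_grad=True)
h_t = torch.ones(m, requires_grad=True)   # z = 0 is always feasible here
z_star, = layer(Q_sqrt_t, q_t, G_t, h_t)  # forward: solve the QP
z_star.sum().backward()                   # backward: through the KKT conditions
```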
Implicit layer – (Convex) optimization
9
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Ridge/SVM classifier on top of deep features
• Train a classifier on 𝑘-shot features (ProtoNet uses a nearest class-mean classifier)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Lee et al. Meta-Learning with Differentiable Convex Optimization. CVPR 2019.
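For intuition, a ridge-regression head already has a closed-form, fully differentiable solution; below is a minimal sketch (illustrative only, not MetaOptNet's actual SVM/QP solver).

```python
import torch

def ridge_head(feats, targets, lam=1.0):
    """Closed-form ridge solution W = (X^T X + lam I)^{-1} X^T Y.
    Differentiable w.r.t. the features, hence w.r.t. the backbone."""
    n, d = feats.shape
    A = feats.t() @ feats + lam * torch.eye(d)
    return torch.linalg.solve(A, feats.t() @ targets)

# toy 5-way 1-shot episode with 16-dim backbone features
feats = torch.randn(5, 16, requires_grad=True)  # support features
targets = torch.eye(5)                          # one-hot labels
W = ridge_head(feats, targets)
query = torch.randn(3, 16)
logits = query @ W                              # classify query examples
logits.sum().backward()                         # gradient flows into feats
```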
Implicit layer – (Convex) optimization
10
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• No early-stopping heuristic as in the original MAML (the number of inner-loop steps can vary)
• Does not keep the intermediate trajectory (uses only properties of the optimum)
→ Can apply an arbitrary number of inner-loop steps
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
Implicit layer – (Convex) optimization
11
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• Does not keep the intermediate trajectory (uses only properties of the optimum)
• Efficient computation when the number of inner-loop steps is large
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
Implicit layer – (Convex) optimization
12
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Inner loop of MAML as the solution of a regularized optimization problem
• Does not keep the intermediate trajectory (uses only properties of the optimum)
• Technical detail: the meta-gradient is d𝐿/d𝜃 = (d𝜙∗/d𝜃)ᵀ ∇_𝜙 𝐿(𝜙∗),
where the Jacobian is d𝜙∗/d𝜃 = (𝐼 + (1/𝜆) ∇²_𝜙 𝐿̂(𝜙∗))⁻¹
(𝐿 is the test loss and 𝐿̂ the inner-loop training loss)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
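A sketch of how this implicit meta-gradient can be computed in practice, in the spirit of [2] (names illustrative): conjugate gradient approximately solves (𝐼 + 𝐻/𝜆)𝑥 = 𝑔 using only Hessian-vector products, so the Hessian is never formed or inverted.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product via double backward (loss must carry a graph)."""
    g = torch.autograd.grad(loss, params, create_graph=True)[0]
    return torch.autograd.grad((g * v).sum(), params, retain_graph=True)[0]

def implicit_meta_grad(train_loss, test_grad, phi, lam=1.0, cg_steps=10):
    """Conjugate gradient for (I + H/lam) x = test_grad, H = Hessian at phi."""
    A = lambda v: v + hvp(train_loss, phi, v) / lam
    x = torch.zeros_like(test_grad)
    r, p = test_grad.clone(), test_grad.clone()
    for _ in range(cg_steps):
        Ap = A(p)
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if r_new.norm() < 1e-8:   # converged
            break
        beta = (r_new @ r_new) / (r @ r)
        p, r = r_new + beta * p, r_new
    return x  # approximates (I + H/lam)^{-1} test_grad

phi = torch.randn(10, requires_grad=True)
train_loss = (phi ** 2).sum()      # stand-in inner-loop loss
test_grad = torch.randn(10)
mg = implicit_meta_grad(train_loss, test_grad, phi)
```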
Implicit layer – (Convex) optimization
13
• How does it work?
• Forward: (convex) optimization solvers
• Backward: use properties of the optimum (e.g., KKT conditions)
• Application: Structured prediction by minimizing an energy function
• Solve an energy-based model (EBM) on deep features 𝑓(𝑥)
• Output is 𝑦∗ = argmin_𝑦 𝐸_𝜃(𝑦; 𝑓(𝑥))
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Belanger et al. End-to-End Learning for Structured Prediction Energy Networks. ICML 2017.
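One simple (hedged) way to realize such a layer is to unroll a few gradient steps on the energy over 𝑦 and backpropagate through them; the quadratic energy below is only a stand-in for a learned deep energy network.

```python
import torch

def energy_min_layer(f_x, theta, n_steps=20, lr=0.1):
    """Unrolled inner gradient descent on E_theta(y; f(x)).
    Illustrative energy: E(y) = ||y - theta * f_x||^2."""
    y = torch.zeros_like(f_x).requires_grad_()
    for _ in range(n_steps):
        E = ((y - theta * f_x) ** 2).sum()
        (grad_y,) = torch.autograd.grad(E, y, create_graph=True)
        y = y - lr * grad_y
    return y  # approximate argmin_y E_theta(y; f(x))

f_x = torch.randn(8, requires_grad=True)       # deep features f(x)
theta = torch.tensor(2.0, requires_grad=True)  # energy parameters
y_star = energy_min_layer(f_x, theta)
y_star.sum().backward()                        # gradients reach theta and f(x)
```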
Implicit layer – Discrete optimization
14
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
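SATNet's actual forward pass is a low-rank SDP solved by block coordinate descent. As a simpler stand-in for the same idea (relax discrete variables, solve, differentiate through the solver), the sketch below relaxes binary variables to probabilities and unrolls a gradient optimizer; this is illustrative only, not SATNet's algorithm.

```python
import torch

def relaxed_solver(W, n_steps=50, lr=0.5):
    """Relax x in {0,1}^n to p = sigmoid(u) in [0,1]^n and minimize the
    smooth surrogate p^T W p (W encodes soft pairwise clauses).
    Backprop through the unrolled optimizer makes the solver differentiable."""
    u = torch.zeros(W.shape[0], requires_grad=True)
    for _ in range(n_steps):
        p = torch.sigmoid(u)
        loss = p @ W @ p
        (g,) = torch.autograd.grad(loss, u, create_graph=True)
        u = u - lr * g
    return torch.sigmoid(u)  # soft assignment; round for a discrete solution

W = torch.randn(6, 6, requires_grad=True)
p = relaxed_solver(W)
p.sum().backward()  # gradient w.r.t. the (learned) clause weights W
```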
Implicit layer – Discrete optimization
15
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
• Application: Solve abstract reasoning problems
• Extract discrete latent codes with a VQ-VAE and apply SATNet
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
[2] Yu et al. Abstract Reasoning via Logic-guided Generation. ICML Workshop 2021.
Implicit layer – Differential equation
16
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
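The reference implementation of [1] is available as the torchdiffeq package; a minimal sketch follows (sizes illustrative), where odeint_adjoint provides the O(1)-memory adjoint backward pass.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Parameterizes the dynamics dz/dt = f_theta(z(t), t)."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))
    def forward(self, t, z):
        return self.net(z)

func = ODEFunc()
z0 = torch.randn(8, 4)        # batch of initial states z(t0)
t = torch.tensor([0.0, 1.0])  # integrate from t0 = 0 to t1 = 1
z1 = odeint(func, z0, t)[-1]  # forward: ODE solve; backward: adjoint ODE
z1.sum().backward()           # gradients w.r.t. theta via the adjoint method
```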
Implicit layer – Differential equation
17
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Handle arbitrary time inputs by continuous modeling (an RNN needs discrete time steps)
• Hidden state evolves continuously over time, following the ODE between observations
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
Implicit layer – Differential equation
18
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Handle arbitrary time inputs by continuous modeling (an RNN needs discrete time steps)
• Better extrapolation beyond the observed time range
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
Implicit layer – Differential equation
19
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Irregular time series
• Can also be applied to continuous video modeling
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Toth et al. Hamiltonian Generative Networks. ICLR 2020.
Implicit layer – Differential equation
20
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
Implicit layer – Differential equation
21
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
• They need specialized architectures to efficiently compute the Jacobian term det(∂𝑓/∂𝑧)
• Example: planar flow
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
Implicit layer – Differential equation
22
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Normalizing flows model an explicit density via the change-of-variables formula
• Neural ODE can compute the Jacobian term efficiently
• Only a trace is computed, instead of the determinant
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
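Concretely, FFJORD estimates the instantaneous change of log-density d log 𝑝(𝑧(𝑡))/d𝑡 = −Tr(∂𝑓/∂𝑧) with Hutchinson's trick, Tr(𝐽) = E_𝜀[𝜀ᵀ𝐽𝜀], which needs only one vector-Jacobian product; a minimal sketch (names illustrative):

```python
import torch

def trace_df_dz(f_z, z):
    """Hutchinson estimator of Tr(df/dz): E_eps[eps^T (df/dz) eps].
    One vjp instead of d separate derivatives for the exact determinant."""
    eps = torch.randn_like(z)  # Rademacher noise also works
    (vjp,) = torch.autograd.grad(f_z, z, grad_outputs=eps, create_graph=True)
    return (vjp * eps).sum(dim=1)  # per-sample trace estimate

z = torch.randn(16, 4, requires_grad=True)
net = torch.nn.Linear(4, 4)
f_z = net(z)
est = trace_df_dz(f_z, z)  # here the true value is Tr(W)
print(est.mean().item(), torch.trace(net.weight).item())  # roughly agree
```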
Implicit layer – Differential equation
23
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE on the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: Density estimation (normalizing flow)
• Furthermore, neural SDEs are the current state of the art for image generation
• Caveat: this differs from prior continuous flows and is more closely related to diffusion models
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Song et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR 2021.
Implicit layer – Fixed-point iteration
24
• How does it work?
• Forward: apply the layer 𝑧_{𝑘+1} = 𝑓_𝜃(𝑧_𝑘; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Technical detail: the gradient requires an inverse Jacobian,
which can be approximated by the solution of a linear system
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
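A minimal DEQ-style sketch following the pattern popularized by [1] (names illustrative): iterate to the fixed point without building a graph, re-attach one step for the direct gradients, and solve the linear system (𝐼 − 𝐽ᵀ)𝑔 = 𝑣 in a backward hook by a truncated fixed-point series.

```python
import torch
import torch.nn as nn

class DEQFixedPoint(nn.Module):
    """z* = f(z*, x): fixed-point forward without a graph, implicit backward."""
    def __init__(self, f, n_iter=50):
        super().__init__()
        self.f, self.n_iter = f, n_iter

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():                     # forward solve: no graph kept
            for _ in range(self.n_iter):
                z = self.f(z, x)
        z = self.f(z, x)                          # re-attach one step for autograd
        z0 = z.clone().detach().requires_grad_()  # for J^T-vector products
        f0 = self.f(z0, x)
        def backward_hook(grad):                  # solve g = grad + J^T g
            g = grad
            for _ in range(self.n_iter):
                g = grad + torch.autograd.grad(f0, z0, g, retain_graph=True)[0]
            return g
        z.register_hook(backward_hook)
        return z

f = lambda z, x: torch.tanh(0.5 * z + x)  # toy contraction map
deq = DEQFixedPoint(f)
x = torch.randn(8, 4, requires_grad=True)
deq(x).sum().backward()                   # implicit gradient w.r.t. x
```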
Implicit layer – Fixed-point iteration
25
• How does it work?
• Forward: apply the layer 𝑧_{𝑘+1} = 𝑓_𝜃(𝑧_𝑘; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Application: Infinite-depth network with a single layer
• Does not keep the intermediate activations → memory efficient
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
Implicit layer – Planning & control
26
• How does it work?
• Forward: choose an action via differentiable planning (e.g., value iteration, MCTS, MPC)
• Backward: backpropagate gradients through the planning rollout
• Application: Implicit planning on an MDP (better action prediction)
• Evaluate actions by running simulations (instead of directly using a Q-function)
• Needs a transition model (𝑠, 𝑎) → 𝑠′, i.e., model-based RL
[1] Tamar et al. Value Iteration Networks. NeurIPS 2016.
[2] Amos et al. Differentiable MPC for End-to-end Planning and Control. NeurIPS 2018.
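The heart of a Value Iteration Network is just K Bellman backups written with differentiable ops; a minimal tabular sketch (illustrative, not the convolutional VIN architecture):

```python
import torch

def value_iteration(R, P, gamma=0.9, K=30):
    """Differentiable value iteration on a tabular MDP.
    R: (S, A) rewards, P: (A, S, S) transition probabilities.
    Every op is differentiable, so gradients flow back into R and P."""
    S, A = R.shape
    V = torch.zeros(S)
    for _ in range(K):
        Q = R + gamma * torch.einsum('ast,t->sa', P, V)  # Bellman backup
        V = Q.max(dim=1).values                          # max over actions
    return Q  # greedy policy = Q.argmax(1); train with a loss on Q

S, A = 5, 3
R = torch.randn(S, A, requires_grad=True)
P_logits = torch.randn(A, S, S, requires_grad=True)
P = torch.softmax(P_logits, dim=-1)  # learned transition model
Q = value_iteration(R, P)
Q.sum().backward()                   # "rollout gradient through planning"
```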
Take-home message
27
• Deep implicit layers are an interesting combination of algorithms and deep learning
• Lots of attention from the ML community
• Value Iteration Networks: NeurIPS 2016 best paper
• Neural ODE: NeurIPS 2018 best paper
• Score-based SDE: ICLR 2021 outstanding paper
• SATNet: ICML 2019 honorable mention
• …and many orals & spotlights
• Many opportunities to utilize these ideas
• MetaOptNet: applies OptNet to few-shot learning
• Logic-guided generation (LoGe): applies SATNet to abstract reasoning

Thank you for listening! 😀