An introduction to Normalizing Flows
Grigorios Chrysos
École polytechnique fédérale de Lausanne
January 19, 2021
Grigorios Chrysos An introduction to Normalizing Flows January 19, 2021 1 / 26
1 Introduction to Normalizing Flows
2 Taxonomy of Normalizing Flows
3 Practical considerations and limitations
4 Analyzing inverse problems
5 Generative modeling with Glow
Goal of normalizing flows
Data space X ←→ Latent space Z
Inference (data → latent): z = f(x), with x ∼ p̂_X.
Generation (latent → data): x = f^{-1}(z), with z ∼ p_Z.
Figure: Perform both mappings with a single network.
Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. "Density estimation using Real NVP." International Conference on Learning Representations (ICLR), 2017.
Formulation
Suppose we want to define a joint distribution over x ∈ R^D.
Idea: express x as a transformation f of a real vector u sampled from p_u(u):
x = f(u), with u ∼ p_u(u). (1)
The transformation f must be invertible¹ and both f and f^{-1} must be differentiable.
¹ Normalizing flows are often referred to as "invertible neural networks" (INNs).
Formulation
Then, using the change of variables formula:
p_x(x) = p_u(u) |J_f(u)|^{-1}, (2)
where | · | denotes the (absolute value of the) determinant and J_f the Jacobian matrix of f.
The Jacobian J_f is a D × D matrix; it is crucial to define a transformation whose Jacobian determinant (or that of its inverse) is easy to compute.
The term |J_f(u)| measures the relative change of volume in a small neighborhood around u.
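To make Eq. (2) concrete, a minimal numerical sketch (not from the slides): for the 1D affine map f(u) = 2u + 1 with u ∼ N(0, 1), the Jacobian is the constant 2, so the resulting p_x should be the density of N(1, 4).

```python
import numpy as np

# 1D instance of Eq. (2): x = f(u) = 2u + 1 with u ~ N(0, 1).
# Here J_f(u) = 2, so p_x(x) = p_u(u) / 2 with u = (x - 1) / 2.
def p_u(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def p_x(x):
    u = (x - 1.0) / 2.0          # u = f^{-1}(x)
    return p_u(u) / 2.0          # Eq. (2) with |J_f(u)| = 2

# p_x should be the density of N(1, 4): check it integrates to ~1.
xs = np.linspace(-20.0, 22.0, 200001)
total = np.sum(p_x(xs)) * (xs[1] - xs[0])
assert abs(total - 1.0) < 1e-4
```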
Properties
Compositionality: the composition of two transformations f_1 and f_2 is denoted f_2 ∘ f_1 and satisfies:
|J_{f_2∘f_1}(u)| = |J_{f_2}(f_1(u))| · |J_{f_1}(u)|. (3)
Expressive power: INNs can approximate any distribution p_x(x) under mild conditions.
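Eq. (3) can be checked numerically. A small sketch with two hand-picked maps: a nonlinear f1 with a known analytic Jacobian and a linear f2(v) = Av, so the composed Jacobian is A · J_{f1}(u).

```python
import numpy as np

# Check Eq. (3) for f2 ∘ f1 with a nonlinear f1 and a linear f2(v) = A v,
# so that J_{f2∘f1}(u) = A @ J_{f1}(u).
def f1(u):
    return np.array([np.exp(u[0]), u[1] + u[0]**2])

def jac_f1(u):
    # Analytic Jacobian of f1.
    return np.array([[np.exp(u[0]), 0.0],
                     [2.0 * u[0],   1.0]])

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])             # f2(v) = A v, Jacobian A everywhere

u = np.array([0.4, -1.2])
lhs = abs(np.linalg.det(A @ jac_f1(u)))                       # |J_{f2∘f1}(u)|
rhs = abs(np.linalg.det(A)) * abs(np.linalg.det(jac_f1(u)))   # Eq. (3)
assert np.isclose(lhs, rhs)
```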
Taxonomy
Normalizing flows are categorized as follows (Kobyzev et al., 2020):
Elementwise bijections.
Linear flows.
Coupling and autoregressive flows.
Residual flows.
Continuous-time flows.²
Kobyzev, Ivan, Simon Prince, and Marcus Brubaker. “Normalizing flows: An introduction and review of current methods.” IEEE
Transactions on Pattern Analysis and Machine Intelligence (2020).
² Not covered in this presentation; included for thoroughness.
Elementwise bijections
Let g_θ : R → R be a bijective function, where θ denotes (learnable) parameters, and let x = (x_1, . . . , x_D)^T.
Then f(x) = (g_θ(x_1), . . . , g_θ(x_D))^T is a bijection.
A different bijective function g_{θ_i} can be applied to each element x_i.
Leaky ReLU and Softplus are two examples of elementwise bijections.
Drawback: no correlations between the different elements are captured.
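A softplus flow illustrates the elementwise case: the same scalar bijection g is applied per coordinate, the inverse is elementwise too, and the Jacobian is diagonal. A minimal sketch:

```python
import numpy as np

# Softplus g(x) = log(1 + e^x) is a bijection R -> (0, inf);
# applying it per coordinate gives an elementwise flow.
def softplus(x):
    return np.log1p(np.exp(x))

def softplus_inv(y):
    return np.log(np.expm1(y))           # valid for y > 0

x = np.array([-1.5, 0.0, 2.0])
u = softplus(x)                          # f(x) = (g(x1), ..., g(xD))^T
assert np.allclose(softplus_inv(u), x)

# Diagonal Jacobian: dg/dx = sigmoid(x), so the log-det is a plain sum,
# but no interaction between coordinates is modeled.
log_det = np.sum(np.log(1.0 / (1.0 + np.exp(-x))))
assert log_det < 0.0                     # sigmoid(x) < 1 everywhere
```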
Linear flows
Let A_θ ∈ R^{D×D} be an invertible matrix.
The linear transformation f (x) = Aθx captures correlations between
dimensions.
The following forms of Aθ have been used: diagonal, triangular, LU
decomposition, permutation matrix.
Drawback: Limited expressivity.
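A sketch of a triangular linear flow: inversion reduces to back-substitution and the log-determinant is just the sum of the log-diagonal (the matrix entries below are random, for illustration only):

```python
import numpy as np

# Linear flow f(x) = A x with A lower triangular.
rng = np.random.default_rng(0)
D = 4
A = np.tril(rng.normal(size=(D, D)))
idx = np.diag_indices(D)
A[idx] = np.abs(A[idx]) + 0.5            # positive diagonal => invertible

x = rng.normal(size=D)
u = A @ x                                # forward pass
x_rec = np.linalg.solve(A, u)            # in practice, solve_triangular exploits the O(D^2) structure
assert np.allclose(x_rec, x)

log_det = np.sum(np.log(np.diag(A)))     # triangular => sum of log-diagonal
assert np.isclose(log_det, np.linalg.slogdet(A)[1])
```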
Coupling flows
Let x_A ∈ R^d and x_B ∈ R^{D−d} be two disjoint partitions of x. Let g : R^d → R^d be a bijective function, and φ_θ an arbitrary function.
The coupling flow realizes the mapping:
u_A = g(x_A, φ_θ(x_B)), u_B = x_B, (4)
where u = [u_A, u_B] is the output of the coupling layer.
The inverse of the coupling flow exists only if g is invertible (in its first argument). Then the inverse is:
x_A = g^{-1}(u_A, φ_θ(u_B)), x_B = u_B. (5)
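A minimal sketch of Eqs. (4)-(5) with an affine g; the conditioner φ_θ below is a hand-picked toy function standing in for a learned network:

```python
import numpy as np

# Affine coupling, Eqs. (4)-(5): g(x_A, (s, t)) = x_A * exp(s) + t,
# with phi a toy conditioner (a neural network in practice).
def phi(x_B):
    return np.tanh(x_B), x_B**2          # scale s (bounded) and shift t

def forward(x_A, x_B):
    s, t = phi(x_B)
    return x_A * np.exp(s) + t, x_B      # u_A, u_B   (Eq. (4))

def inverse(u_A, u_B):
    s, t = phi(u_B)                      # x_B = u_B, so phi is recomputable
    return (u_A - t) * np.exp(-s), u_B   # x_A, x_B   (Eq. (5))

x_A, x_B = np.array([0.5, -1.0]), np.array([2.0, 0.3])
u_A, u_B = forward(x_A, x_B)
x_A_rec, x_B_rec = inverse(u_A, u_B)
assert np.allclose(x_A_rec, x_A) and np.allclose(x_B_rec, x_B)
```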
Coupling flows
Typically, coupling functions are applied elementwise.
Some coupling functions that have been used for g:
Affine coupling functions: g(x) = vx, where v ≠ 0.
Splines (cubic, rational quadratic).
Piecewise bijective coupling.
Drawbacks:
Reduced expressive power per layer. However, a composition of D coupling layers is a universal approximator.
The need to partition x.
Autoregressive flows
Let g_θ : R → R be a bijective function with (learnable) parameters θ.
The autoregressive flow is u_t = g_θ(x_t, φ(x_{1:t−1})), where x_{1:t} = (x_1, . . . , x_t) and φ is an arbitrary (conditioner) function.
The Jacobian is triangular; however, the computation of the inverse is challenging.
Drawbacks:
The flow depends on the ordering of the input variables.
Sequential computation (in the inverse direction); hard to parallelize.
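A sketch of this asymmetry: the forward pass of an affine autoregressive flow is available in closed form given x, while the inverse must be computed element by element (φ below is a toy stand-in for a learned conditioner):

```python
import numpy as np

# Affine autoregressive flow u_t = x_t * exp(s_t) + t_t, where (s_t, t_t)
# depend on x_{1:t-1} through a toy conditioner phi.
def phi(prefix):
    return 0.5 * np.tanh(np.sum(prefix)), 0.1 * np.sum(prefix)

def forward(x):
    # Each u_t depends only on x_{1:t}; given x, all terms are available.
    u = np.empty_like(x)
    for t in range(len(x)):
        s, shift = phi(x[:t])
        u[t] = x[t] * np.exp(s) + shift
    return u

def inverse(u):
    # Inherently sequential: x_t requires x_{1:t-1} to be recovered first.
    x = np.empty_like(u)
    for t in range(len(u)):
        s, shift = phi(x[:t])
        x[t] = (u[t] - shift) * np.exp(-s)
    return x

x = np.array([0.2, -0.7, 1.5])
assert np.allclose(inverse(forward(x)), x)
```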
Residual flows
The flow is u = x + g_θ(x).
It is invertible only when g_θ is constrained appropriately.³
Planar and radial flows are special cases of the residual flow:
Planar flow: f(x) = x + v g(w^T x + b), where g is a smooth, elementwise non-linearity and v, w ∈ R^D.
Radial flow: f(x) = x + β (x − x_0) / (α + ||x − x_0||), where x_0 is a selected reference point.
³ For instance, the block is invertible when the Lipschitz constant of the residual block g_θ is smaller than 1.
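When g_θ is contractive (footnote 3), the residual flow can be inverted by fixed-point iteration, since the map x ↦ u − g_θ(x) is itself a contraction. A minimal sketch:

```python
import numpy as np

# Residual flow u = x + g(x) with g contractive (Lipschitz constant 0.5 < 1);
# the inverse is the unique fixed point of x -> u - g(x).
def g(x):
    return 0.5 * np.tanh(x)

def forward(x):
    return x + g(x)

def inverse(u, n_iters=60):
    x = u.copy()                      # any start converges geometrically
    for _ in range(n_iters):
        x = u - g(x)
    return x

x = np.array([1.2, -0.4, 3.0])
u = forward(x)
assert np.allclose(inverse(u), x, atol=1e-10)
```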
Practical considerations
Batch normalization has a diagonal Jacobian, so it can be used as a
transformation.
Linear transformations are often combined with other flows: e.g., coupling layers are often used in conjunction with permutations.
A series of flows is typically learned:
x ←f_1→ h_1 ←f_2→ h_2 ←f_3→ · · · ←f_N→ u.
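A sketch of such a chain in 1D: each flow contributes its own log-determinant (evaluated at its own input), and by Eq. (3) the contributions add; the two flows below are hand-picked examples.

```python
import numpy as np

# A 1D chain of two flows, each given as
# (forward, inverse, log|J| evaluated at the flow's input).
flows = [
    (lambda x: 2.0 * x + 1.0, lambda u: (u - 1.0) / 2.0,
     lambda x: np.log(2.0)),
    (np.arcsinh, np.sinh,
     lambda x: -0.5 * np.log1p(x**2)),   # d/dx arcsinh(x) = 1/sqrt(1+x^2)
]

def forward(x):
    log_det = 0.0
    for f, _, log_jac in flows:
        log_det += log_jac(x)            # Jacobian of each flow at its own input
        x = f(x)
    return x, log_det

def inverse(u):
    for _, f_inv, _ in reversed(flows):
        u = f_inv(u)
    return u

x = 0.7
u, log_det = forward(x)
assert np.isclose(inverse(u), x)
# Eq. (3): the summed log-dets match the derivative of the full composition.
assert np.isclose(log_det, np.log(2.0 / np.sqrt(1.0 + (2.0 * x + 1.0)**2)))
```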
Limitations
Computational efficiency: for instance, Glow (Kingma & Dhariwal, 2018) uses ∼200M parameters and ∼600 convolution layers.
Inductive bias: choosing how to model the invertible transformations.
Normalizing flows for non-Euclidean spaces.
Flows for discrete random variables.
Kingma, Durk P., and Prafulla Dhariwal. “Glow: Generative flow with invertible 1x1 convolutions.” In Advances in neural
information processing systems, pp. 10215-10224. 2018.
Analyzing inverse problems
Let y ∈ R^M be a measurement and x ∈ R^D the hidden variable in the forward process y = s(x), with M < D.
Let z ∈ R^K be a latent random variable, such that we want to learn a function g with [y, z] = g(x).
Ardizzone, Lynton, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein,
Carsten Rother, and Ullrich Köthe. “Analyzing inverse problems with invertible neural networks.” ICLR 2019.
Analyzing inverse problems - Coupling layers
They define the following coupling layers with inputs [uA, uB]:
v_A = u_A ⊙ exp(s_B(u_B)) + t_B(u_B),  v_B = u_B ⊙ exp(s_A(v_A)) + t_A(v_A) (6)
The inverse can be easily defined given inputs [v_A, v_B]:
u_B = (v_B − t_A(v_A)) ⊙ exp(−s_A(v_A)),  u_A = (v_A − t_B(u_B)) ⊙ exp(−s_B(u_B)) (7)
where ⊙ is the elementwise product, and s_i, t_i, i ∈ {A, B} can have learnable parameters.
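A minimal sketch of Eqs. (6)-(7); the subnetworks s_A, t_A, s_B, t_B are toy stand-ins for the learnable ones. Note that the inverse undoes the second half of Eq. (6) first, which is why it exists even though v_B depends on v_A:

```python
import numpy as np

# Eqs. (6)-(7) with toy stand-ins for the learnable subnetworks s_i, t_i.
s_A = lambda v: 0.3 * np.tanh(v)
t_A = lambda v: 0.1 * v
s_B = lambda u: 0.2 * np.sin(u)
t_B = lambda u: u**2

def forward(u_A, u_B):
    v_A = u_A * np.exp(s_B(u_B)) + t_B(u_B)      # Eq. (6), first half
    v_B = u_B * np.exp(s_A(v_A)) + t_A(v_A)      # Eq. (6), second half
    return v_A, v_B

def inverse(v_A, v_B):
    u_B = (v_B - t_A(v_A)) * np.exp(-s_A(v_A))   # Eq. (7): undo second half first
    u_A = (v_A - t_B(u_B)) * np.exp(-s_B(u_B))   # Eq. (7): then the first half
    return u_A, u_B

u_A, u_B = np.array([0.4, -1.1]), np.array([0.9, 0.2])
v_A, v_B = forward(u_A, u_B)
r_A, r_B = inverse(v_A, v_B)
assert np.allclose(r_A, u_A) and np.allclose(r_B, u_B)
```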
Analyzing inverse problems - Applications
The authors use bidirectional training.
Application 1: given the grasp position of a robotic arm, predict the positions of its joints.
Application 2: inferring the functional state of biological tissue. Observable: tissue reflectance. Hidden: oxygen saturation.
