Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
Diffusion Schrödinger bridges for score-based generative modeling
1. Diffusion Schrödinger bridges for score-based generative modeling
Jeremy Heng
Joint work with Valentin De Bortoli, James Thornton and Arnaud Doucet
ESSEC Business School
ICSA China - 1 July 2022
2. Generative Modeling and Score-Based Generative Models
Diffusion Models Beat GANs on Image Synthesis - OpenAI, 2021
Over recent years, massive advances in generative modeling have been driven by VAEs (Kingma & Welling, 2014), GANs (Goodfellow et al., 2014), and autoregressive models (van den Oord et al., 2016).
Score-based generative models, a.k.a. denoising diffusion models (Ho et al., 2020; Song et al., 2021), provide state-of-the-art results in a large number of domains.
3. One basic idea: ancestral sampling
From Song et al., ICLR 2021
Consider a Markov chain with $X_0 \sim p_0$ and $X_{k+1} \sim p_{k+1|k}(\cdot \mid X_k)$; then
$$p(x_{0:N}) = p_0(x_0) \prod_{k=0}^{N-1} p_{k+1|k}(x_{k+1} \mid x_k).$$
Denote by $p_k$ the marginal of $X_k$, satisfying
$$p_k(x_k) = \int p_{k|k-1}(x_k \mid x_{k-1})\, p_{k-1}(x_{k-1})\, \mathrm{d}x_{k-1}.$$
Backward decomposition ($p_{k|k+1}$ obtained with Bayes' rule):
$$p(x_{0:N}) = p_N(x_N) \prod_{k=0}^{N-1} p_{k|k+1}(x_k \mid x_{k+1}).$$
In particular, one can sample from $p(x_{0:N})$ by ancestral sampling:
sample $X_N \sim p_N(\cdot)$, then $X_k \sim p_{k|k+1}(\cdot \mid X_{k+1})$ for $k \in \{N-1, \dots, 0\}$.
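To make the recursion concrete, here is a minimal sketch (not from the slides) for a small discrete-state chain, where the marginals and backward kernels above are computed exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
S, N = 3, 4                              # 3 states, N = 4 steps
p0 = np.array([0.5, 0.3, 0.2])           # p_0
P = np.array([[0.8, 0.1, 0.1],           # forward kernel p_{k+1|k}(x'|x), row = x
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# marginals: p_k(x_k) = sum_{x_{k-1}} p_{k|k-1}(x_k|x_{k-1}) p_{k-1}(x_{k-1})
marg = [p0]
for _ in range(N):
    marg.append(marg[-1] @ P)

# ancestral sampling: X_N ~ p_N, then X_k ~ p_{k|k+1}(.|X_{k+1}) via Bayes' rule
x = rng.choice(S, p=marg[N])
for k in range(N - 1, -1, -1):
    backward = P[:, x] * marg[k] / marg[k + 1][x]   # p_{k|k+1}(x_k | x_{k+1})
    x = rng.choice(S, p=backward)
# x is now a draw of X_0; the whole path is distributed as p(x_{0:N})
```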
4. Generative Modeling with ancestral sampling
From Song et al., ICLR 2021
Let $p_0 = p_{\mathrm{data}}$ and set $p_{k+1|k}$ such that $p_N \approx p_{\mathrm{ref}}$ for $N \gg 1$, where $p_{\mathrm{ref}}$ is a "reference" easy-to-sample density.
Usual choice: $p_{k+1|k}(x' \mid x) = \mathcal{N}(x'; \alpha x, (1-\alpha^2) I_d)$, so that $p_N(x) \approx p_{\mathrm{ref}}(x)$ with $p_{\mathrm{ref}}(x) = \mathcal{N}(x; 0_d, I_d)$ for $N$ large enough.
Use ancestral sampling, replacing $p_N$ by $p_{\mathrm{ref}}$:
sample $X_N \sim p_{\mathrm{ref}}(\cdot)$, then $X_k \sim p_{k|k+1}(\cdot \mid X_{k+1})$ for $k \in \{N-1, \dots, 0\}$.
Key problem: not only does one need forward transitions $p_{k+1|k}$ and $N$ large enough so that $p_N(x) \approx p_{\mathrm{ref}}(x)$, but one also needs to approximate the backward transitions $p_{k|k+1}$.
5. Approximating Backward Transitions
We restrict ourselves to discretized Ornstein-Uhlenbeck processes
$$p_{k+1|k}(x_{k+1} \mid x_k) = \mathcal{N}(x_{k+1}; \alpha x_k, (1-\alpha^2) I_d),$$
where $\alpha > 0$ is close to 1.
Using a Taylor expansion we get
$$p_{k|k+1}(x_k \mid x_{k+1}) = p_{k+1|k}(x_{k+1} \mid x_k) \exp[\log p_k(x_k) - \log p_{k+1}(x_{k+1})]$$
$$\approx \mathcal{N}\big(x_k;\; (2-\alpha)\, x_{k+1} + (1-\alpha^2) \underbrace{\nabla \log p_{k+1}(x_{k+1})}_{\text{score}},\; (1-\alpha^2) I_d\big).$$
The score is not available, but using
$$p_{k+1}(x_{k+1}) = \int p_0(x_0)\, p_{k+1|0}(x_{k+1} \mid x_0)\, \mathrm{d}x_0,$$
we get
$$\nabla \log p_{k+1}(x_{k+1}) = \mathbb{E}_{X_0 \sim p_{0|k+1}}\big[\nabla_{x_{k+1}} \log p_{k+1|0}(x_{k+1} \mid X_0)\big].$$
6. Estimating the Scores using Score Matching and Sampling
Conditional expectation → regression problem:
$$s_{k+1} = \arg\min_s\, \mathbb{E}_{p_{0,k+1}}\big[\| s(X_{k+1}) - \nabla_{x_{k+1}} \log p_{k+1|0}(X_{k+1} \mid X_0) \|^2\big].$$
In practice, we restrict ourselves to neural networks and estimate all scores simultaneously, i.e. $s_{\theta^\star}(k, x_k) \approx \nabla \log p_k(x_k)$ where
$$\theta^\star \approx \arg\min_\theta\, \sum_{k=1}^{N} \mathbb{E}_{p_{0,k}}\big[\| s_\theta(k, X_k) - \nabla_{x_k} \log p_{k|0}(X_k \mid X_0) \|^2\big].$$
Generate samples from the backward process using $X_N \sim p_{\mathrm{ref}}$ and the recursion
$$X_k = (2-\alpha)\, X_{k+1} + (1-\alpha^2)\, s_{\theta^\star}(k+1, X_{k+1}) + \sqrt{1-\alpha^2}\, Z_{k+1}.$$
Code available (JAX and PyTorch):
https://github.com/yang-song/score_sde
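Putting slides 5 and 6 together, here is a 1-D toy sketch (illustrative only, not the authors' code): for Gaussian data the marginal scores are linear in $x$, so the score-matching regression can be solved by least squares, and the backward recursion approximately recovers $p_{\mathrm{data}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 50, 0.95
m, s = 2.0, 0.5                                   # toy data: p_data = N(m, s^2), d = 1

# forward noising: p_{k|0}(x_k | x_0) = N(x_k; alpha^k x_0, 1 - alpha^{2k})
x0 = rng.normal(m, s, size=10_000)
coeffs = []
for k in range(1, N + 1):
    mean_k, var_k = alpha**k * x0, 1.0 - alpha**(2 * k)
    xk = mean_k + np.sqrt(var_k) * rng.standard_normal(x0.shape)
    target = -(xk - mean_k) / var_k               # grad_{x_k} log p_{k|0}(x_k | x_0)
    coeffs.append(np.polyfit(xk, target, 1))      # linear model in place of s_theta(k, .)

def score(k, x):
    return np.polyval(coeffs[k - 1], x)

# backward recursion with the fitted scores, starting from X_N ~ p_ref = N(0, 1)
x = rng.standard_normal(10_000)
for k in range(N - 1, -1, -1):
    x = ((2 - alpha) * x + (1 - alpha**2) * score(k + 1, x)
         + np.sqrt(1 - alpha**2) * rng.standard_normal(x.shape))

print(x.mean(), x.std())                          # approximately (2.0, 0.5)
```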
7. From Discrete to Continuous-Time
The Markov chain is an Euler discretization of the Ornstein-Uhlenbeck process
$$\mathrm{d}X_t = -\beta X_t\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad X_0 \sim p_{\mathrm{data}},$$
where $\beta > 0$ is a parameter and $p_{\mathrm{ref}} = \mathcal{N}(0, \beta^{-1} I_d)$.
The reverse-time process $(Y_t)_{t \in [0,T]} = (X_{T-t})_{t \in [0,T]}$ satisfies
$$\mathrm{d}Y_t = \{\beta Y_t + 2 \nabla \log p_{T-t}(Y_t)\}\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad Y_0 \sim p_T,$$
and the generative model is
$$\mathrm{d}Y_t = \{\beta Y_t + 2\, s_{\theta^\star}(T-t, Y_t)\}\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad Y_0 \sim p_{\mathrm{ref}}.$$
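A minimal Euler-Maruyama sketch of this generative SDE, assuming a score approximation `score(t, y)` is available (the function name and signature are illustrative):

```python
import numpy as np

def reverse_sde_sample(score, T, n_steps, beta, d, rng=np.random.default_rng(0)):
    """Euler-Maruyama scheme for dY_t = {beta Y_t + 2 score(T-t, Y_t)} dt + sqrt(2) dB_t."""
    dt = T / n_steps
    y = rng.standard_normal(d) / np.sqrt(beta)    # Y_0 ~ p_ref = N(0, (1/beta) I_d)
    for i in range(n_steps):
        t = i * dt
        y = (y + (beta * y + 2.0 * score(T - t, y)) * dt
             + np.sqrt(2.0 * dt) * rng.standard_normal(d))
    return y                                      # approximate draw from p_data
```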
8. From Discrete to Continuous-Time
Convergence of diffusion models (De Bortoli et al., 2021)
Assume there exists $M \geq 0$ such that for any $t \in [0, T]$ and $x \in \mathbb{R}^d$,
$$\| s_{\theta^\star}(t, x) - \nabla \log p_t(x) \| \leq M,$$
with $s_{\theta^\star} \in C([0, T] \times \mathbb{R}^d, \mathbb{R}^d)$ and regularity conditions on $p_{\mathrm{data}}$ and its gradients.
Then there exist $B_\beta, C_\beta, D_\beta \geq 0$ such that for any $N \in \mathbb{N}$ and $\{\gamma_k\}_{k=1}^N$ the following holds:
$$\| \mathcal{L}(X_0) - p_{\mathrm{data}} \|_{\mathrm{TV}} \leq B_\beta \exp[-\beta^{1/2} T] + C_\beta (M + \bar{\gamma}^{1/2}) \exp[D_\beta T],$$
where $T = \sum_{k=1}^N \gamma_k$ and $\bar{\gamma} = \sup_{k \in \{1,\dots,N\}} \gamma_k$ ($\{\gamma_k\}_{k=1}^N$ is the sequence of stepsizes in the Euler-Maruyama discretization).
Take-home message: the “mixing time” of the reversal is entirely
given by the forward process. The bottleneck is not the mixing of the
chain but the approximation of the drift.
9. Practical Limitations
Too few steps lead to a poor approximation (the Ornstein-Uhlenbeck process does not mix fast enough).
Illustration of failure: $N$ is too small, so $p_N$ is very different from $p_{\mathrm{ref}}$. This harms the quality of the reconstruction by the time-reversal.
Our contribution: “iterating” diffusion models to force the correct
marginal distributions.
10. Revisiting Generative Modeling using Schrödinger Bridges
The Schrödinger Bridge problem: given a base process $p(x_{0:N})$, find $\pi^\star(x_{0:N})$ such that
$$\pi^\star = \arg\min \{\, \mathrm{KL}(\pi \,\|\, p) : \pi_0 = p_{\mathrm{data}},\ \pi_N = p_{\mathrm{ref}} \,\}.$$
If $\pi^\star$ is available: sample $X_N \sim p_{\mathrm{ref}}$, then $X_k \sim \pi^\star_{k|k+1}(\cdot \mid X_{k+1})$ for $k \in \{N-1, \dots, 0\}$.
We have $\pi^\star(x_{0:N}) = \pi^{s,\star}(x_0, x_N)\, p(x_{1:N-1} \mid x_0, x_N)$, where
$$\pi^{s,\star} = \arg\min \{\, -\mathbb{E}_{\pi^s}[\log p_{N|0}(X_N \mid X_0)] - H(\pi^s) : \pi^s_0 = p_{\mathrm{data}},\ \pi^s_N = p_{\mathrm{ref}} \,\}$$
and, if $p_{N|0}(x_N \mid x_0) = \mathcal{N}(x_N; x_0, \sigma^2 I_d)$, then
$$\pi^{s,\star} = \arg\min \{\, \mathbb{E}_{\pi^s}[\| X_0 - X_N \|^2] - 2\sigma^2 H(\pi^s) : \pi^s_0 = p_{\mathrm{data}},\ \pi^s_N = p_{\mathrm{ref}} \,\}.$$
This is an entropy-regularized Wasserstein-2 cost, i.e. as $\sigma \to 0$, $\pi^{s,\star}$ converges to the optimal transport plan (Mikami, 2004).
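In discrete state space, this entropy-regularized OT problem is what the Sinkhorn algorithm (Cuturi, 2013) solves. A minimal sketch, with $\varepsilon$ playing the role of $2\sigma^2$ (values are illustrative; log-domain updates are preferred for small $\varepsilon$):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=500):
    """Entropy-regularized OT between discrete marginals mu and nu, cost matrix C."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)                # enforce second marginal
        u = mu / (K @ v)                  # enforce first marginal
    return u[:, None] * K * v[None, :]    # coupling with marginals (mu, nu)

# toy usage: two uniform point clouds on the line, squared-distance cost
x, y = np.linspace(-2.0, 0.0, 50), np.linspace(1.0, 3.0, 50)
C = (x[:, None] - y[None, :]) ** 2        # ||x_0 - x_N||^2
pi = sinkhorn(np.full(50, 1 / 50), np.full(50, 1 / 50), C, eps=1.0)
```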
11. Solving the Schrödinger Bridge Problem
The SB problem can be solved using Iterative Proportional Fitting (IPF) (Fortet, 1940; Kullback, 1968): set $\pi^0 = p$ and for $n \geq 0$
$$\pi^{2n+1} = \arg\min \{\, \mathrm{KL}(\pi \,\|\, \pi^{2n}) : \pi_N = p_{\mathrm{ref}} \,\},$$
$$\pi^{2n+2} = \arg\min \{\, \mathrm{KL}(\pi \,\|\, \pi^{2n+1}) : \pi_0 = p_{\mathrm{data}} \,\}.$$
$\lim_{n \to +\infty} \pi^n = \pi^\star$ under regularity conditions (Rüschendorf, 1995; Léger, 2021; De Bortoli et al., 2021).
Explicit solution of the first IPF step:
$$\mathrm{KL}(\pi \,\|\, \pi^0) = \mathrm{KL}(\pi_N \,\|\, p_N) + \mathbb{E}_{\pi_N}\big[\mathrm{KL}(\pi_{|N} \,\|\, p_{|N})\big].$$
Therefore,
$$\pi^1(x_{0:N}) = p_{\mathrm{ref}}(x_N)\, p(x_{0:N-1} \mid x_N) = p_{\mathrm{ref}}(x_N) \prod_{k=0}^{N-1} p_{k|k+1}(x_k \mid x_{k+1}).$$
Take-home message: an approximation to the first IPF iteration corresponds to current score-based generative models.
12. Solving the Schrödinger Bridge Problem
The second iteration requires solving
$$\pi^2 = \arg\min \{\, \mathrm{KL}(\pi \,\|\, \pi^1) : \pi_0 = p_{\mathrm{data}} \,\}.$$
Therefore,
$$\pi^2(x_{0:N}) = p_{\mathrm{data}}(x_0)\, \pi^1(x_{1:N} \mid x_0) = p_{\mathrm{data}}(x_0) \prod_{k=0}^{N-1} \pi^1_{k+1|k}(x_{k+1} \mid x_k).$$
On an algorithmic level:
• IPF1: the time-reversal of the forward process $\pi^0 = p$ is initialized by $p_{\mathrm{ref}}$ at time $N$ to define the backward process $\pi^1$.
• IPF2: the time-reversal of the backward process $\pi^1$ is initialized by $p_{\mathrm{data}}$ at time $0$ to define the forward process $\pi^2$.
• IPF3: the time-reversal of the forward process $\pi^2$ is initialized by $p_{\mathrm{ref}}$ at time $N$ to define the backward process $\pi^3$.
• ...
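A minimal sketch of these two alternating KL projections in a static two-state setting (illustrative only; in DSB the projections act on path measures and are implemented with score matching): each half-step keeps the conditionals of the current coupling and swaps in the required marginal.

```python
import numpy as np

def ipf_backward(pi, p_ref):
    """pi^{2n+1}(x0, xN) = pi^{2n}(x0 | xN) p_ref(xN): fix the time-N marginal."""
    return (pi / pi.sum(axis=0, keepdims=True)) * p_ref[None, :]

def ipf_forward(pi, p_data):
    """pi^{2n+2}(x0, xN) = p_data(x0) pi^{2n+1}(xN | x0): fix the time-0 marginal."""
    return p_data[:, None] * (pi / pi.sum(axis=1, keepdims=True))

p_data, p_ref = np.array([0.7, 0.3]), np.array([0.5, 0.5])
pi = p_data[:, None] * np.array([[0.9, 0.1],     # pi^0 = p_data x forward kernel
                                 [0.2, 0.8]])
for _ in range(20):                               # IPF1, IPF2, IPF3, ...
    pi = ipf_backward(pi, p_ref)
    pi = ipf_forward(pi, p_data)
# pi now (approximately) has marginals (p_data, p_ref), with minimal KL to pi^0
```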
13. Continuous-Time IPF
IPF can be formulated in continuous time:
$$\Pi^\star = \arg\min \{\, \mathrm{KL}(\Pi \,\|\, P) : \Pi \in \mathcal{P}(\mathcal{C}),\ \Pi_0 = p_{\mathrm{data}},\ \Pi_T = p_{\mathrm{ref}} \,\}.$$
Similarly, we define the IPF sequence $(\Pi^n)$ recursively with $\Pi^0 = P$:
$$\Pi^{2n+1} = \arg\min \{\, \mathrm{KL}(\Pi \,\|\, \Pi^{2n}) : \Pi \in \mathcal{P}(\mathcal{C}),\ \Pi_T = p_{\mathrm{ref}} \,\},$$
$$\Pi^{2n+2} = \arg\min \{\, \mathrm{KL}(\Pi \,\|\, \Pi^{2n+1}) : \Pi \in \mathcal{P}(\mathcal{C}),\ \Pi_0 = p_{\mathrm{data}} \,\}.$$
Under regularity conditions,
$$(\Pi^{2n+1})^R: \quad \mathrm{d}Y^{2n+1}_t = b^n_{T-t}(Y^{2n+1}_t)\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad Y^{2n+1}_0 \sim p_{\mathrm{ref}},$$
$$\Pi^{2n+2}: \quad \mathrm{d}X^{2n+2}_t = f^{n+1}_t(X^{2n+2}_t)\, \mathrm{d}t + \sqrt{2}\, \mathrm{d}B_t, \qquad X^{2n+2}_0 \sim p_{\mathrm{data}},$$
where
$$b^n_t(x) = -f^n_t(x) + 2 \nabla \log p^n_t(x), \qquad f^{n+1}_t(x) = -b^n_t(x) + 2 \nabla \log q^n_t(x),$$
with $f^0_t(x) = f(x)$, and $p^n_t$, $q^n_t$ the densities of $\Pi^{2n}_t$ and $\Pi^{2n+1}_t$.
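At the level of drifts, each IPF half-step is the time-reversal formula applied to the current process. A schematic sketch (the scores are assumed given here, whereas DSB estimates them by score matching):

```python
def reversed_drift(drift, score):
    """Drift of the time-reversed diffusion: -f_t(x) + 2 grad log p_t(x)."""
    return lambda t, x: -drift(t, x) + 2.0 * score(t, x)

# IPF alternation on drifts, starting from f^0 = f:
# b^n     = reversed_drift(f^n, score_of_Pi_2n)         (backward process)
# f^{n+1} = reversed_drift(b^n, score_of_Pi_2n_plus_1)  (forward process)
```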
15. Diffusion Schrödinger Bridge: 2D example
Diffusion Schrödinger Bridge (DSB) gives a solution to the “small
time problem”.
Approximation of Optimal Transport.
18. Applications: Dataset Interpolation
First row: Swiss-roll to S-curve (2D). Step 9 of DSB with T = 1
(N = 50). From left to right: t = 0, 0.4, 0.6, 1. Second row: EMNIST to
MNIST. Step 10 of DSB with T = 1.5 (N = 30). From left to right:
t = 0, 0.4, 1.25, 1.5.
19. Discussion
Quick summary
• Theoretical results for denoising diffusion models.
• Generative modeling can be reformulated as a Schrödinger Bridge problem.
• Diffusion Schrödinger Bridge approximates its solution using (discretized) forward-backward diffusions and score-matching ideas.
20. References
V. De Bortoli, J. Thornton, J. Heng and A. Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. NeurIPS 2021.
V. De Bortoli, G. Deligiannidis and A. Doucet. Quantitative uniform stability of the iterative proportional fitting procedure. arXiv:2108.08129.
J. Ho, A. Jain and P. Abbeel. Denoising diffusion probabilistic models. NeurIPS 2020.
Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon and B. Poole. Score-based generative modeling through stochastic differential equations. ICLR 2021.