An Introduction to Optimal Transport

Gabriel Peyré
www.numerical-tours.com
Statistical Image Models

Colors distribution: each pixel of the source image (X) is a point in R^3.

[Figure: source image (X), style image (Y), and the source image after transfer of the style image's color statistics (sliced Wasserstein projection of X to the color statistics of Y); from J. Rabin, Wasserstein Regularization.]

Other applications: texture synthesis, texture segmentation.
Discrete Distributions

Discrete measure:  \mu = \sum_{i=0}^{N-1} p_i \delta_{X_i},   X_i \in R^d,   \sum_i p_i = 1

Point cloud: constant weights p_i = 1/N; quotient space R^{N \times d} / \Sigma_N.
Histogram: fixed positions X_i (e.g. a grid); affine space {(p_i)_i ; \sum_i p_i = 1}.
From Images to Statistics

Discretized image f \in R^{N \times d}:  N = #pixels, d = #colors; each pixel f_i is a point in R^d = R^3.

Disclaimer: images are not distributions.
    Needs an estimator f \mapsto \mu_f.
    Modify f by controlling \mu_f.

Point cloud discretization:  \mu_f = \sum_i \delta_{f_i}

Histogram discretization:    \mu_f = \sum_i p_i \delta_{X_i}
    Parzen windows:  p_i = (1/Z_f) \sum_j \varphi(X_i - f_j)   (\varphi a smoothing kernel)

[Figure: source image (X), style image (Y) and their color point clouds; from J. Rabin, Wasserstein Regularization.]
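The histogram discretization is easy to implement directly. The following MATLAB sketch (an illustration, not taken from the slides) bins the colors of an image onto a fixed K x K x K grid of R^3 and smooths a 1-D marginal with a Gaussian Parzen kernel; the file name 'input.png', the grid size K, the bandwidth h and the Gaussian kernel are all assumptions made for the example.

Matlab code (illustrative sketch):

    % Histogram discretization of an image's color distribution.
    f = double(imread('input.png')) / 255;        % RGB image with values in [0,1] (assumed)
    K = 16;                                       % bins per color channel
    X = reshape(f, [], 3);                        % N x 3 point cloud of colors, N = #pixels

    % fixed positions X_i: centers of a K x K x K grid covering [0,1]^3
    idx = min(floor(X * K), K - 1) + 1;           % per-channel bin index in 1..K
    lin = sub2ind([K K K], idx(:,1), idx(:,2), idx(:,3));
    p   = accumarray(lin, 1, [K^3, 1]);           % counts on the grid
    p   = p / sum(p);                             % weights p_i with sum_i p_i = 1

    % Parzen-window estimate of the red-channel marginal, bandwidth h
    h  = 0.05;
    t  = linspace(0, 1, 256);
    xr = X(1:200:end, 1);                         % subsample pixels to keep memory small
    pr = mean(exp(-(t - xr).^2 / (2*h^2)), 1);
    pr = pr / sum(pr);                            % normalized density on the grid t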
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Optimal Assignments

Discrete distributions:  \mu_X = (1/N) \sum_{i=1}^{N} \delta_{X_i},   \mu_Y = (1/N) \sum_{i=1}^{N} \delta_{Y_i}

Optimal assignment:
    \sigma^* \in argmin_{\sigma \in \Sigma_N} \sum_i ||X_i - Y_{\sigma(i)}||^p

Wasserstein distance:
    W_p(\mu_X, \mu_Y)^p = \sum_i ||X_i - Y_{\sigma^*(i)}||^p
    Metric on the space of distributions.

Projection on statistical constraints:  C = { f ; \mu_f = \mu_Y }
    Proj_C(f) = Y \circ \sigma^*
Computing Transport Distances

Explicit solution for 1-D distributions (e.g. grayscale images):
    sort the values X_i and Y_i: O(N log(N)) operations.

Higher dimensions: combinatorial optimization methods
    (Hungarian algorithm, auction algorithm, etc.): O(N^{5/2} log(N)) operations,
    intractable for imaging problems.

Arbitrary distributions:  \mu = \sum_i p_i \delta_{X_i},   \nu = \sum_i q_i \delta_{Y_i}
    W_p(\mu, \nu)^p is the solution of a linear program.
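The 1-D case is simple enough to write out: with uniform weights, the optimal assignment pairs the sorted values of the two point sets. Here is a minimal MATLAB sketch (an illustration, not from the slides; the data below is synthetic).

Matlab code (illustrative sketch):

    % 1-D Wasserstein distance between two N-point clouds with uniform weights.
    p  = 2;                                 % exponent of the ground cost
    X  = randn(1000, 1);                    % samples of mu_X (synthetic data)
    Y  = 2 * randn(1000, 1) + 1;            % samples of mu_Y
    Xs = sort(X);  Ys = sort(Y);            % optimal assignment: match values by rank
    Wp = mean(abs(Xs - Ys).^p)^(1/p);       % W_p(mu_X, mu_Y) for weights 1/N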
Coupling Matrices

Extension to weighted distributions with an arbitrary number of points:
    \mu = \sum_{i=1}^{N_1} p_i \delta_{X_i},   \nu = \sum_{j=1}^{N_2} q_j \delta_{Y_j}

If N_1 = N_2, permutation matrix:  P^\sigma = (\delta_{i - \sigma(j)})_{i,j}
    \sum_i ||X_i - Y_{\sigma(i)}||^p = \sum_{i,j} P^\sigma_{i,j} ||X_i - Y_j||^p

Probabilistic coupling:
    \Pi(\mu, \nu) = { P \in R^{N_1 \times N_2} ;  P >= 0,  P 1 = p,  P^T 1 = q }

    Defined even if N_1 \neq N_2.
    Takes the weights into account.
    Linear objective.
Kantorovitch Formulation

Linear programming (Kantorovitch):  W_p(\mu, \nu)^p = <P^*, C>
    P^* \in argmin_{P \in \Pi(\mu, \nu)} <P, C> = \sum_{i,j} C_{i,j} P_{i,j}

The minimum is attained at an extremal point of the polytope \Pi(\mu, \nu),
where a level set <P, C> = cst of the linear objective touches it.

Theorem: if p_i = q_i = 1/N, then there exists \sigma \in \Sigma_N with P^* = P^\sigma.

[Figure: the polytope of couplings with the level sets of <P, C>, and the optimal coupling P^* displayed as a matrix between the weight vectors p and q.]
Optimization Codes

Discrete optimal transport:
    P^* \in argmin_{P \in \Pi(\mu, \nu)} <P, C> = \sum_{i,j} C_{i,j} P_{i,j}

Linear program:
    Interior point methods: slow.
    Network simplex.
    Transportation simplex.
    Block search pivoting strategy [Kelly and O'Neill 1991].

[Figure from Bonneel et al. 2011: log-log plot of the running times of different solvers (network simplex in fixed-point and double precision, transport simplex) as a function of the number of bins per histogram. The network simplex behaves as an O(n^2) algorithm in practice, whereas the transport simplex runs in O(n^3). Fixed-point precision further speeds up the computation: the final transport cost is less accurate, but the particle pairing that produces the actual interpolation remains unchanged. Current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003].]
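For small problems, the Kantorovitch linear program can also be handed to a generic solver as a baseline; the sketch below is not the network-simplex code benchmarked above, it assumes the Optimization Toolbox function linprog is available, and it uses synthetic point clouds.

Matlab code (illustrative sketch):

    % Kantorovitch LP with a generic solver; only practical for small N1, N2.
    N1 = 30;  N2 = 40;
    X = randn(N1, 2);  Y = randn(N2, 2) + 1;         % two small 2-D point clouds
    p = ones(N1, 1) / N1;  q = ones(N2, 1) / N2;     % weights

    C = sum(X.^2, 2) + sum(Y.^2, 2)' - 2 * X * Y';   % squared Euclidean cost C_{i,j}

    % constraints P 1 = p and P^T 1 = q, with P stored column-wise as P(:)
    Ap  = kron(ones(1, N2), speye(N1));              % row sums
    Aq  = kron(speye(N2), ones(1, N1));              % column sums
    Aeq = [Ap; Aq];  beq = [p; q];
    lb  = zeros(N1 * N2, 1);                         % P >= 0

    Pvec = linprog(C(:), [], [], Aeq, beq, lb, []);  % requires Optimization Toolbox
    P    = reshape(Pvec, N1, N2);                    % optimal coupling P^*
    W2sq = sum(sum(P .* C));                         % W_2(mu, nu)^2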
Color Histogram Equalization

Input color images:  f_i \in R^{N \times 3},   \mu_i = (1/N) \sum_x \delta_{f_i(x)}.

Optimal assignment:  min_{\sigma \in \Sigma_N} ||f_0 - f_1 \circ \sigma||

Transport:  T : f_0(x) \in R^3  \mapsto  f_1(\sigma(x)) \in R^3

Equalization:  \tilde f_0 = T(f_0)   ==>   \mu_{\tilde f_0} = \mu_{f_1}

[Figure: source image f_0, style image f_1, their color distributions \mu_0 and \mu_1, the transport T, and the recolored image T(f_0): sliced Wasserstein projection of X to the color statistics of the style image Y; from J. Rabin, Wasserstein Regularization.]
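The exact 3-D assignment above is too expensive for full-size images; the figures on this slide use J. Rabin's sliced Wasserstein projection instead. The sketch below is a from-scratch illustration of that idea (random 1-D projections combined with the sorting solution), not the authors' code; the file names, the number of iterations and the step size are assumptions.

Matlab code (illustrative sketch):

    % Approximate color transfer by sliced Wasserstein projection.
    f0 = double(imread('source.png')) / 255;      % hypothetical file names, RGB in [0,1]
    f1 = double(imread('style.png'))  / 255;
    X  = reshape(f0, [], 3);                      % source colors, one row per pixel
    Y  = reshape(f1, [], 3);
    N  = size(X, 1);
    Y  = Y(randi(size(Y, 1), N, 1), :);           % resample the style colors to N points

    niter = 50;  step = 1;                        % illustrative parameters
    for it = 1:niter
        d  = randn(3, 1);  d = d / norm(d);       % random direction on the sphere
        xp = X * d;  yp = Y * d;                  % 1-D projections
        [~, ix] = sort(xp);  [~, iy] = sort(yp);
        disp1d = zeros(N, 1);
        disp1d(ix) = yp(iy) - xp(ix);             % 1-D optimal displacement (sorting)
        X = X + step * disp1d * d';               % move the colors along d
    end
    f0new = reshape(X, size(f0));                 % recolored source image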
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Measure Preserving Maps

Distributions \mu_0, \mu_1 on R^k.

Mass preserving map T : R^k -> R^k:
    \mu_1 = T_\# \mu_0   where   (T_\# \mu_0)(A) = \mu_0(T^{-1}(A))

[Figure: a point x of \mu_0 is mapped to T(x) in \mu_1.]

Smooth distributions:  \mu_i = \rho_i(x) dx
    T_\# \mu_0 = \mu_1   <=>   \rho_1(T(x)) |det(\nabla T(x))| = \rho_0(x)
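As a sanity check of the density identity, the following MATLAB sketch (an illustration, not from the slides) pushes samples of the uniform distribution on [0,1] through the monotone map T(x) = x^2 and compares the histogram of T(x) with the density \rho_1(y) = 1/(2 sqrt(y)) predicted by the change of variables; the map and the sample size are arbitrary choices.

Matlab code (illustrative sketch):

    % Monte Carlo check of rho_1(T(x)) |T'(x)| = rho_0(x)
    % for mu_0 uniform on [0,1] and T(x) = x^2 (so rho_1(y) = 1/(2*sqrt(y))).
    n = 1e6;
    x = rand(n, 1);                                  % samples of mu_0 (rho_0 = 1 on [0,1])
    y = x.^2;                                        % samples of mu_1 = T_# mu_0

    edges   = linspace(0, 1, 51);
    centers = (edges(1:end-1) + edges(2:end)) / 2;
    counts  = histcounts(y, edges);
    rho1_mc = counts / (n * (edges(2) - edges(1)));  % empirical density of T(x)
    rho1_th = 1 ./ (2 * sqrt(centers));              % predicted density

    plot(centers, rho1_mc, 'o', centers, rho1_th, '-');
    legend('empirical density of T(x)', '1 / (2 sqrt(y))');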
Optimal Transport

L^p optimal transport:
    W_p(\mu_0, \mu_1)^p = min_{T_\# \mu_0 = \mu_1} \int ||T(x) - x||^p \mu_0(dx)

Regularity condition: \mu_0 (or \mu_1) does not give mass to "small sets".

Theorem (p > 1): there exists a unique optimal T.

Theorem (p = 2): T = \nabla\varphi with \varphi convex.
    T is monotone:  <T(x) - T(x'), x - x'> >= 0

[Figure: the optimal map T between the densities \mu_0 and \mu_1.]
From Continuous to Discrete

Vector X \in R^{N \times d} (image, coefficients, ...):  \mu_X = \sum_i \delta_{X_i}
    Grayscale: 1-D.   Colors: 3-D (R, G, B).

\mu_Y = T_\# \mu_X   <=>   \exists \sigma \in \Sigma_N,   T : X_i \mapsto Y_{\sigma(i)}

Replace the optimization on T by an optimization on \sigma \in \Sigma_N:
    \int ||T(x) - x||^p \mu_X(dx) = \sum_i ||X_i - Y_{\sigma(i)}||^p
Continuous Wasserstein Distance

Couplings \pi \in \Pi(\mu, \nu):
    \forall A \subset R^d,  \pi(A \times R^d) = \mu(A)
    \forall B \subset R^d,  \pi(R^d \times B) = \nu(B)

Transportation cost:
    W_p(\mu, \nu)^p = min_{\pi \in \Pi(\mu, \nu)} \int_{R^d \times R^d} c(x, y) d\pi(x, y)
Continuous Optimal Transport

Let p > 1 and assume \mu does not give mass to small sets.

Unique \pi^* \in \Pi(\mu, \nu) s.t.  W_p(\mu, \nu)^p = \int_{R^d \times R^d} c(x, y) d\pi^*(x, y)

Optimal transport map T : R^d -> R^d:  \pi^* is supported on the graph {(x, T(x))}.

p = 2:  T = \nabla\varphi is the unique solution of
    \varphi convex l.s.c.,
    (\nabla\varphi)_\# \mu = \nu
1-D Continuous Wasserstein

Distributions \mu, \nu on R.

Cumulative functions:  C_\mu(t) = \int_{-\infty}^{t} d\mu(x)

For all p > 1:  T = C_\nu^{-1} \circ C_\mu
    T is non-decreasing ("change of contrast").

Explicit formulas:
    W_p(\mu, \nu)^p = \int_0^1 |C_\mu^{-1} - C_\nu^{-1}|^p
    W_1(\mu, \nu) = \int_R |C_\mu - C_\nu| = ||(\mu - \nu) \star H||_1   (H the Heaviside step)
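These formulas translate directly into code. The MATLAB sketch below (an illustration, not from the slides) evaluates W_1 from the cumulative functions and W_2 from the inverse cumulative functions of two weighted histograms on a common grid; the densities are synthetic and a small floor is added so that the cumulative functions are strictly increasing before inversion.

Matlab code (illustrative sketch):

    % 1-D Wasserstein distances from cumulative functions.
    t = linspace(0, 1, 200)';  dt = t(2) - t(1);
    p = exp(-(t - 0.3).^2 / 0.01) + 1e-6;  p = p / sum(p);   % synthetic densities
    q = exp(-(t - 0.7).^2 / 0.02) + 1e-6;  q = q / sum(q);

    Cp = cumsum(p);  Cq = cumsum(q);              % cumulative functions C_mu, C_nu
    W1 = sum(abs(Cp - Cq)) * dt;                  % W_1 = int |C_mu - C_nu|

    % W_2 via the inverse cumulative functions (quantiles) on a grid of [0,1]
    s  = linspace(1e-6, 1 - 1e-6, 500)';
    Qp = interp1(Cp, t, s, 'linear', 'extrap');   % C_mu^{-1}(s)
    Qq = interp1(Cq, t, s, 'linear', 'extrap');   % C_nu^{-1}(s)
    W2 = sqrt(mean((Qp - Qq).^2));                % (int_0^1 |Qp - Qq|^2)^(1/2)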
Continuous Histogram Transfer

Input images:  f_i : [0,1]^2 -> [0,1],  i = 0, 1.

Gray-value distributions: \mu_i defined on [0,1],
    \mu_i([a, b]) = \int_{[0,1]^2} 1_{a <= f_i(x) <= b} dx

Optimal transport:  T = C_{\mu_1}^{-1} \circ C_{\mu_0}.

[Figure: f_0 -> C_{\mu_0}(f_0) -> T(f_0) = C_{\mu_1}^{-1}(C_{\mu_0}(f_0)), together with the gray-level densities \mu_0 and \mu_1.]
Discrete Histogram Transfer

Discretized grayscale images f_0, f_1 \in R^N.
Discrete distributions \mu_i = \mu_{f_i} = (1/N) \sum_k \delta_{f_i(k)}.

Sorting the values: \sigma_i \in \Sigma_N s.t. f_i(\sigma_i(k)) <= f_i(\sigma_i(k + 1)).
Optimal transport: T : f_0(\sigma_0(k)) \mapsto f_1(\sigma_1(k)).

Matlab code:   [a,I] = sort(f0(:));   % I = sigma_0: pixels of f0 in increasing order
               f0(I) = sort(f1(:));   % assign them the sorted values of f1

[Figure: images f_1 and f_0, their gray-level histograms \mu_1 and \mu_0, and the equalized image T(f_0), whose histogram matches \mu_1.]
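The two-line code above assumes f0 and f1 have the same number of pixels. When they do not, the same idea can be implemented by matching quantiles; the sketch below is an extension written for illustration (the file names are placeholders, and the images are assumed to be single-channel with values in [0,1]).

Matlab code (illustrative sketch):

    % Histogram transfer between grayscale images of different sizes,
    % mapping each value of f0 through the empirical quantile function of f1.
    f0 = double(imread('image0.png')) / 255;     % grayscale images (assumed)
    f1 = double(imread('image1.png')) / 255;

    v0 = f0(:);  v1 = sort(f1(:));
    N0 = numel(v0);  N1 = numel(v1);

    [~, I] = sort(v0);                           % sigma_0: pixels of f0 ranked by value
    levels = round(linspace(1, N1, N0));         % quantile levels of f1 matched to the ranks
    v0(I)  = v1(levels);                         % T(f0), value by value
    Tf0    = reshape(v0, size(f0));              % equalized image, histogram close to f1's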
Gaussian Optimal Transport

Input distributions (\mu_0, \mu_1) with \mu_i = N(m_i, \Sigma_i).

Ellipsoids:  E_i = { x \in R^d ; (m_i - x)^T \Sigma_i^{-1} (m_i - x) <= c }

Theorem: if ker(\Sigma_0) \cap Im(\Sigma_1) = {0},
    T(x) = S (x - m_0) + m_1    where
    S = \Sigma_1^{1/2} \Sigma_{0,1}^{+} \Sigma_1^{1/2},
    \Sigma_{0,1} = (\Sigma_1^{1/2} \Sigma_0 \Sigma_1^{1/2})^{1/2}

    W_2(\mu_0, \mu_1)^2 = tr(\Sigma_0 + \Sigma_1 - 2 \Sigma_{0,1}) + ||m_0 - m_1||^2
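The closed form can be checked numerically with a few lines of MATLAB; the sketch below (an illustration, with placeholder parameters) computes the map and the distance and pushes a cloud of samples of \mu_0 onto \mu_1, assuming \Sigma_0 and \Sigma_1 are symmetric positive definite.

Matlab code (illustrative sketch):

    % Closed-form W_2 optimal transport between N(m0,S0) and N(m1,S1).
    m0 = [0; 0];   S0 = [1, 0.5; 0.5, 1];          % placeholder parameters
    m1 = [3; 1];   S1 = [2, -0.3; -0.3, 0.5];

    S1h = sqrtm(S1);                               % Sigma_1^(1/2)
    S01 = sqrtm(S1h * S0 * S1h);                   % Sigma_{0,1}
    S   = S1h * (S01 \ S1h);                       % linear part of T(x) = S (x - m0) + m1

    W2sq = trace(S0 + S1 - 2*S01) + norm(m0 - m1)^2;   % squared Wasserstein distance

    X0 = m0 + sqrtm(S0) * randn(2, 1000);          % samples of mu_0
    X1 = S * (X0 - m0) + m1;                       % transported samples, ~ N(m1, S1)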
PDE Formulations

Smooth distributions:  \mu_i = \rho_i(x) dx
    T_\# \mu_0 = \mu_1   <=>   \rho_1(T(x)) |det(\nabla T(x))| = \rho_0(x)

L^2 optimal transport map T = \nabla\varphi:
    \rho_1(\nabla\varphi(x)) det(H\varphi(x)) = \rho_0(x)        (Monge-Ampère)

Fluid dynamic formulation: find \rho(x, t) >= 0, m(x, t) \in R^d:
    W_2(\mu_0, \mu_1)^2 = min_{\rho, m} \int_{R^d} \int_0^1 ||m||^2 / \rho
        s.t.  \partial_t \rho + div(m) = 0,   \rho(0, .) = \rho_0,   \rho(1, .) = \rho_1

    Finite element discretization [Benamou-Brenier].

Related works of [Tannenbaum et al.].
Image Registration

[Excerpt from ur Rehman et al. 2009: the mass-preserving registration equation c(dv) = \mu_0(v + dv) det(\nabla(v + dv)) - \mu_1 = 0 is solved with a preconditioned conjugate gradient (incomplete Cholesky preconditioner) and a GPU multigrid scheme using Gauss-Seidel relaxation and trilinear transfer operators between coarse and fine grids.]

[Figure: OMT results viewed on an axial slice; corresponding slices from pre-op (left) and post-op (right) MRI data, with the estimated deformation T clearly visible in the anterior part of the brain. From ur Rehman et al. 2009.]
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Wasserstein Geodesics

Geodesics between (\mu_0, \mu_1):  t \in [0, 1] \mapsto \mu_t.

Variational characterization (W_2 is a geodesic distance):
    \mu_t = argmin_\mu (1 - t) W_2(\mu_0, \mu)^2 + t W_2(\mu_1, \mu)^2

Optimal transport characterization:
    \mu_t = ((1 - t) Id + t T)_\# \mu_0

Gaussian case: \mu_t = T_{t\#} \mu_0 = N(m_t, \Sigma_t) with
    m_t = (1 - t) m_0 + t m_1
    \Sigma_t = [(1 - t) Id + t T] \Sigma_0 [(1 - t) Id + t T]
    ==> the set of Gaussians is geodesically convex.
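For two discrete 1-D clouds with uniform weights, T is given by sorting and the geodesic \mu_t is obtained by linearly interpolating the matched points. The MATLAB sketch below (an illustration with synthetic data, not from the slides) displays a few intermediate distributions.

Matlab code (illustrative sketch):

    % Displacement interpolation mu_t = ((1-t) Id + t T)_# mu_0 in 1-D.
    N  = 2000;
    X0 = randn(N, 1) - 2;                     % samples of mu_0 (synthetic)
    X1 = 0.5 * randn(N, 1) + 3;               % samples of mu_1

    [~, i0] = sort(X0);  [~, i1] = sort(X1);
    T = zeros(N, 1);  T(i0) = X1(i1);         % optimal assignment by rank

    for t = 0:0.25:1
        Xt = (1 - t) * X0 + t * T;            % support of mu_t
        subplot(1, 5, round(4*t) + 1);
        histogram(Xt, 40);  title(sprintf('t = %.2f', t));
    end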
Texture Model Interpolation

Exemplar f_0  --(analysis)-->  probability distribution \mu = N(m, \Sigma)  --(synthesis)-->  outputs f ~ \mu

[Figure: exemplars f[0] and f[1], and textures synthesized from the interpolated models \mu_t for t ranging from 0 to 1.]
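Combining the Gaussian optimal transport map with the geodesic formulas of the previous slide gives the interpolated texture models. The MATLAB sketch below assumes the texture model is summarized by a mean m and covariance \Sigma of dimension d (the parameters are placeholders) and synthesizes one output from the model at time t.

Matlab code (illustrative sketch):

    % Geodesic interpolation N(m_t, Sigma_t) between two Gaussian texture models.
    d  = 8;                                          % model dimension (placeholder)
    m0 = zeros(d, 1);   S0 = toeplitz(0.8.^(0:d-1)); % placeholder models
    m1 = ones(d, 1);    S1 = toeplitz(0.3.^(0:d-1));

    S1h = sqrtm(S1);
    S01 = sqrtm(S1h * S0 * S1h);
    S   = S1h * (S01 \ S1h);                         % OT map from N(m0,S0) to N(m1,S1)

    t  = 0.5;                                        % interpolation time in [0,1]
    At = (1 - t) * eye(d) + t * S;                   % (1-t) Id + t T
    mt = (1 - t) * m0 + t * m1;                      % interpolated mean
    St = At * S0 * At;                               % interpolated covariance
    f  = mt + sqrtm(St) * randn(d, 1);               % one output f ~ N(m_t, Sigma_t)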
Conclusion

Statistical modeling of images.

Applications of OT:
  - Metric between descriptors: use the distance W_2(\mu_0, \mu_1).
  - Projection on statistical constraints: use the transport T.
  - Mixing statistical models.

[Figure: source image (X), style image (Y) and descriptor distributions; from J. Rabin, Wasserstein Regularization.]

More Related Content

What's hot

Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionEntrepreneur / Startup
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...Edge AI and Vision Alliance
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Universitat Politècnica de Catalunya
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Robust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionRobust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionActiveEon
 
PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論Akihiro Nitta
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveySangwoo Mo
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Bang Xiang Yong
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudArithmer Inc.
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -Kazuyuki Miyazawa
 

What's hot (20)

Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Robust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionRobust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object Detection
 
PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論
 
Hog
HogHog
Hog
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
PointNet
PointNetPointNet
PointNet
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloud
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -
 

More from Gabriel Peyré

Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Low Complexity Regularization of Inverse Problems - Course #1 Inverse Problems
 
Low Complexity Regularization of Inverse Problems
 
Model Selection with Piecewise Regular Gauges
 
Signal Processing Course : Inverse Problems Regularization
 
Proximal Splitting and Optimal Transport
 
Geodesic Method in Computer Vision and Graphics
 
Learning Sparse Representation
 
Adaptive Signal and Image Processing
 
Mesh Processing Course : Mesh Parameterization
 
Mesh Processing Course : Multiresolution
 
Mesh Processing Course : Introduction
 
Mesh Processing Course : Geodesics
 
Mesh Processing Course : Geodesic Sampling
 
Mesh Processing Course : Differential Calculus
 
Mesh Processing Course : Active Contours
 
Signal Processing Course : Theory for Sparse Recovery
 
Signal Processing Course : Presentation of the Course
 
Signal Processing Course : Orthogonal Bases


An Introduction to Optimal Transport

  • 1. An Introduction to Optimal Transport Gabriel Peyré www.numerical-tours.com
  • 2. Statistical Image Models. Color distribution: each pixel of the source image (X) is a point in R³. Figure: source image (X), style image (Y), and source image after color transfer. [J. Rabin, Wasserstein Regularization]
  • 3. Statistical Image Models. Application to color transfer: sliced Wasserstein projection of X onto the color statistics of the style image Y. Figure: source image (X), style image (Y), sliced Wasserstein projection of X, and modified image. [J. Rabin, Wasserstein Regularization]
  • 4. Statistical Image Models. Other applications: texture synthesis, texture segmentation. [J. Rabin, Wasserstein Regularization]
  • 5. Discrete Distributions. Discrete measure: µ = Σ_{i=0}^{N−1} p_i δ_{X_i}, with X_i ∈ R^d and Σ_i p_i = 1.
  • 6. Discrete Distributions. Point cloud: constant weights p_i = 1/N; the positions X_i live in the quotient space R^{N×d} / Σ_N.
  • 7. Discrete Distributions. Point cloud: constant weights p_i = 1/N, quotient space R^{N×d} / Σ_N. Histogram: fixed positions X_i (e.g. a grid), weights in the affine space {(p_i)_i : Σ_i p_i = 1}.
  • 8. From Images to Statistics. Discretized image f ∈ R^{N×d}, f_i ∈ R^d = R³, with N = #pixels and d = #colors. Figure: source image (X), style image (Y). [J. Rabin, Wasserstein Regularization]
  • 9. From Images to Statistics. Disclaimer: images are not distributions. One needs an estimator f ↦ µ_f, and then modifies f by controlling µ_f.
  • 10. From Images to Statistics. Point cloud discretization: µ_f = Σ_i δ_{f_i}.
  • 11. From Images to Statistics. Histogram discretization: µ_f = Σ_i p_i δ_{X_i}, with Parzen-window weights p_i = (1/Z_f) Σ_j φ(x_i − f_j).
  • 12. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 13. Optimal Assignments. Discrete distributions: µ_X = (1/N) Σ_{i=1}^{N} δ_{X_i} (and similarly µ_Y).
  • 14. Optimal Assignments. Optimal assignment: σ* ∈ argmin_{σ ∈ Σ_N} Σ_i ||X_i − Y_{σ(i)}||^p.
  • 15. Optimal Assignments. Wasserstein distance: W_p(µ_X, µ_Y)^p = Σ_i ||X_i − Y_{σ*(i)}||^p, a metric on the space of distributions.
  • 16. Optimal Assignments. Projection on statistical constraints: C = {f : µ_f = µ_Y}, Proj_C(f) = Y ∘ σ*.
  • 17. Computing Transport Distances. Explicit solution for 1-D distributions (e.g. grayscale images): pair X_i with Y_i after sorting the values, O(N log N) operations.
  • 18. Computing Transport Distances. Higher dimensions: combinatorial optimization methods (Hungarian algorithm, auction algorithm, etc.), O(N^{5/2} log N) operations — intractable for imaging problems.
  • 19. Computing Transport Distances. Arbitrary distributions µ = Σ_i p_i δ_{X_i}, ν = Σ_i q_i δ_{Y_i}: W_p(µ, ν)^p is the solution of a linear program.
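To make the 1-D case of slide 17 concrete, here is a minimal NumPy sketch (not from the slides: the function name, the uniform 1/N normalization and the use of Python rather than the Matlab snippet appearing later are my own choices). Sorting both samples and pairing them in order gives the optimal assignment on the real line.

import numpy as np

def wasserstein_1d(x, y, p=2):
    # W_p between two 1-D point clouds of the same size N, each with
    # uniform weights 1/N: sort and pair in increasing order, O(N log N).
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
y = rng.normal(2.0, 0.5, size=1000)
print(wasserstein_1d(x, y, p=2))

The same sorting trick reappears on slide 56 as a two-line Matlab histogram transfer.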
  • 20. Coupling Matrices. Extension to weighted distributions µ = Σ_{i=1}^{N1} p_i δ_{X_i}, ν = Σ_{i=1}^{N2} q_i δ_{Y_i}, with an arbitrary number of points.
  • 21. Coupling Matrices. If N1 = N2, a permutation σ corresponds to a permutation matrix P_σ with (P_σ)_{i,j} = 1 if j = σ(i) and 0 otherwise, so that Σ_i ||X_i − Y_{σ(i)}||^p = Σ_{i,j} (P_σ)_{i,j} ||X_i − Y_j||^p.
  • 22. Coupling Matrices. Probabilistic coupling: Π(µ, ν) = {P ∈ R^{N1×N2} : P ≥ 0, P·1 = p, Pᵀ·1 = q}.
  • 23. Coupling Matrices. This coupling set is defined even if N1 ≠ N2, takes the weights into account, and yields a linear objective.
  • 24. Kantorovitch Formulation. Linear programming (Kantorovitch): W_p(µ, ν)^p = ⟨P*, C⟩ with P* ∈ argmin_{P ∈ Π(µ,ν)} ⟨P, C⟩ = Σ_{i,j} C_{i,j} P_{i,j}. Figure: optimal coupling P* between the weights p and q.
  • 25. Kantorovitch Formulation. Geometry of the linear program: extremal points of the polytope Π(p, q) and level sets ⟨P, C⟩ = cst of the objective; the optimum is attained at an extremal point.
  • 26. Kantorovitch Formulation. Theorem: if p_i = q_i = 1/N, then there exists σ ∈ Σ_N such that P* = P_σ.
  • 27. Optimization Codes. Discrete optimal transport: P* ∈ argmin_{P ∈ Π(µ,ν)} ⟨P, C⟩ = Σ_{i,j} C_{i,j} P_{i,j}, a linear program. Solvers: interior point methods (slow), the network simplex, the transportation simplex [Bonneel et al. 2011]. Figure: log-log plot of the running times of different solvers; in practice the network simplex behaves as an O(n²) algorithm whereas the transportation simplex runs in O(n³), and fixed-point arithmetic further speeds up the computation without changing the particle pairing [Bonneel et al. 2011].
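For small problems, the Kantorovitch linear program above can be solved directly with a generic LP solver. The following sketch (my own illustration using SciPy's linprog, not the network-simplex code referenced on this slide) builds the two marginal constraints explicitly; it is only practical for small N1, N2.

import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(p, q, C):
    # Solve min_{P >= 0, P 1 = p, P^T 1 = q} <P, C> as a generic LP.
    n1, n2 = C.shape
    A_eq = np.zeros((n1 + n2, n1 * n2))
    for i in range(n1):
        A_eq[i, i * n2:(i + 1) * n2] = 1.0      # row sums:    sum_j P[i, j] = p[i]
    for j in range(n2):
        A_eq[n1 + j, j::n2] = 1.0               # column sums: sum_i P[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(n1, n2), res.fun        # optimal coupling P*, cost <P*, C>

# Example: 3 points vs 4 points with squared Euclidean cost in R^2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[0.5, 0.5], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
P, cost = kantorovich_lp(np.full(3, 1 / 3), np.full(4, 1 / 4), C)
print(cost)

Dedicated network-simplex or transportation-simplex implementations scale much better, as the timings quoted on this slide indicate.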
  • 28. Application to Color Transfer / Color Histogram Equalization. Input color images: f_i ∈ R^{N×3}, with empirical color distributions µ_i = (1/N) Σ_x δ_{f_i(x)}. Figure: source image (X), style image (Y), and their color point clouds µ_0, µ_1. [J. Rabin, Wasserstein Regularization]
  • 29. Application to Color Transfer. Optimal assignment: min_{σ ∈ Σ_N} ||f_0 − f_1 ∘ σ||.
  • 30. Application to Color Transfer. Transport map: T : f_0(x) ∈ R³ ↦ f_1(σ(x)) ∈ R³.
  • 31. Application to Color Transfer. Equalization: f̃_0 = T(f_0), so that µ_{f̃_0} = µ_{f_1}. Figure: source image (X), style image (Y), and the transferred image T(f_0). [J. Rabin, Wasserstein Regularization]
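A hedged sketch of the assignment-based color transfer of slides 28–31 (function name and subsampling strategy are mine). Since slide 18 notes that Hungarian-type solvers are intractable for full images, this is only meant for small pixel subsets; scipy.optimize.linear_sum_assignment provides the optimal permutation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer_assignment(f0, f1):
    # f0, f1: (N, 3) lists of pixel colors of the same length N.
    # Optimal assignment for the squared Euclidean cost, then T: f0(x) -> f1(sigma(x)).
    C = ((f0[:, None, :] - f1[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(C)
    f0_new = np.empty_like(f0)
    f0_new[rows] = f1[cols]
    return f0_new

# Example on a small random "image" of 500 pixels.
rng = np.random.default_rng(0)
f0 = rng.random((500, 3))
f1 = rng.random((500, 3)) ** 2
f0_eq = color_transfer_assignment(f0, f1)   # color distribution of f0_eq matches that of f1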
  • 32. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 33. Measure Preserving Maps. Distributions µ_0, µ_1 on R^k.
  • 34. Measure Preserving Maps. Mass-preserving map T : R^k → R^k: µ_1 = T♯µ_0, where (T♯µ_0)(A) = µ_0(T^{-1}(A)). Figure: x ↦ T(x), mapping µ_0 onto µ_1.
  • 35. Measure Preserving Maps. Smooth distributions µ_i = ρ_i(x)dx: T♯µ_0 = µ_1 ⟺ ρ_1(T(x)) |det ∂T(x)| = ρ_0(x).
  • 36. Optimal Transport. L^p optimal transport: W_p(µ_0, µ_1)^p = min_{T♯µ_0 = µ_1} ∫ ||T(x) − x||^p µ_0(dx).
  • 37. Optimal Transport. Regularity condition: µ_0 does not give mass to "small sets". Theorem (p > 1): there exists a unique optimal T.
  • 38. Optimal Transport. Theorem (p = 2): T = ∇φ with φ convex, and T is monotone: ⟨T(x) − T(x′), x − x′⟩ ≥ 0.
  • 39. From Continuous to Discrete. Vector X ∈ R^{N×d} (image, coefficients, ...), µ_X = Σ_i δ_{X_i}. Grayscale values: 1-D; colors (R, G, B): 3-D.
  • 40. From Continuous to Discrete. µ_Y = T♯µ_X ⟺ there exists σ ∈ Σ_N such that T : X_i ↦ Y_{σ(i)}.
  • 41. From Continuous to Discrete. Replace the optimization on T by an optimization on σ ∈ Σ_N: ∫ ||T(x) − x||^p µ_X(dx) = Σ_i ||X_i − Y_{σ(i)}||^p.
  • 42. Continuous Wasserstein Distance. Couplings Π(µ, ν): measures π on R^d × R^d such that π(A × R^d) = µ(A) and π(R^d × B) = ν(B) for all A, B ⊂ R^d.
  • 43. Continuous Wasserstein Distance. Transportation cost: W_p(µ, ν)^p = min_{π ∈ Π(µ,ν)} ∫_{R^d × R^d} c(x, y) dπ(x, y).
  • 44. Continuous Wasserstein Distance. (Same formula, illustrated by a coupling π between µ and ν.)
  • 45. Continuous Optimal Transport. Let p > 1 and assume µ gives no mass to small sets. Then there is a unique π* ∈ Π(µ, ν) such that W_p(µ, ν)^p = ∫_{R^d × R^d} c(x, y) dπ*(x, y), and π* is supported on the graph of an optimal transport map T : R^d → R^d, i.e. on pairs (x, T(x)).
  • 46. Continuous Optimal Transport. For p = 2: T = ∇φ is the unique solution of (∇φ)♯µ = ν with φ convex l.s.c.
  • 47. 1-D Continuous Wasserstein. Distributions µ, ν on R; cumulative functions C_µ(t) = ∫_{−∞}^{t} dµ(x). For all p > 1 the optimal map is T = C_ν^{-1} ∘ C_µ, which is non-decreasing (a "change of contrast").
  • 48. 1-D Continuous Wasserstein. Explicit formulas: W_p(µ, ν)^p = ∫_0^1 |C_µ^{-1} − C_ν^{-1}|^p, and W_1(µ, ν) = ∫_R |C_µ − C_ν| = ||(µ − ν) ⋆ H||_1 with H the Heaviside step function.
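The W_1 formula above is easy to evaluate for histograms. A minimal sketch, assuming both histograms are supported on the same sorted grid of bins (function and variable names are my own):

import numpy as np

def w1_from_histograms(p, q, bins):
    # W_1(mu, nu) = integral of |C_mu - C_nu|, approximated as a Riemann sum
    # over the (sorted) bin positions shared by both histograms.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    widths = np.diff(bins)
    return np.sum(cdf_gap[:-1] * widths)

t = np.linspace(0, 1, 100)
p = np.exp(-((t - 0.3) ** 2) / 0.01); p /= p.sum()
q = np.exp(-((t - 0.7) ** 2) / 0.01); q /= q.sum()
print(w1_from_histograms(p, q, t))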
  • 49. Continuous Histogram Transfer. Input images: f_i : [0, 1]² → [0, 1], i = 0, 1.
  • 50. Continuous Histogram Transfer. Gray-value distributions: µ_i defined on [0, 1], with µ_i([a, b]) = ∫_{[0,1]²} 1_{a ≤ f_i(x) ≤ b} dx.
  • 51. Continuous Histogram Transfer. Optimal transport: T = C_{µ_1}^{-1} ∘ C_{µ_0}, and the transferred image is T(f_0). Figure: f_0, f_1, C_{µ_0}, C_{µ_1}^{-1}, and T(f_0).
  • 52. Discrete Histogram Transfer. Discretized grayscale images f_0, f_1 ∈ R^N.
  • 53. Discrete Histogram Transfer. Discrete distributions µ_i = µ_{f_i} = (1/N) Σ_k δ_{f_i(k)}. Figure: the two images and their histograms µ_0, µ_1.
  • 54. Discrete Histogram Transfer. Sorting the values: σ_i ∈ Σ_N such that f_i(σ_i(k)) ≤ f_i(σ_i(k+1)). Optimal transport: T : f_0(σ_0(k)) ↦ f_1(σ_1(k)).
  • 55. Discrete Histogram Transfer. Figure: f_0, its histogram µ_0, and the transferred image T(f_0), whose histogram matches µ_1.
  • 56. Discrete Histogram Transfer. Matlab code: [a,I] = sort(f0(:)); f0(I) = sort(f1(:));
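A Python analogue of the Matlab two-liner on slide 56, with a small extension of my own: when the two images do not have the same number of pixels, the sorted target values are resampled onto the quantiles of the source (the function name and the np.interp resampling are assumptions, not from the slides).

import numpy as np

def histogram_transfer(f0, f1):
    # Sorting-based grayscale histogram transfer: the k-th smallest pixel of f0
    # receives the k-th smallest value of f1 (resampled if the sizes differ).
    shape = f0.shape
    v0, v1 = f0.ravel(), np.sort(f1.ravel())
    order = np.argsort(v0)                                   # sigma_0
    targets = np.interp(np.linspace(0, 1, v0.size),
                        np.linspace(0, 1, v1.size), v1)      # sorted values of f1
    out = np.empty_like(v0)
    out[order] = targets                                     # f0(sigma_0(k)) -> f1(sigma_1(k))
    return out.reshape(shape)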
  • 57. Gaussian Optimal Transport. Input distributions (µ_0, µ_1) with µ_i = N(m_i, Σ_i). Ellipses: E_i = {x ∈ R^d : (m_i − x)ᵀ Σ_i^{-1} (m_i − x) ≤ c}. Figure: ellipses E_0 and E_1 and the map T between them.
  • 58. Gaussian Optimal Transport. Theorem: if ker(Σ_0) ∩ Im(Σ_1) = {0}, then T(x) = S(x − m_0) + m_1 with S = Σ_1^{1/2} Σ_{0,1}^{+} Σ_1^{1/2}, where Σ_{0,1} = (Σ_1^{1/2} Σ_0 Σ_1^{1/2})^{1/2}, and W_2(µ_0, µ_1)² = tr(Σ_0 + Σ_1 − 2Σ_{0,1}) + ||m_0 − m_1||².
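A sketch of the closed-form Gaussian formulas above, assuming non-degenerate covariances (so a plain inverse replaces the pseudo-inverse of the theorem); it relies on scipy.linalg.sqrtm, and the function name is my own.

import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m0, S0, m1, S1):
    # Closed-form W_2 between N(m0, S0) and N(m1, S1), plus the linear part S
    # of the optimal map T(x) = S (x - m0) + m1.
    S1h = sqrtm(S1)
    S01 = sqrtm(S1h @ S0 @ S1h)                              # Sigma_{0,1}
    w2_sq = np.sum((m0 - m1) ** 2) + np.trace(S0 + S1 - 2 * S01)
    S = S1h @ np.linalg.inv(S01) @ S1h
    return np.sqrt(max(w2_sq.real, 0.0)), S.real

# Example in R^2.
m0, S0 = np.zeros(2), np.array([[1.0, 0.3], [0.3, 0.5]])
m1, S1 = np.ones(2), np.array([[2.0, -0.2], [-0.2, 1.0]])
dist, S = gaussian_w2(m0, S0, m1, S1)
print(dist)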
  • 59. PDE Formulations. Smooth distributions µ_i = ρ_i(x)dx: T♯µ_0 = µ_1 ⟺ ρ_1(T(x)) |det ∂T(x)| = ρ_0(x).
  • 60. PDE Formulations. L² optimal transport map T = ∇φ: ρ_1(∇φ(x)) det(Hφ(x)) = ρ_0(x) (Monge-Ampère equation).
  • 61. PDE Formulations. Fluid-dynamic formulation: find ρ(x, t) ≥ 0 and m(x, t) ∈ R^d such that W_2(µ_0, µ_1)² = min_{ρ,m} ∫_0^1 ∫_{R^d} ||m||²/ρ subject to ∂_t ρ + ∇·m = 0, ρ(0, ·) = ρ_0, ρ(1, ·) = ρ_1. Finite element discretization [Benamou-Brenier].
  • 62. PDE Formulations. Related works of [Tannenbaum et al.].
  • 63. Image Registration [ur Rehman et al., 2009]. The resulting optimality system can be viewed as an elliptic system of equations, solved with a preconditioned conjugate gradient (incomplete Cholesky preconditioner) within a coarse-to-fine scheme (Gauss-Seidel relaxation, trilinear prolongation/restriction), well suited to GPU implementation. Figure: OMT results viewed on an axial slice; corresponding slices from pre-op (left) and post-op (right) MRI data, with the deformation clearly visible in the anterior part of the brain.
  • 64. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 65. Wasserstein Geodesics. Geodesic between (µ_0, µ_1): t ∈ [0, 1] ↦ µ_t. Variational characterization (W_2 is a geodesic distance): µ_t = argmin_µ (1 − t) W_2(µ_0, µ)² + t W_2(µ_1, µ)².
  • 66. Wasserstein Geodesics. Optimal transport characterization: µ_t = ((1 − t) Id + t T)♯ µ_0.
  • 67. Wasserstein Geodesics. Gaussian case: µ_t = (T_t)♯ µ_0 = N(m_t, Σ_t), with m_t = (1 − t) m_0 + t m_1 and Σ_t = [(1 − t) Id + t S] Σ_0 [(1 − t) Id + t S], S being the linear part of the optimal map; the set of Gaussians is geodesically convex.
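In 1-D, the displacement interpolation µ_t = ((1 − t) Id + t T)♯ µ_0 can be sampled directly from sorted point clouds, since the optimal map pairs sorted values. A minimal sketch for equal-size samples with uniform weights (the function name is my own, not from the slides):

import numpy as np

def displacement_interpolation_1d(x, y, t):
    # Samples of mu_t = ((1 - t) Id + t T)_# mu_0 for two 1-D point clouds
    # of the same size: pair sorted samples and move each a fraction t of the way.
    x, y = np.sort(x), np.sort(y)
    return (1 - t) * x + t * y

rng = np.random.default_rng(0)
x = rng.normal(-2.0, 0.5, size=2000)
y = rng.normal(3.0, 1.0, size=2000)
mid = displacement_interpolation_1d(x, y, 0.5)   # samples of the midpoint mu_{1/2}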
  • 69. Texture Model Interpolation. Analysis: from an exemplar f_0, estimate a probability distribution µ = N(m, Σ).
  • 70. Texture Model Interpolation. Analysis and synthesis: estimate µ = N(m, Σ) from the exemplar f_0, then draw output textures f ∼ µ.
  • 71. Texture Model Interpolation. Interpolation between two exemplars f[0] and f[1]: synthesize textures from the interpolated Gaussian models µ_t, t ∈ [0, 1].
  • 72. Conclusion. Statistical modeling of images. Figure: source image (X), style image (Y). [J. Rabin, Wasserstein Regularization]
  • 73. Conclusion. Applications of OT: metric between descriptors — use the distance W_2(µ_0, µ_1).
  • 74. Conclusion. Applications of OT: metric between descriptors (use the distance W_2(µ_0, µ_1)); projection on statistical constraints (use the transport T).
  • 75. Conclusion. Applications of OT: metric between descriptors (use the distance W_2(µ_0, µ_1)); projection on statistical constraints (use the transport T); mixing statistical models.