An Introduction to Optimal Transport

Gabriel Peyré
www.numerical-tours.com
Statistical Image Models

Colors distribution: each pixel of the source image (X) is a point in R^3.

[Figure: source image (X), style image (Y), and the source image after transfer of the style image's color statistics (sliced Wasserstein projection of X to the color statistics of Y); from J. Rabin, Wasserstein Regularization.]

Other applications: texture synthesis, texture segmentation.
Discrete Distributions

Discrete measure:  \mu = \sum_{i=0}^{N-1} p_i \delta_{X_i},   X_i \in R^d,   \sum_i p_i = 1

Point cloud: constant weights p_i = 1/N; quotient space R^{N \times d} / \Sigma_N.
Histogram: fixed positions X_i (e.g. a grid); affine space {(p_i)_i ; \sum_i p_i = 1}.
From Images to Statistics

Discretized image f \in R^{N \times d}:  N = #pixels, d = #colors; each pixel f_i is a point in R^d = R^3.

Disclaimer: images are not distributions.
    Needs an estimator f \mapsto \mu_f.
    Modify f by controlling \mu_f.

Point cloud discretization:  \mu_f = \sum_i \delta_{f_i}

Histogram discretization:    \mu_f = \sum_i p_i \delta_{X_i}
    Parzen windows:  p_i = (1/Z_f) \sum_j \varphi(X_i - f_j)   (\varphi a smoothing kernel)

[Figure: source image (X), style image (Y) and their color point clouds; from J. Rabin, Wasserstein Regularization.]
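The histogram discretization is easy to implement directly. The following MATLAB sketch (an illustration, not taken from the slides) bins the colors of an image onto a fixed K x K x K grid of R^3 and smooths a 1-D marginal with a Gaussian Parzen kernel; the file name 'input.png', the grid size K, the bandwidth h and the Gaussian kernel are all assumptions made for the example.

Matlab code (illustrative sketch):

    % Histogram discretization of an image's color distribution.
    f = double(imread('input.png')) / 255;        % RGB image with values in [0,1] (assumed)
    K = 16;                                       % bins per color channel
    X = reshape(f, [], 3);                        % N x 3 point cloud of colors, N = #pixels

    % fixed positions X_i: centers of a K x K x K grid covering [0,1]^3
    idx = min(floor(X * K), K - 1) + 1;           % per-channel bin index in 1..K
    lin = sub2ind([K K K], idx(:,1), idx(:,2), idx(:,3));
    p   = accumarray(lin, 1, [K^3, 1]);           % counts on the grid
    p   = p / sum(p);                             % weights p_i with sum_i p_i = 1

    % Parzen-window estimate of the red-channel marginal, bandwidth h
    h  = 0.05;
    t  = linspace(0, 1, 256);
    xr = X(1:200:end, 1);                         % subsample pixels to keep memory small
    pr = mean(exp(-(t - xr).^2 / (2*h^2)), 1);
    pr = pr / sum(pr);                            % normalized density on the grid t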
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Optimal Assignments

Discrete distributions:  \mu_X = (1/N) \sum_{i=1}^{N} \delta_{X_i},   \mu_Y = (1/N) \sum_{i=1}^{N} \delta_{Y_i}

Optimal assignment:
    \sigma^* \in argmin_{\sigma \in \Sigma_N} \sum_i ||X_i - Y_{\sigma(i)}||^p

Wasserstein distance:
    W_p(\mu_X, \mu_Y)^p = \sum_i ||X_i - Y_{\sigma^*(i)}||^p
    Metric on the space of distributions.

Projection on statistical constraints:  C = { f ; \mu_f = \mu_Y }
    Proj_C(f) = Y \circ \sigma^*
Computing Transport Distances

Explicit solution for 1-D distributions (e.g. grayscale images):
    sort the values X_i and Y_i: O(N log(N)) operations.

Higher dimensions: combinatorial optimization methods
    (Hungarian algorithm, auction algorithm, etc.): O(N^{5/2} log(N)) operations,
    intractable for imaging problems.

Arbitrary distributions:  \mu = \sum_i p_i \delta_{X_i},   \nu = \sum_i q_i \delta_{Y_i}
    W_p(\mu, \nu)^p is the solution of a linear program.
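The 1-D case is simple enough to write out: with uniform weights, the optimal assignment pairs the sorted values of the two point sets. Here is a minimal MATLAB sketch (an illustration, not from the slides; the data below is synthetic).

Matlab code (illustrative sketch):

    % 1-D Wasserstein distance between two N-point clouds with uniform weights.
    p  = 2;                                 % exponent of the ground cost
    X  = randn(1000, 1);                    % samples of mu_X (synthetic data)
    Y  = 2 * randn(1000, 1) + 1;            % samples of mu_Y
    Xs = sort(X);  Ys = sort(Y);            % optimal assignment: match values by rank
    Wp = mean(abs(Xs - Ys).^p)^(1/p);       % W_p(mu_X, mu_Y) for weights 1/N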
Coupling Matrices

Extension to weighted distributions with an arbitrary number of points:
    \mu = \sum_{i=1}^{N_1} p_i \delta_{X_i},   \nu = \sum_{j=1}^{N_2} q_j \delta_{Y_j}

If N_1 = N_2, permutation matrix:  P^\sigma = (\delta_{i - \sigma(j)})_{i,j}
    \sum_i ||X_i - Y_{\sigma(i)}||^p = \sum_{i,j} P^\sigma_{i,j} ||X_i - Y_j||^p

Probabilistic coupling:
    \Pi(\mu, \nu) = { P \in R^{N_1 \times N_2} ;  P >= 0,  P 1 = p,  P^T 1 = q }

    Defined even if N_1 \neq N_2.
    Takes the weights into account.
    Linear objective.
Kantorovitch Formulation

Linear programming (Kantorovitch):  W_p(\mu, \nu)^p = <P^*, C>
    P^* \in argmin_{P \in \Pi(\mu, \nu)} <P, C> = \sum_{i,j} C_{i,j} P_{i,j}

The minimum is attained at an extremal point of the polytope \Pi(\mu, \nu),
where a level set <P, C> = cst of the linear objective touches it.

Theorem: if p_i = q_i = 1/N, then there exists \sigma \in \Sigma_N with P^* = P^\sigma.

[Figure: the polytope of couplings with the level sets of <P, C>, and the optimal coupling P^* displayed as a matrix between the weight vectors p and q.]
Optimization Codes

Discrete optimal transport:
    P^* \in argmin_{P \in \Pi(\mu, \nu)} <P, C> = \sum_{i,j} C_{i,j} P_{i,j}

Linear program:
    Interior point methods: slow.
    Network simplex.
    Transportation simplex.
    Block search pivoting strategy [Kelly and O'Neill 1991].

[Figure from Bonneel et al. 2011: log-log plot of the running times of different solvers (network simplex in fixed-point and double precision, transport simplex) as a function of the number of bins per histogram. The network simplex behaves as an O(n^2) algorithm in practice, whereas the transport simplex runs in O(n^3). Fixed-point precision further speeds up the computation: the final transport cost is less accurate, but the particle pairing that produces the actual interpolation remains unchanged. Current EMD image retrieval or color transfer techniques rely on slower solvers [Rubner et al. 2000; Kanters et al. 2003; Morovic and Sun 2003].]
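For small problems, the Kantorovitch linear program can also be handed to a generic solver as a baseline; the sketch below is not the network-simplex code benchmarked above, it assumes the Optimization Toolbox function linprog is available, and it uses synthetic point clouds.

Matlab code (illustrative sketch):

    % Kantorovitch LP with a generic solver; only practical for small N1, N2.
    N1 = 30;  N2 = 40;
    X = randn(N1, 2);  Y = randn(N2, 2) + 1;         % two small 2-D point clouds
    p = ones(N1, 1) / N1;  q = ones(N2, 1) / N2;     % weights

    C = sum(X.^2, 2) + sum(Y.^2, 2)' - 2 * X * Y';   % squared Euclidean cost C_{i,j}

    % constraints P 1 = p and P^T 1 = q, with P stored column-wise as P(:)
    Ap  = kron(ones(1, N2), speye(N1));              % row sums
    Aq  = kron(speye(N2), ones(1, N1));              % column sums
    Aeq = [Ap; Aq];  beq = [p; q];
    lb  = zeros(N1 * N2, 1);                         % P >= 0

    Pvec = linprog(C(:), [], [], Aeq, beq, lb, []);  % requires Optimization Toolbox
    P    = reshape(Pvec, N1, N2);                    % optimal coupling P^*
    W2sq = sum(sum(P .* C));                         % W_2(mu, nu)^2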
Color Histogram Equalization

Input color images:  f_i \in R^{N \times 3},   \mu_i = (1/N) \sum_x \delta_{f_i(x)}.

Optimal assignment:  min_{\sigma \in \Sigma_N} ||f_0 - f_1 \circ \sigma||

Transport:  T : f_0(x) \in R^3  \mapsto  f_1(\sigma(x)) \in R^3

Equalization:  \tilde f_0 = T(f_0)   ==>   \mu_{\tilde f_0} = \mu_{f_1}

[Figure: source image f_0, style image f_1, their color distributions \mu_0 and \mu_1, the transport T, and the recolored image T(f_0): sliced Wasserstein projection of X to the color statistics of the style image Y; from J. Rabin, Wasserstein Regularization.]
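The exact 3-D assignment above is too expensive for full-size images; the figures on this slide use J. Rabin's sliced Wasserstein projection instead. The sketch below is a from-scratch illustration of that idea (random 1-D projections combined with the sorting solution), not the authors' code; the file names, the number of iterations and the step size are assumptions.

Matlab code (illustrative sketch):

    % Approximate color transfer by sliced Wasserstein projection.
    f0 = double(imread('source.png')) / 255;      % hypothetical file names, RGB in [0,1]
    f1 = double(imread('style.png'))  / 255;
    X  = reshape(f0, [], 3);                      % source colors, one row per pixel
    Y  = reshape(f1, [], 3);
    N  = size(X, 1);
    Y  = Y(randi(size(Y, 1), N, 1), :);           % resample the style colors to N points

    niter = 50;  step = 1;                        % illustrative parameters
    for it = 1:niter
        d  = randn(3, 1);  d = d / norm(d);       % random direction on the sphere
        xp = X * d;  yp = Y * d;                  % 1-D projections
        [~, ix] = sort(xp);  [~, iy] = sort(yp);
        disp1d = zeros(N, 1);
        disp1d(ix) = yp(iy) - xp(ix);             % 1-D optimal displacement (sorting)
        X = X + step * disp1d * d';               % move the colors along d
    end
    f0new = reshape(X, size(f0));                 % recolored source image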
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Measure Preserving Maps

Distributions \mu_0, \mu_1 on R^k.

Mass preserving map T : R^k -> R^k:
    \mu_1 = T_\# \mu_0   where   (T_\# \mu_0)(A) = \mu_0(T^{-1}(A))

[Figure: a point x of \mu_0 is mapped to T(x) in \mu_1.]

Smooth distributions:  \mu_i = \rho_i(x) dx
    T_\# \mu_0 = \mu_1   <=>   \rho_1(T(x)) |det(\nabla T(x))| = \rho_0(x)
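As a sanity check of the density identity, the following MATLAB sketch (an illustration, not from the slides) pushes samples of the uniform distribution on [0,1] through the monotone map T(x) = x^2 and compares the histogram of T(x) with the density \rho_1(y) = 1/(2 sqrt(y)) predicted by the change of variables; the map and the sample size are arbitrary choices.

Matlab code (illustrative sketch):

    % Monte Carlo check of rho_1(T(x)) |T'(x)| = rho_0(x)
    % for mu_0 uniform on [0,1] and T(x) = x^2 (so rho_1(y) = 1/(2*sqrt(y))).
    n = 1e6;
    x = rand(n, 1);                                  % samples of mu_0 (rho_0 = 1 on [0,1])
    y = x.^2;                                        % samples of mu_1 = T_# mu_0

    edges   = linspace(0, 1, 51);
    centers = (edges(1:end-1) + edges(2:end)) / 2;
    counts  = histcounts(y, edges);
    rho1_mc = counts / (n * (edges(2) - edges(1)));  % empirical density of T(x)
    rho1_th = 1 ./ (2 * sqrt(centers));              % predicted density

    plot(centers, rho1_mc, 'o', centers, rho1_th, '-');
    legend('empirical density of T(x)', '1 / (2 sqrt(y))');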
Optimal Transport

L^p optimal transport:
    W_p(\mu_0, \mu_1)^p = min_{T_\# \mu_0 = \mu_1} \int ||T(x) - x||^p \mu_0(dx)

Regularity condition: \mu_0 (or \mu_1) does not give mass to "small sets".

Theorem (p > 1): there exists a unique optimal T.

Theorem (p = 2): T = \nabla\varphi with \varphi convex.
    T is monotone:  <T(x) - T(x'), x - x'> >= 0

[Figure: the optimal map T between the densities \mu_0 and \mu_1.]
From Continuous to Discrete

Vector X \in R^{N \times d} (image, coefficients, ...):  \mu_X = \sum_i \delta_{X_i}
    Grayscale: 1-D.   Colors: 3-D (R, G, B).

\mu_Y = T_\# \mu_X   <=>   \exists \sigma \in \Sigma_N,   T : X_i \mapsto Y_{\sigma(i)}

Replace the optimization on T by an optimization on \sigma \in \Sigma_N:
    \int ||T(x) - x||^p \mu_X(dx) = \sum_i ||X_i - Y_{\sigma(i)}||^p
Continuous Wasserstein Distance

Couplings \pi \in \Pi(\mu, \nu):
    \forall A \subset R^d,  \pi(A \times R^d) = \mu(A)
    \forall B \subset R^d,  \pi(R^d \times B) = \nu(B)

Transportation cost:
    W_p(\mu, \nu)^p = min_{\pi \in \Pi(\mu, \nu)} \int_{R^d \times R^d} c(x, y) d\pi(x, y)
Continuous Optimal Transport

Let p > 1 and assume \mu does not give mass to small sets.

Unique \pi^* \in \Pi(\mu, \nu) s.t.  W_p(\mu, \nu)^p = \int_{R^d \times R^d} c(x, y) d\pi^*(x, y)

Optimal transport map T : R^d -> R^d:  \pi^* is supported on the graph {(x, T(x))}.

p = 2:  T = \nabla\varphi is the unique solution of
    \varphi convex l.s.c.,
    (\nabla\varphi)_\# \mu = \nu
1-D Continuous Wasserstein

Distributions \mu, \nu on R.

Cumulative functions:  C_\mu(t) = \int_{-\infty}^{t} d\mu(x)

For all p > 1:  T = C_\nu^{-1} \circ C_\mu
    T is non-decreasing ("change of contrast").

Explicit formulas:
    W_p(\mu, \nu)^p = \int_0^1 |C_\mu^{-1} - C_\nu^{-1}|^p
    W_1(\mu, \nu) = \int_R |C_\mu - C_\nu| = ||(\mu - \nu) \star H||_1   (H the Heaviside step)
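These formulas translate directly into code. The MATLAB sketch below (an illustration, not from the slides) evaluates W_1 from the cumulative functions and W_2 from the inverse cumulative functions of two weighted histograms on a common grid; the densities are synthetic and a small floor is added so that the cumulative functions are strictly increasing before inversion.

Matlab code (illustrative sketch):

    % 1-D Wasserstein distances from cumulative functions.
    t = linspace(0, 1, 200)';  dt = t(2) - t(1);
    p = exp(-(t - 0.3).^2 / 0.01) + 1e-6;  p = p / sum(p);   % synthetic densities
    q = exp(-(t - 0.7).^2 / 0.02) + 1e-6;  q = q / sum(q);

    Cp = cumsum(p);  Cq = cumsum(q);              % cumulative functions C_mu, C_nu
    W1 = sum(abs(Cp - Cq)) * dt;                  % W_1 = int |C_mu - C_nu|

    % W_2 via the inverse cumulative functions (quantiles) on a grid of [0,1]
    s  = linspace(1e-6, 1 - 1e-6, 500)';
    Qp = interp1(Cp, t, s, 'linear', 'extrap');   % C_mu^{-1}(s)
    Qq = interp1(Cq, t, s, 'linear', 'extrap');   % C_nu^{-1}(s)
    W2 = sqrt(mean((Qp - Qq).^2));                % (int_0^1 |Qp - Qq|^2)^(1/2)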
Continuous Histogram Transfer

Input images:  f_i : [0,1]^2 -> [0,1],  i = 0, 1.

Gray-value distributions: \mu_i defined on [0,1],
    \mu_i([a, b]) = \int_{[0,1]^2} 1_{a <= f_i(x) <= b} dx

Optimal transport:  T = C_{\mu_1}^{-1} \circ C_{\mu_0}.

[Figure: f_0 -> C_{\mu_0}(f_0) -> T(f_0) = C_{\mu_1}^{-1}(C_{\mu_0}(f_0)), together with the gray-level densities \mu_0 and \mu_1.]
Discrete Histogram Transfer

Discretized grayscale images f_0, f_1 \in R^N.
Discrete distributions \mu_i = \mu_{f_i} = (1/N) \sum_k \delta_{f_i(k)}.

Sorting the values: \sigma_i \in \Sigma_N s.t. f_i(\sigma_i(k)) <= f_i(\sigma_i(k + 1)).
Optimal transport: T : f_0(\sigma_0(k)) \mapsto f_1(\sigma_1(k)).

Matlab code:   [a,I] = sort(f0(:));   % I = sigma_0: pixels of f0 in increasing order
               f0(I) = sort(f1(:));   % assign them the sorted values of f1

[Figure: images f_1 and f_0, their gray-level histograms \mu_1 and \mu_0, and the equalized image T(f_0), whose histogram matches \mu_1.]
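The two-line code above assumes f0 and f1 have the same number of pixels. When they do not, the same idea can be implemented by matching quantiles; the sketch below is an extension written for illustration (the file names are placeholders, and the images are assumed to be single-channel with values in [0,1]).

Matlab code (illustrative sketch):

    % Histogram transfer between grayscale images of different sizes,
    % mapping each value of f0 through the empirical quantile function of f1.
    f0 = double(imread('image0.png')) / 255;     % grayscale images (assumed)
    f1 = double(imread('image1.png')) / 255;

    v0 = f0(:);  v1 = sort(f1(:));
    N0 = numel(v0);  N1 = numel(v1);

    [~, I] = sort(v0);                           % sigma_0: pixels of f0 ranked by value
    levels = round(linspace(1, N1, N0));         % quantile levels of f1 matched to the ranks
    v0(I)  = v1(levels);                         % T(f0), value by value
    Tf0    = reshape(v0, size(f0));              % equalized image, histogram close to f1's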
Gaussian Optimal Transport

Input distributions (\mu_0, \mu_1) with \mu_i = N(m_i, \Sigma_i).

Ellipsoids:  E_i = { x \in R^d ; (m_i - x)^T \Sigma_i^{-1} (m_i - x) <= c }

Theorem: if ker(\Sigma_0) \cap Im(\Sigma_1) = {0},
    T(x) = S (x - m_0) + m_1    where
    S = \Sigma_1^{1/2} \Sigma_{0,1}^{+} \Sigma_1^{1/2},
    \Sigma_{0,1} = (\Sigma_1^{1/2} \Sigma_0 \Sigma_1^{1/2})^{1/2}

    W_2(\mu_0, \mu_1)^2 = tr(\Sigma_0 + \Sigma_1 - 2 \Sigma_{0,1}) + ||m_0 - m_1||^2
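The closed form can be checked numerically with a few lines of MATLAB; the sketch below (an illustration, with placeholder parameters) computes the map and the distance and pushes a cloud of samples of \mu_0 onto \mu_1, assuming \Sigma_0 and \Sigma_1 are symmetric positive definite.

Matlab code (illustrative sketch):

    % Closed-form W_2 optimal transport between N(m0,S0) and N(m1,S1).
    m0 = [0; 0];   S0 = [1, 0.5; 0.5, 1];          % placeholder parameters
    m1 = [3; 1];   S1 = [2, -0.3; -0.3, 0.5];

    S1h = sqrtm(S1);                               % Sigma_1^(1/2)
    S01 = sqrtm(S1h * S0 * S1h);                   % Sigma_{0,1}
    S   = S1h * (S01 \ S1h);                       % linear part of T(x) = S (x - m0) + m1

    W2sq = trace(S0 + S1 - 2*S01) + norm(m0 - m1)^2;   % squared Wasserstein distance

    X0 = m0 + sqrtm(S0) * randn(2, 1000);          % samples of mu_0
    X1 = S * (X0 - m0) + m1;                       % transported samples, ~ N(m1, S1)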
PDE Formulations

Smooth distributions:  \mu_i = \rho_i(x) dx
    T_\# \mu_0 = \mu_1   <=>   \rho_1(T(x)) |det(\nabla T(x))| = \rho_0(x)

L^2 optimal transport map T = \nabla\varphi:
    \rho_1(\nabla\varphi(x)) det(H\varphi(x)) = \rho_0(x)        (Monge-Ampère)

Fluid dynamic formulation: find \rho(x, t) >= 0, m(x, t) \in R^d:
    W_2(\mu_0, \mu_1)^2 = min_{\rho, m} \int_{R^d} \int_0^1 ||m||^2 / \rho
        s.t.  \partial_t \rho + div(m) = 0,   \rho(0, .) = \rho_0,   \rho(1, .) = \rho_1

    Finite element discretization [Benamou-Brenier].

Related works of [Tannenbaum et al.].
Image Registration

[Excerpt from ur Rehman et al. 2009: the mass-preserving registration equation c(dv) = \mu_0(v + dv) det(\nabla(v + dv)) - \mu_1 = 0 is solved with a preconditioned conjugate gradient (incomplete Cholesky preconditioner) and a GPU multigrid scheme using Gauss-Seidel relaxation and trilinear transfer operators between coarse and fine grids.]

[Figure: OMT results viewed on an axial slice; corresponding slices from pre-op (left) and post-op (right) MRI data, with the estimated deformation T clearly visible in the anterior part of the brain. From ur Rehman et al. 2009.]
Overview


• Discrete Optimal Transport



• Continuous Optimal Transport



• Displacement Interpolation
Wasserstein Geodesics

Geodesics between (\mu_0, \mu_1):  t \in [0, 1] \mapsto \mu_t.

Variational characterization (W_2 is a geodesic distance):
    \mu_t = argmin_\mu (1 - t) W_2(\mu_0, \mu)^2 + t W_2(\mu_1, \mu)^2

Optimal transport characterization:
    \mu_t = ((1 - t) Id + t T)_\# \mu_0

Gaussian case: \mu_t = T_{t\#} \mu_0 = N(m_t, \Sigma_t) with
    m_t = (1 - t) m_0 + t m_1
    \Sigma_t = [(1 - t) Id + t T] \Sigma_0 [(1 - t) Id + t T]
    ==> the set of Gaussians is geodesically convex.
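For two discrete 1-D clouds with uniform weights, T is given by sorting and the geodesic \mu_t is obtained by linearly interpolating the matched points. The MATLAB sketch below (an illustration with synthetic data, not from the slides) displays a few intermediate distributions.

Matlab code (illustrative sketch):

    % Displacement interpolation mu_t = ((1-t) Id + t T)_# mu_0 in 1-D.
    N  = 2000;
    X0 = randn(N, 1) - 2;                     % samples of mu_0 (synthetic)
    X1 = 0.5 * randn(N, 1) + 3;               % samples of mu_1

    [~, i0] = sort(X0);  [~, i1] = sort(X1);
    T = zeros(N, 1);  T(i0) = X1(i1);         % optimal assignment by rank

    for t = 0:0.25:1
        Xt = (1 - t) * X0 + t * T;            % support of mu_t
        subplot(1, 5, round(4*t) + 1);
        histogram(Xt, 40);  title(sprintf('t = %.2f', t));
    end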
Texture Model Interpolation

Exemplar f_0  --(analysis)-->  probability distribution \mu = N(m, \Sigma)  --(synthesis)-->  outputs f ~ \mu

[Figure: exemplars f[0] and f[1], and textures synthesized from the interpolated models \mu_t for t ranging from 0 to 1.]
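Combining the Gaussian optimal transport map with the geodesic formulas of the previous slide gives the interpolated texture models. The MATLAB sketch below assumes the texture model is summarized by a mean m and covariance \Sigma of dimension d (the parameters are placeholders) and synthesizes one output from the model at time t.

Matlab code (illustrative sketch):

    % Geodesic interpolation N(m_t, Sigma_t) between two Gaussian texture models.
    d  = 8;                                          % model dimension (placeholder)
    m0 = zeros(d, 1);   S0 = toeplitz(0.8.^(0:d-1)); % placeholder models
    m1 = ones(d, 1);    S1 = toeplitz(0.3.^(0:d-1));

    S1h = sqrtm(S1);
    S01 = sqrtm(S1h * S0 * S1h);
    S   = S1h * (S01 \ S1h);                         % OT map from N(m0,S0) to N(m1,S1)

    t  = 0.5;                                        % interpolation time in [0,1]
    At = (1 - t) * eye(d) + t * S;                   % (1-t) Id + t T
    mt = (1 - t) * m0 + t * m1;                      % interpolated mean
    St = At * S0 * At;                               % interpolated covariance
    f  = mt + sqrtm(St) * randn(d, 1);               % one output f ~ N(m_t, Sigma_t)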
Conclusion

Statistical modeling of images.

Applications of OT:
  - Metric between descriptors: use the distance W_2(\mu_0, \mu_1).
  - Projection on statistical constraints: use the transport T.
  - Mixing statistical models.

[Figure: source image (X), style image (Y) and descriptor distributions; from J. Rabin, Wasserstein Regularization.]

More Related Content

What's hot

Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionEntrepreneur / Startup
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...Edge AI and Vision Alliance
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Universitat Politècnica de Catalunya
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Robust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionRobust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionActiveEon
 
PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論Akihiro Nitta
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveySangwoo Mo
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Bang Xiang Yong
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudArithmer Inc.
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -Kazuyuki Miyazawa
 

What's hot (20)

Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
You only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detectionYou only look once (YOLO) : unified real time object detection
You only look once (YOLO) : unified real time object detection
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Robust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object DetectionRobust Low-rank and Sparse Decomposition for Moving Object Detection
Robust Low-rank and Sparse Decomposition for Moving Object Detection
 
PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論PRML 1.5-1.5.5 決定理論
PRML 1.5-1.5.5 決定理論
 
Hog
HogHog
Hog
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
PointNet
PointNetPointNet
PointNet
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy...
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloud
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -
 

More from Gabriel Peyré

Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Low Complexity Regularization of Inverse Problems - Course #1 Inverse Problems
 
Low Complexity Regularization of Inverse Problems
 
Model Selection with Piecewise Regular Gauges
 
Signal Processing Course : Inverse Problems Regularization
 
Proximal Splitting and Optimal Transport
 
Geodesic Method in Computer Vision and Graphics
 
Learning Sparse Representation
 
Adaptive Signal and Image Processing
 
Mesh Processing Course : Mesh Parameterization
 
Mesh Processing Course : Multiresolution
 
Mesh Processing Course : Introduction
 
Mesh Processing Course : Geodesics
 
Mesh Processing Course : Geodesic Sampling
 
Mesh Processing Course : Differential Calculus
 
Mesh Processing Course : Active Contours
 
Signal Processing Course : Theory for Sparse Recovery
 
Signal Processing Course : Presentation of the Course
 
Signal Processing Course : Orthogonal Bases


An Introduction to Optimal Transport

  • 1. An Introduction to Optimal Transport Gabriel Peyré www.numerical-tours.com
  • 2. Statistical Image Models. Color distribution: each pixel of the source image (X) is a point in R³. Figure: source image (X), style image (Y), and source image after color transfer. [J. Rabin, Wasserstein Regularization]
  • 3. Statistical Image Models. Application to color transfer: sliced Wasserstein projection of X onto the color statistics of the style image Y. Figure: source image (X), style image (Y), sliced Wasserstein projection of X, and modified image. [J. Rabin, Wasserstein Regularization]
  • 4. Statistical Image Models. Other applications: texture synthesis, texture segmentation. [J. Rabin, Wasserstein Regularization]
  • 5. Discrete Distributions. Discrete measure: µ = Σ_{i=0}^{N−1} p_i δ_{X_i}, with X_i ∈ R^d and Σ_i p_i = 1.
  • 6. Discrete Distributions. Point cloud: constant weights p_i = 1/N; the positions X_i live in the quotient space R^{N×d} / Σ_N.
  • 7. Discrete Distributions. Point cloud: constant weights p_i = 1/N, quotient space R^{N×d} / Σ_N. Histogram: fixed positions X_i (e.g. a grid), weights in the affine space {(p_i)_i : Σ_i p_i = 1}.
  • 8. From Images to Statistics. Discretized image f ∈ R^{N×d}, f_i ∈ R^d = R³, with N = #pixels and d = #colors. Figure: source image (X), style image (Y). [J. Rabin, Wasserstein Regularization]
  • 9. From Images to Statistics. Disclaimer: images are not distributions. One needs an estimator f ↦ µ_f, and then modifies f by controlling µ_f.
  • 10. From Images to Statistics. Point cloud discretization: µ_f = Σ_i δ_{f_i}.
  • 11. From Images to Statistics. Histogram discretization: µ_f = Σ_i p_i δ_{X_i}, with Parzen-window weights p_i = (1/Z_f) Σ_j φ(x_i − f_j).
  • 12. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 13. Optimal Assignments. Discrete distributions: µ_X = (1/N) Σ_{i=1}^{N} δ_{X_i} (and similarly µ_Y).
  • 14. Optimal Assignments. Optimal assignment: σ* ∈ argmin_{σ ∈ Σ_N} Σ_i ||X_i − Y_{σ(i)}||^p.
  • 15. Optimal Assignments. Wasserstein distance: W_p(µ_X, µ_Y)^p = Σ_i ||X_i − Y_{σ*(i)}||^p, a metric on the space of distributions.
  • 16. Optimal Assignments. Projection on statistical constraints: C = {f : µ_f = µ_Y}, Proj_C(f) = Y ∘ σ*.
  • 17. Computing Transport Distances. Explicit solution for 1-D distributions (e.g. grayscale images): pair X_i with Y_i after sorting the values, O(N log N) operations.
  • 18. Computing Transport Distances. Higher dimensions: combinatorial optimization methods (Hungarian algorithm, auction algorithm, etc.), O(N^{5/2} log N) operations — intractable for imaging problems.
  • 19. Computing Transport Distances. Arbitrary distributions µ = Σ_i p_i δ_{X_i}, ν = Σ_i q_i δ_{Y_i}: W_p(µ, ν)^p is the solution of a linear program.
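To make the 1-D case of slide 17 concrete, here is a minimal NumPy sketch (not from the slides: the function name, the uniform 1/N normalization and the use of Python rather than the Matlab snippet appearing later are my own choices). Sorting both samples and pairing them in order gives the optimal assignment on the real line.

import numpy as np

def wasserstein_1d(x, y, p=2):
    # W_p between two 1-D point clouds of the same size N, each with
    # uniform weights 1/N: sort and pair in increasing order, O(N log N).
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
y = rng.normal(2.0, 0.5, size=1000)
print(wasserstein_1d(x, y, p=2))

The same sorting trick reappears on slide 56 as a two-line Matlab histogram transfer.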
  • 20. Coupling Matrices. Extension to weighted distributions µ = Σ_{i=1}^{N1} p_i δ_{X_i}, ν = Σ_{i=1}^{N2} q_i δ_{Y_i}, with an arbitrary number of points.
  • 21. Coupling Matrices. If N1 = N2, a permutation σ corresponds to a permutation matrix P_σ with (P_σ)_{i,j} = 1 if j = σ(i) and 0 otherwise, so that Σ_i ||X_i − Y_{σ(i)}||^p = Σ_{i,j} (P_σ)_{i,j} ||X_i − Y_j||^p.
  • 22. Coupling Matrices. Probabilistic coupling: Π(µ, ν) = {P ∈ R^{N1×N2} : P ≥ 0, P·1 = p, Pᵀ·1 = q}.
  • 23. Coupling Matrices. This coupling set is defined even if N1 ≠ N2, takes the weights into account, and yields a linear objective.
  • 24. Kantorovitch Formulation. Linear programming (Kantorovitch): W_p(µ, ν)^p = ⟨P*, C⟩ with P* ∈ argmin_{P ∈ Π(µ,ν)} ⟨P, C⟩ = Σ_{i,j} C_{i,j} P_{i,j}. Figure: optimal coupling P* between the weights p and q.
  • 25. Kantorovitch Formulation. Geometry of the linear program: extremal points of the polytope Π(p, q) and level sets ⟨P, C⟩ = cst of the objective; the optimum is attained at an extremal point.
  • 26. Kantorovitch Formulation. Theorem: if p_i = q_i = 1/N, then there exists σ ∈ Σ_N such that P* = P_σ.
  • 27. Optimization Codes. Discrete optimal transport: P* ∈ argmin_{P ∈ Π(µ,ν)} ⟨P, C⟩ = Σ_{i,j} C_{i,j} P_{i,j}, a linear program. Solvers: interior point methods (slow), the network simplex, the transportation simplex [Bonneel et al. 2011]. Figure: log-log plot of the running times of different solvers; in practice the network simplex behaves as an O(n²) algorithm whereas the transportation simplex runs in O(n³), and fixed-point arithmetic further speeds up the computation without changing the particle pairing [Bonneel et al. 2011].
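For small problems, the Kantorovitch linear program above can be solved directly with a generic LP solver. The following sketch (my own illustration using SciPy's linprog, not the network-simplex code referenced on this slide) builds the two marginal constraints explicitly; it is only practical for small N1, N2.

import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(p, q, C):
    # Solve min_{P >= 0, P 1 = p, P^T 1 = q} <P, C> as a generic LP.
    n1, n2 = C.shape
    A_eq = np.zeros((n1 + n2, n1 * n2))
    for i in range(n1):
        A_eq[i, i * n2:(i + 1) * n2] = 1.0      # row sums:    sum_j P[i, j] = p[i]
    for j in range(n2):
        A_eq[n1 + j, j::n2] = 1.0               # column sums: sum_i P[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(n1, n2), res.fun        # optimal coupling P*, cost <P*, C>

# Example: 3 points vs 4 points with squared Euclidean cost in R^2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[0.5, 0.5], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
P, cost = kantorovich_lp(np.full(3, 1 / 3), np.full(4, 1 / 4), C)
print(cost)

Dedicated network-simplex or transportation-simplex implementations scale much better, as the timings quoted on this slide indicate.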
  • 28. Application to Color Transfer / Color Histogram Equalization. Input color images: f_i ∈ R^{N×3}, with empirical color distributions µ_i = (1/N) Σ_x δ_{f_i(x)}. Figure: source image (X), style image (Y), and their color point clouds µ_0, µ_1. [J. Rabin, Wasserstein Regularization]
  • 29. Application to Color Transfer. Optimal assignment: min_{σ ∈ Σ_N} ||f_0 − f_1 ∘ σ||.
  • 30. Application to Color Transfer. Transport map: T : f_0(x) ∈ R³ ↦ f_1(σ(x)) ∈ R³.
  • 31. Application to Color Transfer. Equalization: f̃_0 = T(f_0), so that µ_{f̃_0} = µ_{f_1}. Figure: source image (X), style image (Y), and the transferred image T(f_0). [J. Rabin, Wasserstein Regularization]
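A hedged sketch of the assignment-based color transfer of slides 28–31 (function name and subsampling strategy are mine). Since slide 18 notes that Hungarian-type solvers are intractable for full images, this is only meant for small pixel subsets; scipy.optimize.linear_sum_assignment provides the optimal permutation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer_assignment(f0, f1):
    # f0, f1: (N, 3) lists of pixel colors of the same length N.
    # Optimal assignment for the squared Euclidean cost, then T: f0(x) -> f1(sigma(x)).
    C = ((f0[:, None, :] - f1[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(C)
    f0_new = np.empty_like(f0)
    f0_new[rows] = f1[cols]
    return f0_new

# Example on a small random "image" of 500 pixels.
rng = np.random.default_rng(0)
f0 = rng.random((500, 3))
f1 = rng.random((500, 3)) ** 2
f0_eq = color_transfer_assignment(f0, f1)   # color distribution of f0_eq matches that of f1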
  • 32. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 33. Measure Preserving Maps. Distributions µ_0, µ_1 on R^k.
  • 34. Measure Preserving Maps. Mass-preserving map T : R^k → R^k: µ_1 = T♯µ_0, where (T♯µ_0)(A) = µ_0(T^{-1}(A)). Figure: x ↦ T(x), mapping µ_0 onto µ_1.
  • 35. Measure Preserving Maps. Smooth distributions µ_i = ρ_i(x)dx: T♯µ_0 = µ_1 ⟺ ρ_1(T(x)) |det ∂T(x)| = ρ_0(x).
  • 36. Optimal Transport. L^p optimal transport: W_p(µ_0, µ_1)^p = min_{T♯µ_0 = µ_1} ∫ ||T(x) − x||^p µ_0(dx).
  • 37. Optimal Transport. Regularity condition: µ_0 does not give mass to "small sets". Theorem (p > 1): there exists a unique optimal T.
  • 38. Optimal Transport. Theorem (p = 2): T = ∇φ with φ convex, and T is monotone: ⟨T(x) − T(x′), x − x′⟩ ≥ 0.
  • 39. From Continuous to Discrete. Vector X ∈ R^{N×d} (image, coefficients, ...), µ_X = Σ_i δ_{X_i}. Grayscale values: 1-D; colors (R, G, B): 3-D.
  • 40. From Continuous to Discrete. µ_Y = T♯µ_X ⟺ there exists σ ∈ Σ_N such that T : X_i ↦ Y_{σ(i)}.
  • 41. From Continuous to Discrete. Replace the optimization on T by an optimization on σ ∈ Σ_N: ∫ ||T(x) − x||^p µ_X(dx) = Σ_i ||X_i − Y_{σ(i)}||^p.
  • 42. Continuous Wasserstein Distance. Couplings Π(µ, ν): measures π on R^d × R^d such that π(A × R^d) = µ(A) and π(R^d × B) = ν(B) for all A, B ⊂ R^d.
  • 43. Continuous Wasserstein Distance. Transportation cost: W_p(µ, ν)^p = min_{π ∈ Π(µ,ν)} ∫_{R^d × R^d} c(x, y) dπ(x, y).
  • 44. Continuous Wasserstein Distance. (Same formula, illustrated by a coupling π between µ and ν.)
  • 45. Continuous Optimal Transport. Let p > 1 and assume µ gives no mass to small sets. Then there is a unique π* ∈ Π(µ, ν) such that W_p(µ, ν)^p = ∫_{R^d × R^d} c(x, y) dπ*(x, y), and π* is supported on the graph of an optimal transport map T : R^d → R^d, i.e. on pairs (x, T(x)).
  • 46. Continuous Optimal Transport. For p = 2: T = ∇φ is the unique solution of (∇φ)♯µ = ν with φ convex l.s.c.
  • 47. 1-D Continuous Wasserstein. Distributions µ, ν on R; cumulative functions C_µ(t) = ∫_{−∞}^{t} dµ(x). For all p > 1 the optimal map is T = C_ν^{-1} ∘ C_µ, which is non-decreasing (a "change of contrast").
  • 48. 1-D Continuous Wasserstein. Explicit formulas: W_p(µ, ν)^p = ∫_0^1 |C_µ^{-1} − C_ν^{-1}|^p, and W_1(µ, ν) = ∫_R |C_µ − C_ν| = ||(µ − ν) ⋆ H||_1 with H the Heaviside step function.
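The W_1 formula above is easy to evaluate for histograms. A minimal sketch, assuming both histograms are supported on the same sorted grid of bins (function and variable names are my own):

import numpy as np

def w1_from_histograms(p, q, bins):
    # W_1(mu, nu) = integral of |C_mu - C_nu|, approximated as a Riemann sum
    # over the (sorted) bin positions shared by both histograms.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    widths = np.diff(bins)
    return np.sum(cdf_gap[:-1] * widths)

t = np.linspace(0, 1, 100)
p = np.exp(-((t - 0.3) ** 2) / 0.01); p /= p.sum()
q = np.exp(-((t - 0.7) ** 2) / 0.01); q /= q.sum()
print(w1_from_histograms(p, q, t))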
  • 49. Continuous Histogram Transfer. Input images: f_i : [0, 1]² → [0, 1], i = 0, 1.
  • 50. Continuous Histogram Transfer. Gray-value distributions: µ_i defined on [0, 1], with µ_i([a, b]) = ∫_{[0,1]²} 1_{a ≤ f_i(x) ≤ b} dx.
  • 51. Continuous Histogram Transfer. Optimal transport: T = C_{µ_1}^{-1} ∘ C_{µ_0}, and the transferred image is T(f_0). Figure: f_0, f_1, C_{µ_0}, C_{µ_1}^{-1}, and T(f_0).
  • 52. Discrete Histogram Transfer. Discretized grayscale images f_0, f_1 ∈ R^N.
  • 53. Discrete Histogram Transfer. Discrete distributions µ_i = µ_{f_i} = (1/N) Σ_k δ_{f_i(k)}. Figure: the two images and their histograms µ_0, µ_1.
  • 54. Discrete Histogram Transfer. Sorting the values: σ_i ∈ Σ_N such that f_i(σ_i(k)) ≤ f_i(σ_i(k+1)). Optimal transport: T : f_0(σ_0(k)) ↦ f_1(σ_1(k)).
  • 55. Discrete Histogram Transfer. Figure: f_0, its histogram µ_0, and the transferred image T(f_0), whose histogram matches µ_1.
  • 56. Discrete Histogram Transfer. Matlab code: [a,I] = sort(f0(:)); f0(I) = sort(f1(:));
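A Python analogue of the Matlab two-liner on slide 56, with a small extension of my own: when the two images do not have the same number of pixels, the sorted target values are resampled onto the quantiles of the source (the function name and the np.interp resampling are assumptions, not from the slides).

import numpy as np

def histogram_transfer(f0, f1):
    # Sorting-based grayscale histogram transfer: the k-th smallest pixel of f0
    # receives the k-th smallest value of f1 (resampled if the sizes differ).
    shape = f0.shape
    v0, v1 = f0.ravel(), np.sort(f1.ravel())
    order = np.argsort(v0)                                   # sigma_0
    targets = np.interp(np.linspace(0, 1, v0.size),
                        np.linspace(0, 1, v1.size), v1)      # sorted values of f1
    out = np.empty_like(v0)
    out[order] = targets                                     # f0(sigma_0(k)) -> f1(sigma_1(k))
    return out.reshape(shape)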
  • 57. Gaussian Optimal Transport. Input distributions (µ_0, µ_1) with µ_i = N(m_i, Σ_i). Ellipses: E_i = {x ∈ R^d : (m_i − x)ᵀ Σ_i^{-1} (m_i − x) ≤ c}. Figure: ellipses E_0 and E_1 and the map T between them.
  • 58. Gaussian Optimal Transport. Theorem: if ker(Σ_0) ∩ Im(Σ_1) = {0}, then T(x) = S(x − m_0) + m_1 with S = Σ_1^{1/2} Σ_{0,1}^{+} Σ_1^{1/2}, where Σ_{0,1} = (Σ_1^{1/2} Σ_0 Σ_1^{1/2})^{1/2}, and W_2(µ_0, µ_1)² = tr(Σ_0 + Σ_1 − 2Σ_{0,1}) + ||m_0 − m_1||².
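A sketch of the closed-form Gaussian formulas above, assuming non-degenerate covariances (so a plain inverse replaces the pseudo-inverse of the theorem); it relies on scipy.linalg.sqrtm, and the function name is my own.

import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m0, S0, m1, S1):
    # Closed-form W_2 between N(m0, S0) and N(m1, S1), plus the linear part S
    # of the optimal map T(x) = S (x - m0) + m1.
    S1h = sqrtm(S1)
    S01 = sqrtm(S1h @ S0 @ S1h)                              # Sigma_{0,1}
    w2_sq = np.sum((m0 - m1) ** 2) + np.trace(S0 + S1 - 2 * S01)
    S = S1h @ np.linalg.inv(S01) @ S1h
    return np.sqrt(max(w2_sq.real, 0.0)), S.real

# Example in R^2.
m0, S0 = np.zeros(2), np.array([[1.0, 0.3], [0.3, 0.5]])
m1, S1 = np.ones(2), np.array([[2.0, -0.2], [-0.2, 1.0]])
dist, S = gaussian_w2(m0, S0, m1, S1)
print(dist)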
  • 59. PDE Formulations. Smooth distributions µ_i = ρ_i(x)dx: T♯µ_0 = µ_1 ⟺ ρ_1(T(x)) |det ∂T(x)| = ρ_0(x).
  • 60. PDE Formulations. L² optimal transport map T = ∇φ: ρ_1(∇φ(x)) det(Hφ(x)) = ρ_0(x) (Monge-Ampère equation).
  • 61. PDE Formulations. Fluid-dynamic formulation: find ρ(x, t) ≥ 0 and m(x, t) ∈ R^d such that W_2(µ_0, µ_1)² = min_{ρ,m} ∫_0^1 ∫_{R^d} ||m||²/ρ subject to ∂_t ρ + ∇·m = 0, ρ(0, ·) = ρ_0, ρ(1, ·) = ρ_1. Finite element discretization [Benamou-Brenier].
  • 62. PDE Formulations. Related works of [Tannenbaum et al.].
  • 63. Image Registration [ur Rehman et al., 2009]. The resulting optimality system can be viewed as an elliptic system of equations, solved with a preconditioned conjugate gradient (incomplete Cholesky preconditioner) within a coarse-to-fine scheme (Gauss-Seidel relaxation, trilinear prolongation/restriction), well suited to GPU implementation. Figure: OMT results viewed on an axial slice; corresponding slices from pre-op (left) and post-op (right) MRI data, with the deformation clearly visible in the anterior part of the brain.
  • 64. Overview • Discrete Optimal Transport • Continuous Optimal Transport • Displacement Interpolation
  • 65. Wasserstein Geodesics. Geodesic between (µ_0, µ_1): t ∈ [0, 1] ↦ µ_t. Variational characterization (W_2 is a geodesic distance): µ_t = argmin_µ (1 − t) W_2(µ_0, µ)² + t W_2(µ_1, µ)².
  • 66. Wasserstein Geodesics. Optimal transport characterization: µ_t = ((1 − t) Id + t T)♯ µ_0.
  • 67. Wasserstein Geodesics. Gaussian case: µ_t = (T_t)♯ µ_0 = N(m_t, Σ_t), with m_t = (1 − t) m_0 + t m_1 and Σ_t = [(1 − t) Id + t S] Σ_0 [(1 − t) Id + t S], S being the linear part of the optimal map; the set of Gaussians is geodesically convex.
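In 1-D, the displacement interpolation µ_t = ((1 − t) Id + t T)♯ µ_0 can be sampled directly from sorted point clouds, since the optimal map pairs sorted values. A minimal sketch for equal-size samples with uniform weights (the function name is my own, not from the slides):

import numpy as np

def displacement_interpolation_1d(x, y, t):
    # Samples of mu_t = ((1 - t) Id + t T)_# mu_0 for two 1-D point clouds
    # of the same size: pair sorted samples and move each a fraction t of the way.
    x, y = np.sort(x), np.sort(y)
    return (1 - t) * x + t * y

rng = np.random.default_rng(0)
x = rng.normal(-2.0, 0.5, size=2000)
y = rng.normal(3.0, 1.0, size=2000)
mid = displacement_interpolation_1d(x, y, 0.5)   # samples of the midpoint mu_{1/2}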
  • 69. Texture Model Interpolation. Analysis: from an exemplar f_0, estimate a probability distribution µ = N(m, Σ).
  • 70. Texture Model Interpolation. Analysis and synthesis: estimate µ = N(m, Σ) from the exemplar f_0, then draw output textures f ∼ µ.
  • 71. Texture Model Interpolation. Interpolation between two exemplars f[0] and f[1]: synthesize textures from the interpolated Gaussian models µ_t, t ∈ [0, 1].
  • 72. Conclusion. Statistical modeling of images. Figure: source image (X), style image (Y). [J. Rabin, Wasserstein Regularization]
  • 73. Conclusion. Applications of OT: metric between descriptors — use the distance W_2(µ_0, µ_1).
  • 74. Conclusion. Applications of OT: metric between descriptors (use the distance W_2(µ_0, µ_1)); projection on statistical constraints (use the transport T).
  • 75. Conclusion. Applications of OT: metric between descriptors (use the distance W_2(µ_0, µ_1)); projection on statistical constraints (use the transport T); mixing statistical models.