Sparsity and Compressed Sensing
Gabriel Peyré
www.numerical-tours.com
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Inverse Problems
Forward model:   y = K f_0 + w ∈ R^P
   y: observations,   K : R^Q → R^P: operator,   f_0: (unknown) input,   w: noise.

Denoising:  K = Id_Q,  P = Q.

Inpainting:  Ω = set of missing pixels,  P = Q − |Ω|,
   (Kf)(x) = 0 if x ∈ Ω,   f(x) if x ∉ Ω.

Super-resolution:  Kf = (f ⋆ k) ↓_s,  P = Q/s.
Inverse Problem in Medical Imaging
Tomography:  Kf = (p_{θ_k})_{1 ≤ k ≤ K}   (projections along a few angles θ_k).

Magnetic resonance imaging (MRI):  Kf = (f̂(ω))_{ω ∈ Ω}   (partial Fourier samples).

Other examples: MEG, EEG, ...
Inverse Problem Regularization
Noisy measurements:  y = K f_0 + w.
Prior model:  J : R^Q → R assigns a score to images.

   f★ ∈ argmin_{f ∈ R^Q}  (1/2)||y − K f||² + λ J(f)
            (data fidelity)      (regularity)

Choice of λ: tradeoff between
   the noise level ||w||  and  the regularity J(f_0) of f_0.

No noise:  λ → 0⁺, minimize   f★ ∈ argmin_{f ∈ R^Q, Kf = y} J(f).
Smooth and Cartoon Priors
Sobolev (smooth) prior:            J(f) = ∫ ||∇f(x)||² dx
Total variation (cartoon) prior:   J(f) = ∫ ||∇f(x)|| dx
   equivalently (co-area):         J(f) = ∫_R length(C_t) dt,   C_t: level sets of f.

[Figure: energy densities |∇f|² (Sobolev) and |∇f| (total variation) on an example image.]
Inpainting Example




Input y = Kf0 + w   Sobolev   Total variation
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Redundant Dictionaries
Dictionary  Ψ = (ψ_m)_m ∈ R^{Q×N},  N ≥ Q.

Fourier:   ψ_m = e^{i⟨·, ω_m⟩},   m ↔ frequency ω_m.

Wavelets:  ψ_m = ψ(2^{-j} R_θ(· − n)),   m = (j, θ, n) ↔ scale, orientation, position.

DCT, curvelets, bandlets, ...

Synthesis:  f = Σ_m x_m ψ_m = Ψ x.
   Coefficients x ∈ R^N  →  image f = Ψx ∈ R^Q   (Ψ is a Q × N matrix).
Sparse Priors
Ideal sparsity: for most m, x_m = 0.
   J_0(x) = #{ m ; x_m ≠ 0 }

Sparse approximation:  f = Ψx  where
   x ∈ argmin_{x ∈ R^N}  ||f_0 − Ψx||² + T² J_0(x)

Orthogonal Ψ  (Ψ Ψ* = Ψ* Ψ = Id_N):
   x_m = ⟨f_0, ψ_m⟩ if |⟨f_0, ψ_m⟩| > T,   0 otherwise,
   i.e.   f = Ψ S_T(Ψ* f_0)   with S_T the hard thresholding at T.

Non-orthogonal Ψ:  NP-hard.
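The orthogonal case above is simple enough to check numerically. Below is a minimal numpy sketch (not from the slides) of sparse approximation by hard thresholding in an orthonormal basis; the random basis, test signal and threshold value are illustrative assumptions.

```python
import numpy as np

def sparse_approx_ortho(f0, Psi, T):
    """Sparse approximation f = Psi S_T(Psi^T f0) in an orthonormal basis.

    Psi is a (Q, Q) matrix with orthonormal columns (the atoms psi_m), so the
    L0-penalized problem is solved exactly by hard thresholding at T.
    """
    x = Psi.T @ f0                               # analysis coefficients <f0, psi_m>
    x_hard = np.where(np.abs(x) > T, x, 0.0)     # hard thresholding S_T
    return Psi @ x_hard                          # synthesis f = Psi S_T(Psi^T f0)

# Toy usage with a random orthonormal basis and a 5-sparse signal.
rng = np.random.default_rng(0)
Psi, _ = np.linalg.qr(rng.standard_normal((64, 64)))
f0 = Psi[:, :5] @ rng.standard_normal(5)
f = sparse_approx_ortho(f0 + 0.01 * rng.standard_normal(64), Psi, T=0.05)
```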
Convex Relaxation: L1 Prior
   J_0(x) = #{ m ; x_m ≠ 0 }

Image with 2 pixels:
   J_0(x) = 0  →  null image,
   J_0(x) = 1  →  sparse image,
   J_0(x) = 2  →  non-sparse image.

ℓ^q priors:   J_q(x) = Σ_m |x_m|^q    (convex for q ≥ 1)
   [Figure: unit balls of J_q in the (x_1, x_2) plane for q = 0, 1/2, 1, 3/2, 2.]

Sparse ℓ¹ prior:   J_1(x) = Σ_m |x_m|
L1 Regularization
   x_0 ∈ R^N  (coefficients)  →Ψ→  f_0 = Ψx_0 ∈ R^Q  (image)  →K, noise w→  y = K f_0 + w ∈ R^P  (observations)

   Φ = K Ψ ∈ R^{P×N}

Sparse recovery:  f★ = Ψ x★  where x★ solves
   min_{x ∈ R^N}  (1/2)||y − Φx||² + λ||x||₁
        (fidelity)       (regularization)
Noiseless Sparse Regularization
Noiseless measurements:  y = Φx_0.

ℓ¹ regularization:   x★ ∈ argmin_{Φx = y} Σ_m |x_m|
ℓ² regularization:   x★ ∈ argmin_{Φx = y} Σ_m |x_m|²
   [Figure: the ℓ¹ ball meets the affine set {x ; Φx = y} at a sparse point x★; the ℓ² ball does not.]

The ℓ¹ problem is a convex linear program:
   interior points, cf. [Chen, Donoho, Saunders] "basis pursuit";
   Douglas-Rachford splitting, see [Combettes, Pesquet].
Noisy Sparse Regularization
Noisy measurements:  y = Φx_0 + w.

   x★ ∈ argmin_{x ∈ R^N}  (1/2)||y − Φx||² + λ||x||₁
            (data fidelity)     (regularization)

Equivalence with the constrained formulation:
   x★ ∈ argmin_{||Φx − y|| ≤ ε} ||x||₁

Algorithms:
   iterative soft thresholding (forward-backward splitting),
      see [Daubechies et al], [Pesquet et al], etc.;
   Nesterov multi-step schemes.
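As a concrete illustration of the iterative soft thresholding scheme mentioned above, here is a minimal numpy sketch; the step-size rule and the fixed iteration count are simplifying assumptions, not the tuned algorithms of the cited references.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Iterative soft thresholding for min_x 0.5 * ||y - Phi x||^2 + lam * ||x||_1.

    Each iteration is one forward-backward step: an explicit gradient step on the
    quadratic fidelity, then the soft-thresholding prox of lam * ||.||_1.
    """
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the gradient
    tau = 1.0 / L                              # step size (must satisfy tau < 2/L)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)           # gradient of the fidelity term
        z = x - tau * grad                     # forward (explicit) step
        x = np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0.0)  # backward (prox) step
    return x
```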
Image De-blurring
[Figure: original f_0; blurred observation y = h ⋆ f_0 + w; Sobolev reconstruction (SNR = 22.7 dB); sparsity reconstruction (SNR = 24.7 dB).]

Sobolev regularization:   f★ = argmin_{f ∈ R^N}  ||f ⋆ h − y||² + λ||∇f||²
   solved in closed form over Fourier:
   f̂★(ω) = ĥ(ω)* ŷ(ω) / (|ĥ(ω)|² + λ|ω|²)

Sparsity regularization:  Ψ = translation invariant wavelets,
   f★ = Ψ x★   where   x★ ∈ argmin_x  (1/2)||h ⋆ (Ψx) − y||² + λ||x||₁
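The Sobolev step has the closed form above, which translates directly into a few FFT calls. A minimal numpy sketch for a periodic 2-D image follows; the frequency normalization and the placement of the kernel are assumptions of this sketch.

```python
import numpy as np

def sobolev_deconv(y, h, lam):
    """Sobolev-regularized deconvolution solved in closed form over Fourier:
    f_hat = conj(h_hat) * y_hat / (|h_hat|^2 + lam * |omega|^2).

    Assumes periodic boundary conditions and h_hat(0) != 0 (e.g. a blur kernel
    of unit mass), so the denominator never vanishes.
    """
    n1, n2 = y.shape
    h_hat = np.fft.fft2(h, s=y.shape)                 # transfer function of the blur
    w1 = np.fft.fftfreq(n1)[:, None]
    w2 = np.fft.fftfreq(n2)[None, :]
    omega2 = w1 ** 2 + w2 ** 2                        # |omega|^2 on the FFT grid
    f_hat = np.conj(h_hat) * np.fft.fft2(y) / (np.abs(h_hat) ** 2 + lam * omega2)
    return np.real(np.fft.ifft2(f_hat))
```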
Inpainting Problem
   (Kf)(x) = 0 if x ∈ Ω,   f(x) if x ∉ Ω.
Measurements:  y = K f_0 + w.
Image Separation
Model:  f = f_1 + f_2 + w,  (f_1, f_2) components, w noise.

Union dictionary:  Ψ = [Ψ_1, Ψ_2] ∈ R^{Q×(N_1+N_2)}

Recovered components:  f_i★ = Ψ_i x_i★,  where
   (x_1★, x_2★) ∈ argmin_{x = (x_1, x_2) ∈ R^N}  (1/2)||f − Ψx||² + λ||x||₁
Examples of Decompositions
Cartoon+Texture Separation
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Basics of Convex Analysis
Setting:  G : H → R ∪ {+∞}.   Here: H = R^N.
   Problem:   min_{x ∈ H} G(x)

Convexity:  ∀ t ∈ [0, 1],
   G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y)

Sub-differential:
   ∂G(x) = { u ∈ H ; ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩ }
   Example:  G(x) = |x|  has  ∂G(0) = [−1, 1].

Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}.

First-order condition:
   x★ ∈ argmin_{x ∈ H} G(x)   ⟺   0 ∈ ∂G(x★)
L1 Regularization: First Order Conditions
   x★ ∈ argmin_{x ∈ R^N}  G(x) = (1/2)||y − Φx||² + λ||x||₁

∂G(x) = Φ*(Φx − y) + λ ∂||·||₁(x),
   ∂||·||₁(x)_i = sign(x_i) if x_i ≠ 0,   [−1, 1] if x_i = 0.

Support of the solution:
   I = { i ∈ {0, ..., N−1} ; x★_i ≠ 0 }

Restrictions:
   x_I = (x_i)_{i ∈ I} ∈ R^{|I|},    Φ_I = (φ_i)_{i ∈ I} ∈ R^{P×|I|}
L1 Regularization: First Order Conditions
   x★ ∈ argmin_{x ∈ R^N}  (1/2)||Φx − y||² + λ||x||₁        (P_λ(y))

First order condition:   Φ*(Φx★ − y) + λ s = 0
   where   s_I = sign(x★_I)   and   ||s_{I^c}||_∞ ≤ 1,
   ⟹   s_{I^c} = (1/λ) Φ_{I^c}*(y − Φx★).

Theorem:  ||Φ_{I^c}*(Φx★ − y)||_∞ ≤ λ   ⟹   x★ is a solution of P_λ(y).

Theorem:  If Φ_I has full rank and ||Φ_{I^c}*(Φx★ − y)||_∞ < λ,
   then x★ is the unique solution of P_λ(y).
Local Behavior of the Solution
   x★ ∈ argmin_{x ∈ R^N}  (1/2)||Φx − y||² + λ||x||₁

First order condition:  Φ*(Φx★ − y) + λ s = 0
   ⟹   x★_I = Φ_I⁺ y − λ (Φ_I*Φ_I)⁻¹ sign(x★_I)        (implicit equation)
            = x_{0,I} + Φ_I⁺ w − λ (Φ_I*Φ_I)⁻¹ s_I

Intuition:  s_I = sign(x★_I) = sign(x_{0,I}) = s_{0,I}  for small w
               (unknown)          (known)

To prove:   x̂_I = x_{0,I} + Φ_I⁺ w − λ (Φ_I*Φ_I)⁻¹ s_{0,I}
   is the unique solution.
Local Behavior of the Solution
Candidate for the solution:
   x̂_I = x_{0,I} + Φ_I⁺ w − λ (Φ_I*Φ_I)⁻¹ s_{0,I}

To prove:   ||Φ_{I^c}*(Φ_I x̂_I − y)||_∞ < λ

   (1/λ) Φ_{I^c}*(Φ_I x̂_I − y) = Φ_{I^c}*(Φ_I Φ_I⁺ − Id)(w/λ) − Φ_{I^c}* Φ_I^{+,*} s_{0,I}
      the first term can be made small when w → 0;
      the ℓ^∞ norm of the second term must be < 1.
Robustness to Small Noise
Identifiability criterion: [Fuchs]
   For s ∈ {−1, 0, +1}^N, let I = supp(s),
      F(s) = ||Φ_{I^c}* Φ_I^{+,*} s_I||_∞

Theorem: [Fuchs 2004]   If F(sign(x_0)) < 1, let T = min_{i ∈ I} |x_{0,i}|.
   If ||w||/T is small enough and λ ∼ ||w||, then
      x★_I = x_{0,I} + Φ_I⁺ w − λ (Φ_I*Φ_I)⁻¹ sign(x_{0,I})   (and x★ = 0 outside I)
   is the unique solution of P_λ(y).

   When w = 0:   F(sign(x_0)) < 1   ⟹   x★ = x_0.

Theorem: [Grassmair et al. 2010]   If F(sign(x_0)) < 1
   and λ ∼ ||w||, then ||x★ − x_0|| = O(||w||).
Geometric Interpretation
   F(s) = ||Φ_{I^c}* d_I||_∞ = max_{j ∉ I} |⟨d_I, φ_j⟩|
   where the dual vector d_I = Φ_I^{+,*} s_I = Φ_I (Φ_I*Φ_I)⁻¹ s_I satisfies
      ∀ i ∈ I,   ⟨d_I, φ_i⟩ = s_i.

Condition F(s) < 1: no atom φ_j, j ∉ I, lies inside the cap C_s.
   [Figure: atoms φ_i on the sphere, dual vector d_I, cap C_s; outside the cap |⟨d_I, φ⟩| < 1.]
Robustness to Bounded Noise
Exact Recovery Criterion (ERC): [Tropp]
   For a support I ⊂ {0, ..., N−1} with Φ_I full rank,
      ERC(I) = ||Φ_{I^c}* Φ_I^{+,*}||_{∞,∞}
             = ||Φ_I⁺ Φ_{I^c}||_{1,1} = max_{j ∉ I} ||Φ_I⁺ φ_j||₁
      (using ||(a_j)_j||_{1,1} = max_j ||a_j||₁).

Relation with the F criterion:   ERC(I) = max_{s, supp(s) ⊂ I} F(s)

Theorem:  If ERC(supp(x_0)) < 1 and λ ∼ ||w||, then
   x★ is unique, satisfies supp(x★) ⊂ supp(x_0), and
      ||x_0 − x★|| = O(||w||).
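Both criteria are easy to evaluate numerically for a given Φ and support. A minimal numpy sketch follows, assuming the restriction Φ_I has full rank on the supports tested.

```python
import numpy as np

def fuchs_criterion(Phi, s):
    """F(s) = max_{j not in I} |<d_I, phi_j>| with d_I = Phi_I (Phi_I^T Phi_I)^{-1} s_I."""
    I = np.flatnonzero(s)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    d = Phi[:, I] @ np.linalg.solve(Phi[:, I].T @ Phi[:, I], s[I])
    return np.max(np.abs(Phi[:, Ic].T @ d))

def erc(Phi, I):
    """ERC(I) = max_{j not in I} ||Phi_I^+ phi_j||_1."""
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    return np.max(np.sum(np.abs(np.linalg.pinv(Phi[:, I]) @ Phi[:, Ic]), axis=0))
```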
Example: Random Matrix
   P = 200, N = 1000.
   [Figure: empirical probability, as a function of the sparsity ||x_0||_0 ∈ [0, 50],
    of the events w-ERC < 1, ERC < 1, F < 1, and x★ = x_0.]
Example: Deconvolution
   Φx = Σ_i x_i φ(· − Δ i)   (spikes on a grid of spacing Δ, convolved with a kernel φ).
Increasing Δ:
   reduces correlation,
   reduces resolution.
   [Figure: example signals x_0 and the criteria F(s), ERC(I), w-ERC(I) as functions of the spacing Δ.]
Coherence Bounds
Mutual coherence:   μ(Φ) = max_{i ≠ j} |⟨φ_i, φ_j⟩|

Theorem:   F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I|−1) μ(Φ))

Theorem:   If ||x_0||_0 < (1/2)(1 + 1/μ(Φ)) and λ ∼ ||w||,
   then supp(x★) ⊂ I and ||x_0 − x★|| = O(||w||).

One has   μ(Φ) ≥ √((N − P) / (P(N − 1))).
For Gaussian matrices:   μ(Φ) ∼ √(log(PN)/P);   optimistic setting:  ||x_0||_0 ≤ O(√P).
For convolution matrices: useless criterion.
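For completeness, the mutual coherence itself is a one-liner on the normalized Gram matrix; a small numpy sketch, where the Gaussian example only illustrates the √(log(PN)/P) scaling:

```python
import numpy as np

def mutual_coherence(Phi):
    """mu(Phi) = max_{i != j} |<phi_i, phi_j>| for unit-norm columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)    # normalize the atoms
    G = np.abs(Phi.T @ Phi)                    # correlations between atoms
    np.fill_diagonal(G, 0.0)                   # discard the diagonal terms
    return G.max()

rng = np.random.default_rng(0)
print(mutual_coherence(rng.standard_normal((200, 1000))))   # roughly sqrt(log(P*N)/P)
```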
Spikes and Sinusoids Separation
Incoherent pair of orthobases: Diracs / Fourier,
   Ψ_1 = { k ↦ δ[k − m] }_m,       Ψ_2 = { k ↦ N^{-1/2} e^{2iπ mk/N} }_m,
   Ψ = [Ψ_1, Ψ_2] ∈ R^{N×2N}.

   min_{x ∈ R^{2N}}  (1/2)||y − Ψx||² + λ||x||₁
   =  min_{x_1, x_2 ∈ R^N}  (1/2)||y − Ψ_1 x_1 − Ψ_2 x_2||² + λ||x_1||₁ + λ||x_2||₁

   [Figure: a signal decomposed as spikes + sinusoids.]

μ(Ψ) = 1/√N   ⟹   ℓ¹ separates up to ≈ √N/2 Diracs + sines.
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Pointwise Sampling and Smoothness
Data acquisition:   f[i] = f̃(i/N) = ⟨f̃, δ_{i/N}⟩
   sensors: Diracs (δ_{i/N})_i;    f̃ ∈ L²  →  f ∈ R^N.

Shannon interpolation: if Supp(f̂̃) ⊂ [−Nπ, Nπ], then
   f̃(t) = Σ_i f[i] h(Nt − i)   where   h(t) = sin(πt)/(πt).

Natural images are not smooth,
but they can be compressed efficiently.
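Shannon reconstruction is a weighted sum of shifted sinc kernels; a minimal numpy sketch, where the grid sizes and the test signal are illustrative:

```python
import numpy as np

def shannon_interpolate(samples, t):
    """Reconstruct f(t) = sum_i f[i] sinc(N t - i) from samples f[i] = f(i/N).

    Exact only for band-limited signals; np.sinc(x) = sin(pi x) / (pi x).
    """
    N = len(samples)
    i = np.arange(N)
    return np.sum(samples[None, :] * np.sinc(N * t[:, None] - i[None, :]), axis=1)

# Resample a low-frequency signal on a finer grid.
N = 32
f_samples = np.cos(2 * np.pi * 3 * np.arange(N) / N)
f_fine = shannon_interpolate(f_samples, np.linspace(0, 1, 512, endpoint=False))
```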
Single Pixel Camera (Rice)
   y[i] = ⟨f_0, φ_i⟩
   [Figure: original f_0, N = 256²; reconstructions from P/N = 0.16 and P/N = 0.02 measurements.]
CS Hardware Model
CS is about designing hardware: input signals f̃ ∈ L²(R²).
Physical hardware resolution limit: target resolution f ∈ R^N.

   f̃ ∈ L²  →(array, resolution N)→  f ∈ R^N  →(micro-mirrors, K)→  y ∈ R^P
   (CS hardware)

   [Figure: examples of pseudo-random mirror patterns ⟨f, φ_i⟩ and the resulting operator K applied to an image f.]
Sparse CS Recovery
f_0 ∈ R^N sparse in an ortho-basis Ψ:  f_0 = Ψx_0.

(Discretized) sampling acquisition:
   y = K f_0 + w = K Ψ x_0 + w,    Φ = K Ψ.

K drawn from the Gaussian matrix ensemble,
   K_{i,j} ∼ N(0, P^{-1/2}) i.i.d.
   ⟹  Φ is also drawn from the Gaussian matrix ensemble.

Sparse recovery:
   min_{||Φx − y|| ≤ ||w||} ||x||₁      or      min_x  (1/2)||Φx − y||² + λ||x||₁
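A small end-to-end simulation of the noiseless Gaussian setting, using the linear-programming form of basis pursuit (min ||x||₁ s.t. Φx = y, split as x = u − v with u, v ≥ 0). The problem sizes and the scipy LP solver are assumptions of this sketch, not the solvers used in the slides.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, P, k = 128, 48, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)                 # Gaussian measurement matrix
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)   # k-sparse coefficients
y = Phi @ x0                                                   # noiseless measurements

c = np.ones(2 * N)                            # objective sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([Phi, -Phi])                 # constraint Phi (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
x_rec = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_rec - x0))
```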
CS Simulation Example
   [Figure: original image f_0 and its compressed-sensing reconstructions.]
   Ψ = translation invariant wavelet frame.
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
CS with RIP
ℓ¹ recovery:
   x★ ∈ argmin_{||Φx − y|| ≤ ε} ||x||₁    where    y = Φx_0 + w,  ||w|| ≤ ε.

Restricted Isometry Constants δ_k:
   ∀ x, ||x||_0 ≤ k:    (1 − δ_k)||x||² ≤ ||Φx||² ≤ (1 + δ_k)||x||².

Theorem: [Candès 2009]   If δ_{2k} ≤ √2 − 1, then
   ||x_0 − x★|| ≤ (C_0/√k) ||x_0 − x_k||₁ + C_1 ε,
   where x_k is the best k-term approximation of x_0.
Singular Values Distributions
Eigenvalues of Φ_I*Φ_I with |I| = k are essentially in [a, b],
   a = (1 − √β)²   and   b = (1 + √β)²   where   β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to
   f(λ) = (1/(2πβλ)) √((λ − a)₊ (b − λ)₊)        [Marcenko-Pastur]

   [Figure: empirical eigenvalue histograms vs. the Marcenko-Pastur density for P = 200 and k = 10, 30, 50.]

Large deviation inequality [Ledoux].
RIP for Gaussian Matrices
Link with coherence:   μ(Φ) = max_{i ≠ j} |⟨φ_i, φ_j⟩|,
   δ_2 = μ(Φ),    δ_k ≤ (k − 1) μ(Φ).

For Gaussian matrices:   μ(Φ) ∼ √(log(PN)/P).

Stronger result:
Theorem:   If  k ≤ C P / log(N/P),  then  δ_{2k} ≤ √2 − 1  with high probability.
Numerics with RIP
Stability constants of A:
   (1 − δ₁(A))||α||² ≤ ||Aα||² ≤ (1 + δ₂(A))||α||²,
   δ₁(A), δ₂(A) given by the smallest / largest eigenvalues of A*A.

Upper/lower restricted isometry constants:
   δ_k^i = max_{|I| = k} δ_i(Φ_I),  i = 1, 2;      δ_k = min(δ_k^1, δ_k^2).

Monte-Carlo estimation:  δ̂_k ≤ δ_k.
   [Figure: estimates δ̂_{2k} compared with the threshold √2 − 1 as a function of k, for N = 4000, P = 1000.]
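The Monte-Carlo estimate only samples a few supports, so it yields lower bounds on the restricted constants. A minimal numpy sketch under that assumption (sizes and trial counts are illustrative):

```python
import numpy as np

def monte_carlo_ric(Phi, k, n_trials=200, seed=0):
    """Lower bounds on the restricted isometry constants of Phi via random supports.

    Records the extreme eigenvalues of Phi_I^T Phi_I over randomly drawn supports I
    of size k; only a tiny fraction of supports is visited, so the returned values
    under-estimate the true constants delta_k^1, delta_k^2.
    """
    rng = np.random.default_rng(seed)
    N = Phi.shape[1]
    delta_low = delta_up = 0.0
    for _ in range(n_trials):
        I = rng.choice(N, size=k, replace=False)
        eigs = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        delta_low = max(delta_low, 1.0 - eigs[0])    # lower restricted constant
        delta_up = max(delta_up, eigs[-1] - 1.0)     # upper restricted constant
    return delta_low, delta_up

rng = np.random.default_rng(1)
Phi = rng.standard_normal((1000, 4000)) / np.sqrt(1000)
print(monte_carlo_ric(Phi, k=10))
```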
Polytopes-based Guarantees
Noiseless recovery:   x★ ∈ argmin_{Φx = y} ||x||₁        (P_0(y))

   Φ = (φ_i)_i ∈ R^{2×3},     B_α = { x ; ||x||₁ ≤ α },   α = ||x_0||₁.
   [Figure: the ℓ¹ ball B_α ⊂ R³, its projection Φ(B_α) ⊂ R², and the point y = Φx_0.]

   x_0 is a solution of P_0(Φx_0)   ⟺   Φx_0 ∈ ∂(Φ(B_α)).
L1 Recovery in 2-D
   Φ = (φ_i)_i ∈ R^{2×3}.
   2-D quadrants:   K_s = { (α_i s_i)_i ∈ R³ ; α_i ≥ 0 };       2-D cones:   C_s = Φ(K_s).
   [Figure: the quadrant K_{(0,1,1)} mapped to the cone C_{(0,1,1)} containing y = Φx★.]
Polytope Noiseless Recovery
Counting faces of random polytopes:  [Donoho]
   All x_0 such that ||x_0||_0 ≤ C_all(P/N) · P are identifiable.
   Most x_0 such that ||x_0||_0 ≤ C_most(P/N) · P are identifiable.

   C_all(1/4) ≈ 0.065,     C_most(1/4) ≈ 0.25.

   Sharp constants.
   No noise robustness.
   Computation of "pathological" signals  [Dossal, Peyré, Fadili, 2010].

   [Figure: "all" and "most" phase-transition curves compared with the RIP bound.]
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Tomography and Fourier Measures
   f̂ = FFT2(f)

Fourier slice theorem:   p̂_θ(ξ) = f̂(ξ cos θ, ξ sin θ)
   (1-D Fourier transform of a projection = radial slice of the 2-D Fourier transform).

Partial Fourier measurements:   { p_{θ_k}(t) }_{t ∈ R, 0 ≤ k < K},
   equivalent to:   Φf = { f̂(ω) }_{ω ∈ Ω}   with Ω a union of K radial lines.
Regularized Inversion
Noisy measurements:   ∀ ω ∈ Ω,   y[ω] = f̂_0[ω] + w[ω],
   noise:  w[ω] ∼ N(0, σ²), white noise.

ℓ¹ regularization:
   f★ = argmin_f  (1/2) Σ_{ω ∈ Ω} |y[ω] − f̂[ω]|² + λ Σ_m |⟨f, ψ_m⟩|.

   [Figure: measurements y over Ω and the reconstruction f★.]

Disclaimer: this is not compressed sensing.
MRI Imaging
   From [Lustig et al.]
MRI Reconstruction
   From [Lustig et al.]
Fourier sub-sampling pattern: randomization.
   [Figure: high resolution, low resolution, linear and sparsity-based reconstructions.]
Compressive Fourier Measurements
Sampling low frequencies helps.
   [Figure: pseudo-inverse vs. sparse wavelets reconstructions.]
Structured Measurements
Gaussian matrices: intractable for large N.

Random partial orthogonal matrix:  {φ_ω}_ω orthogonal basis,
   Φ = (φ_ω)_{ω ∈ Ω}   where |Ω| = P is drawn uniformly at random.

Fast measurements (e.g. Fourier basis):
   ∀ ω ∈ Ω,   y[ω] = ⟨f, φ_ω⟩ = f̂[ω].

Mutual incoherence:   μ = √N · max_{ω, m} |⟨φ_ω, ψ_m⟩| ∈ [1, √N].

Theorem: [Rudelson, Vershynin, 2006]   with high probability on Ω,
   if  M ≤ C P / (μ² log(N)⁴),  then  δ_{2M} ≤ √2 − 1.

   Not universal: requires incoherence.
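Partial orthogonal measurements are applied with fast transforms rather than stored matrices. A minimal numpy sketch of a random partial Fourier operator and its adjoint; the 1-D setting and the zero-filled adjoint are assumptions of this sketch.

```python
import numpy as np

def partial_fourier_ops(N, P, seed=0):
    """Forward/adjoint operators for P random rows of the orthonormal DFT."""
    rng = np.random.default_rng(seed)
    Omega = rng.choice(N, size=P, replace=False)       # random frequency subset

    def Phi(f):
        return np.fft.fft(f, norm="ortho")[Omega]      # y[omega] = f_hat[omega]

    def Phi_adj(y):
        f_hat = np.zeros(N, dtype=complex)
        f_hat[Omega] = y
        return np.fft.ifft(f_hat, norm="ortho")        # zero-filled inverse FFT

    return Phi, Phi_adj

Phi, Phi_adj = partial_fourier_ops(N=256, P=64)
y = Phi(np.random.default_rng(1).standard_normal(256))
```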
Overview

• Inverse Problems Regularization
• Sparse Synthesis Regularization
• Theoretical Recovery Guarantees
• Compressed Sensing
• RIP and Polytopes CS Theory
• Fourier Measurements
• Convex Optimization via Proximal Splitting
Convex Optimization
Setting:  G : H → R ∪ {+∞},  H: Hilbert space. Here: H = R^N.
   Problem:   min_{x ∈ H} G(x)

Class of functions:
   Convex:   G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y),   ∀ t ∈ [0, 1];
   Lower semi-continuous:   lim inf_{x → x_0} G(x) ≥ G(x_0);
   Proper:   { x ∈ H ; G(x) ≠ +∞ } ≠ ∅.

Indicator of a closed convex set C:
   ι_C(x) = 0 if x ∈ C,   +∞ otherwise.
Proximal Operators
Proximal operator of G:
      Prox_{λG}(x) = argmin_z  (1/2)||x − z||² + λ G(z)
Proximal Operators
Proximal operator of G:
      Prox_{λG}(x) = argmin_z  (1/2)||x − z||² + λ G(z)

G(x) = ||x||_1 = Σ_i |x_i|

G(x) = ||x||_0 = | {i \ x_i ≠ 0} |

G(x) = Σ_i log(1 + |x_i|²)

   [Plot: the penalties |x|, ||x||_0 and log(1 + x²) as functions of x.]
Proximal Operators
Proximal operator of G:
      Prox_{λG}(x) = argmin_z  (1/2)||x − z||² + λ G(z)

G(x) = ||x||_1 = Σ_i |x_i|
      Prox_{λG}(x)_i = max(0, 1 − λ/|x_i|) x_i

G(x) = ||x||_0 = | {i \ x_i ≠ 0} |
                        x_i   if |x_i| ≥ √(2λ),
      Prox_{λG}(x)_i =
                        0     otherwise.

G(x) = Σ_i log(1 + |x_i|²)
      3rd order polynomial root.

   [Plots: the penalties G(x) and the corresponding thresholding curves Prox_{λG}(x).]
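A minimal numerical sketch of the first two thresholding rules (not part of the slides; the function names are illustrative), assuming NumPy:

```python
import numpy as np

def prox_l1(x, lam):
    # Prox of lam*||.||_1: entrywise soft thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0)

def prox_l0(x, lam):
    # Prox of lam*||.||_0: entrywise hard thresholding at sqrt(2*lam).
    return np.where(np.abs(x) >= np.sqrt(2 * lam), x, 0)
```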
Proximal Calculus
Separability:    G(x) = G1(x1) + . . . + Gn(xn)
      Prox_{λG}(x) = (Prox_{λG1}(x1), . . . , Prox_{λGn}(xn))
Proximal Calculus
Separability:    G(x) = G1(x1) + . . . + Gn(xn)
      Prox_{λG}(x) = (Prox_{λG1}(x1), . . . , Prox_{λGn}(xn))

Quadratic functionals:   G(x) = (1/2)||Φx − y||²
      Prox_{λG}(x) = (Id + λΦ*Φ)^{-1}(x + λΦ*y)
Proximal Calculus
Separability:    G(x) = G1(x1) + . . . + Gn(xn)
      Prox_{λG}(x) = (Prox_{λG1}(x1), . . . , Prox_{λGn}(xn))

Quadratic functionals:   G(x) = (1/2)||Φx − y||²
      Prox_{λG}(x) = (Id + λΦ*Φ)^{-1}(x + λΦ*y)

Composition by tight frame:   A A* = Id
      Prox_{λG∘A}(x) = (A* ∘ Prox_{λG} ∘ A + Id − A*A)(x)
Proximal Calculus
Separability:    G(x) = G1(x1) + . . . + Gn(xn)
      Prox_{λG}(x) = (Prox_{λG1}(x1), . . . , Prox_{λGn}(xn))

Quadratic functionals:   G(x) = (1/2)||Φx − y||²
      Prox_{λG}(x) = (Id + λΦ*Φ)^{-1}(x + λΦ*y)

Composition by tight frame:   A A* = Id
      Prox_{λG∘A}(x) = (A* ∘ Prox_{λG} ∘ A + Id − A*A)(x)

Indicators:   G(x) = ι_C(x)
      Prox_{λG}(x) = Proj_C(x) = argmin_{z ∈ C} ||x − z||
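A small sketch of the quadratic prox and of the projection onto an affine constraint set (names and the dense solves are illustrative, not from the slides):

```python
import numpy as np

def prox_quadratic(x, Phi, y, lam):
    # Prox of lam/2 * ||Phi z - y||^2: solve (Id + lam Phi^T Phi) z = x + lam Phi^T y.
    n = Phi.shape[1]
    return np.linalg.solve(np.eye(n) + lam * Phi.T @ Phi, x + lam * Phi.T @ y)

def proj_affine(x, Phi, y):
    # Projection onto C = {z : Phi z = y}, assuming Phi has full row rank.
    return x + Phi.T @ np.linalg.solve(Phi @ Phi.T, y - Phi @ x)
```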
Gradient and Proximal Descents
Gradient descent:   x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})          [explicit]
           G is C¹ and ∇G is L-Lipschitz

     Theorem:   If 0 < τ < 2/L, x^{(ℓ)} → x*, a solution.
Gradient and Proximal Descents
Gradient descent:   x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})          [explicit]
           G is C¹ and ∇G is L-Lipschitz

     Theorem:   If 0 < τ < 2/L, x^{(ℓ)} → x*, a solution.

Sub-gradient descent:   x^{(ℓ+1)} = x^{(ℓ)} − τ_ℓ v^{(ℓ)},   v^{(ℓ)} ∈ ∂G(x^{(ℓ)})

     Theorem:   If τ_ℓ ∼ 1/ℓ, x^{(ℓ)} → x*, a solution.

           Problem: slow.
Gradient and Proximal Descents
Gradient descent:   x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})          [explicit]
           G is C¹ and ∇G is L-Lipschitz

     Theorem:   If 0 < τ < 2/L, x^{(ℓ)} → x*, a solution.

Sub-gradient descent:   x^{(ℓ+1)} = x^{(ℓ)} − τ_ℓ v^{(ℓ)},   v^{(ℓ)} ∈ ∂G(x^{(ℓ)})

     Theorem:   If τ_ℓ ∼ 1/ℓ, x^{(ℓ)} → x*, a solution.

           Problem: slow.
Proximal-point algorithm:   x^{(ℓ+1)} = Prox_{τ_ℓ G}(x^{(ℓ)})          [implicit]

     Theorem:   If τ_ℓ ≥ c > 0, x^{(ℓ)} → x*, a solution.

           Problem: Prox_{τG} hard to compute.
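For concreteness, a bare-bones sketch of the explicit scheme above (illustrative helper, assuming a gradient callable):

```python
import numpy as np

def gradient_descent(grad_G, x0, tau, n_iter=100):
    # Explicit descent x <- x - tau * grad_G(x); converges for 0 < tau < 2/L
    # when grad_G is L-Lipschitz.
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = x - tau * grad_G(x)
    return x
```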
Proximal Splitting Methods
           Solve    min_{x ∈ H} E(x)
Problem:   Prox_{γE} is not available.
Proximal Splitting Methods
           Solve    min_{x ∈ H} E(x)
Problem:   Prox_{γE} is not available.
Splitting:   E(x) = F(x) + Σ_i G_i(x)
                    Smooth    Simple
Proximal Splitting Methods
           Solve    min_{x ∈ H} E(x)
Problem:   Prox_{γE} is not available.
Splitting:   E(x) = F(x) + Σ_i G_i(x)
                    Smooth    Simple

Iterative algorithms using:   ∇F(x)
                              Prox_{γG_i}(x)
                                        solves
   Forward-Backward:                    F + G
   Douglas-Rachford:                    Σ_i G_i
   Primal-Dual:                         Σ_i G_i ∘ A_i
   Generalized FB:                      F + Σ_i G_i
Smooth + Simple Splitting
Inverse problem:   measurements   y = Kf0 + w
                                  K : R^N → R^P,   P ≪ N

Model: f0 = Ψ x0 sparse in dictionary Ψ.
Sparse recovery: f* = Ψ x* where x* solves
              min_{x ∈ R^N}  F(x) + G(x)
                             Smooth   Simple

Data fidelity:    F(x) = (1/2)||y − Φx||²,        Φ = K Ψ
Regularization:   G(x) = λ||x||_1 = λ Σ_i |x_i|
Forward-Backward
Fixed point equation:
   x* ∈ argmin_x F(x) + G(x)    ⇔    0 ∈ ∇F(x*) + ∂G(x*)
                                ⇔    (x* − τ ∇F(x*)) ∈ x* + τ ∂G(x*)
                                ⇔    x* = Prox_{τG}( x* − τ ∇F(x*) )
Forward-Backward
Fixed point equation:
   x* ∈ argmin_x F(x) + G(x)    ⇔    0 ∈ ∇F(x*) + ∂G(x*)
                                ⇔    (x* − τ ∇F(x*)) ∈ x* + τ ∂G(x*)
                                ⇔    x* = Prox_{τG}( x* − τ ∇F(x*) )

Forward-backward:   x^{(ℓ+1)} = Prox_{τG}( x^{(ℓ)} − τ ∇F(x^{(ℓ)}) )
Forward-Backward
Fixed point equation:
   x* ∈ argmin_x F(x) + G(x)    ⇔    0 ∈ ∇F(x*) + ∂G(x*)
                                ⇔    (x* − τ ∇F(x*)) ∈ x* + τ ∂G(x*)
                                ⇔    x* = Prox_{τG}( x* − τ ∇F(x*) )

Forward-backward:   x^{(ℓ+1)} = Prox_{τG}( x^{(ℓ)} − τ ∇F(x^{(ℓ)}) )

Projected gradient descent:   G = ι_C
Forward-Backward
Fixed point equation:
   x* ∈ argmin_x F(x) + G(x)    ⇔    0 ∈ ∇F(x*) + ∂G(x*)
                                ⇔    (x* − τ ∇F(x*)) ∈ x* + τ ∂G(x*)
                                ⇔    x* = Prox_{τG}( x* − τ ∇F(x*) )

Forward-backward:   x^{(ℓ+1)} = Prox_{τG}( x^{(ℓ)} − τ ∇F(x^{(ℓ)}) )

Projected gradient descent:   G = ι_C

     Theorem:   Let ∇F be L-Lipschitz.
     If τ < 2/L,   x^{(ℓ)} → x*, a solution of (⋆)
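A generic sketch of this iteration (illustrative; the prox_G callable takes the step size and computes Prox_{τG}):

```python
import numpy as np

def forward_backward(grad_F, prox_G, x0, tau, n_iter=200):
    # x <- Prox_{tau G}( x - tau * grad_F(x) ); converges for 0 < tau < 2/L
    # when grad_F is L-Lipschitz and G is simple (prox available).
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_G(x - tau * grad_F(x), tau)
    return x
```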
Example: L1 Regularization
     min_x  (1/2)||Φx − y||² + λ||x||_1        ⇔        min_x  F(x) + G(x)

     F(x) = (1/2)||Φx − y||²
            ∇F(x) = Φ*(Φx − y)                           L = ||Φ*Φ||

     G(x) = λ||x||_1
            Prox_{τG}(x)_i = max(0, 1 − τλ/|x_i|) x_i

Forward-backward        ⇔        Iterative soft thresholding
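A compact sketch of the resulting iterative soft thresholding (ISTA) loop, assuming NumPy and a generic matrix Phi:

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    # Solve min_x 1/2 ||Phi x - y||^2 + lam ||x||_1 by forward-backward splitting.
    L = np.linalg.norm(Phi, 2) ** 2       # Lipschitz constant of the gradient
    tau = 1.0 / L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - tau * Phi.T @ (Phi @ x - y)                     # forward (gradient) step
        x = np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0)   # backward (prox) step
    return x
```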
Douglas Rachford Scheme

                 min_x  G1(x) + G2(x)              (⋆)
                 Simple      Simple
Douglas-Rachford iterations:

  z^{(ℓ+1)} = (1 − μ/2) z^{(ℓ)} + (μ/2) RProx_{γG2}( RProx_{γG1}(z^{(ℓ)}) )
  x^{(ℓ+1)} = Prox_{γG1}(z^{(ℓ+1)})

Reflexive prox:
          RProx_{γG}(x) = 2 Prox_{γG}(x) − x
Douglas Rachford Scheme

                 min_x  G1(x) + G2(x)              (⋆)
                 Simple      Simple
Douglas-Rachford iterations:

  z^{(ℓ+1)} = (1 − μ/2) z^{(ℓ)} + (μ/2) RProx_{γG2}( RProx_{γG1}(z^{(ℓ)}) )
  x^{(ℓ+1)} = Prox_{γG1}(z^{(ℓ+1)})

Reflexive prox:
          RProx_{γG}(x) = 2 Prox_{γG}(x) − x

     Theorem:   If 0 < μ < 2 and γ > 0,
                x^{(ℓ)} → x*, a solution of (⋆)
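A direct sketch of these iterations (illustrative; each prox callable takes the scale γ and computes Prox_{γG}; the candidate solution is read off through the prox of G1):

```python
import numpy as np

def douglas_rachford(prox_G1, prox_G2, z0, gamma=1.0, mu=1.0, n_iter=500):
    # min G1 + G2 with both functions simple; uses rprox(x) = 2*prox(x) - x implicitly.
    z = np.array(z0, dtype=float)
    for _ in range(n_iter):
        x = prox_G1(z, gamma)                  # candidate solution
        r = prox_G2(2 * x - z, gamma)          # prox of G2 at the reflected point
        z = z + mu * (r - x)                   # equivalent to the averaged reflection update
    return prox_G1(z, gamma)
```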
Example: Constrained L1
              min_{Φx = y} ||x||_1        ⇔        min_x  G1(x) + G2(x)

G1(x) = ι_C(x),      C = {x \ Φx = y}
   Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{-1}(y − Φx)

G2(x) = ||x||_1      Prox_{γG2}(x) = ( max(0, 1 − γ/|x_i|) x_i )_i

          efficient if ΦΦ* easy to invert.
Example: Constrained L1
              min_{Φx = y} ||x||_1        ⇔        min_x  G1(x) + G2(x)

G1(x) = ι_C(x),      C = {x \ Φx = y}
   Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{-1}(y − Φx)

G2(x) = ||x||_1      Prox_{γG2}(x) = ( max(0, 1 − γ/|x_i|) x_i )_i

          efficient if ΦΦ* easy to invert.

Example: compressed sensing
     Φ ∈ R^{100×400} Gaussian matrix
     y = Φx0,      ||x0||_0 = 17

   [Plot: log10(||x^{(ℓ)}||_1 − ||x*||_1) vs. iteration, for γ = 0.01, 1, 10.]
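Plugging these two proxes into the Douglas-Rachford loop gives a small end-to-end sketch of this compressed sensing example (the random instance below is illustrative, with sizes mimicking the slide):

```python
import numpy as np

# min ||x||_1 s.t. Phi x = y, with G1 = indicator of the constraint and G2 = l1 norm.
np.random.seed(0)
P, N, s = 100, 400, 17
Phi = np.random.randn(P, N)
x0 = np.zeros(N)
x0[np.random.choice(N, s, replace=False)] = np.random.randn(s)
y = Phi @ x0

gamma, mu = 1.0, 1.0
gram = Phi @ Phi.T
proj_C = lambda x: x + Phi.T @ np.linalg.solve(gram, y - Phi @ x)      # Prox of iota_C
soft = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0)          # Prox of g*||.||_1

z = np.zeros(N)
for _ in range(300):
    x = proj_C(z)                                  # feasible candidate solution
    z = z + mu * (soft(2 * x - z, gamma) - x)      # Douglas-Rachford update
print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```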
More than 2 Functionals

      min_x  G1(x) + . . . + Gk(x)                each G_i is simple

  ⇔   min_{(x1, . . . , xk) ∈ H^k}  G(x1, . . . , xk) + ι_C(x1, . . . , xk)

   G(x1, . . . , xk) = G1(x1) + . . . + Gk(xk)

   C = { (x1, . . . , xk) ∈ H^k \ x1 = . . . = xk }
More than 2 Functionals

      min_x  G1(x) + . . . + Gk(x)                each G_i is simple

  ⇔   min_{(x1, . . . , xk) ∈ H^k}  G(x1, . . . , xk) + ι_C(x1, . . . , xk)

   G(x1, . . . , xk) = G1(x1) + . . . + Gk(xk)

   C = { (x1, . . . , xk) ∈ H^k \ x1 = . . . = xk }

G and ι_C are simple:

  Prox_{γG}(x1, . . . , xk) = (Prox_{γG_i}(xi))_i

  Prox_{γι_C}(x1, . . . , xk) = (x̃, . . . , x̃)     where x̃ = (1/k) Σ_i xi
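These two proxes are straightforward to code; a minimal sketch (helper names are illustrative):

```python
import numpy as np

def prox_separable(xs, proxes, gamma):
    # Prox of G(x1,...,xk) = sum_i G_i(x_i): apply each prox block-wise.
    return [prox(x, gamma) for prox, x in zip(proxes, xs)]

def prox_consensus(xs):
    # Prox of iota_C on C = {x1 = ... = xk}: replace every copy by the average.
    xbar = sum(xs) / len(xs)
    return [xbar.copy() for _ in xs]
```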
Auxiliary Variables
        min_x  G1(x) + G2(A x)                 Linear map A : H → E.
   ⇔    min_{z ∈ H × E}  G(z) + ι_C(z)         G1, G2 simple.

        G(x, y) = G1(x) + G2(y)
        C = {(x, y) ∈ H × E \ Ax = y}
Auxiliary Variables
        min_x  G1(x) + G2(A x)                 Linear map A : H → E.
   ⇔    min_{z ∈ H × E}  G(z) + ι_C(z)         G1, G2 simple.

        G(x, y) = G1(x) + G2(y)
        C = {(x, y) ∈ H × E \ Ax = y}

Prox_{γG}(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y))

Prox_{ι_C}(x, y) = (x + A*ỹ, y − ỹ) = (x̃, Ax̃)

                  ỹ = (Id + AA*)^{-1}(y − Ax)
       where
                  x̃ = (Id + A*A)^{-1}(A*y + x)

       efficient if Id + AA* or Id + A*A easy to invert.
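A sketch of the projection onto the graph constraint C (illustrative; it uses a dense solve, whereas in practice one exploits the structure of A):

```python
import numpy as np

def prox_graph_constraint(x, y, A):
    # Projection of (x, y) onto C = {(x, y) : A x = y}:
    # x_t = (Id + A^T A)^{-1} (A^T y + x), and the returned pair is (x_t, A x_t).
    n = A.shape[1]
    x_t = np.linalg.solve(np.eye(n) + A.T @ A, A.T @ y + x)
    return x_t, A @ x_t
```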
Example: TV Regularization
      min_f  (1/2)||Kf − y||² + λ||∇f||_1                ||u||_1 = Σ_i ||u_i||

  ⇔   min_{(f,u)}  G1(u) + G2(f) + ι_C(f, u)             (auxiliary variable u = ∇f)

G1(u) = λ||u||_1        Prox_{γG1}(u)_i = max(0, 1 − γλ/||u_i||) u_i

G2(f) = (1/2)||Kf − y||²        Prox_{γG2}(f) = (Id + γK*K)^{-1}(f + γK*y)

C = { (f, u) ∈ R^N × R^{N×2} \ u = ∇f }
        Prox_{ι_C}(f, u) = (f̃, ∇f̃)
Example: TV Regularization
      min_f  (1/2)||Kf − y||² + λ||∇f||_1                ||u||_1 = Σ_i ||u_i||

  ⇔   min_{(f,u)}  G1(u) + G2(f) + ι_C(f, u)             (auxiliary variable u = ∇f)

G1(u) = λ||u||_1        Prox_{γG1}(u)_i = max(0, 1 − γλ/||u_i||) u_i

G2(f) = (1/2)||Kf − y||²        Prox_{γG2}(f) = (Id + γK*K)^{-1}(f + γK*y)

C = { (f, u) ∈ R^N × R^{N×2} \ u = ∇f }
        Prox_{ι_C}(f, u) = (f̃, ∇f̃)
Compute the solution of:   (Id − ∆) f̃ = −div(u) + f
             O(N log(N)) operations using FFT.
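A sketch of that FFT-based solve, assuming periodic boundary conditions and the standard 5-point discrete Laplacian (so Id − ∆ is diagonal in the Fourier basis):

```python
import numpy as np

def solve_id_minus_laplacian(rhs):
    # Solve (Id - Delta) f = rhs for a 2D array rhs, in O(N log N) via the FFT.
    n1, n2 = rhs.shape
    w1 = 2 * np.pi * np.fft.fftfreq(n1)
    w2 = 2 * np.pi * np.fft.fftfreq(n2)
    # Eigenvalues of -Delta (5-point stencil, periodic boundaries).
    lam = (2 - 2 * np.cos(w1))[:, None] + (2 - 2 * np.cos(w2))[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / (1 + lam)))
```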
Example: TV Regularization

   [Figures: original f0, observations y = f0 + w and y = Kx0, TV recovery f*;
    convergence of the scheme with the iteration count.]
Conclusion
 Sparsity: approximate signals with few atoms from a dictionary.
Conclusion
 Sparsity: approximate signals with few atoms from a dictionary.

Compressed sensing ideas:
      Randomized sensors + sparse recovery.
      Number of measurements ∼ signal complexity.
      CS is about designing new hardware.
Conclusion
 Sparsity: approximate signals with few atoms from a dictionary.

Compressed sensing ideas:
      Randomized sensors + sparse recovery.
      Number of measurements ∼ signal complexity.
      CS is about designing new hardware.
The devil is in the constants:
      Worst-case analysis is problematic.
      Designing good signal models.
Some Hot Topics
   Dictionary learning: learn the dictionary from exemplar images.
   [Figures from Mairal et al., sparse representation for color image restoration:
    K-SVD dictionaries of color patches and the associated denoising results.]
Some Hot Topics
   Dictionary learning: learn the dictionary from exemplar images.
   Analysis vs. synthesis:
         Js(f) = min_{f = Φx} ||x||_1
         Image f = Φx,   coefficients x.
Sparsity and Compressed Sensing

  • 1. Sparsity and Compressed Sensing Gabriel Peyré www.numerical-tours.com
  • 2. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoritical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 3. Inverse Problems Forward model: y = K f0 + w RP Observations Operator (Unknown) Noise : RQ RP Input
  • 4. Inverse Problems Forward model: y = K f0 + w RP Observations Operator (Unknown) Noise : RQ RP Input Denoising: K = IdQ , P = Q.
  • 5. Inverse Problems Forward model: y = K f0 + w RP Observations Operator (Unknown) Noise : RQ RP Input Denoising: K = IdQ , P = Q. Inpainting: set of missing pixels, P = Q | |. 0 if x , (Kf )(x) = f (x) if x / . K
  • 6. Inverse Problems Forward model: y = K f0 + w RP Observations Operator (Unknown) Noise : RQ RP Input Denoising: K = IdQ , P = Q. Inpainting: set of missing pixels, P = Q | |. 0 if x , (Kf )(x) = f (x) if x / . Super-resolution: Kf = (f k) , P = Q/ . K K
  • 7. Inverse Problem in Medical Imaging Kf = (p k )1 k K
  • 8. Inverse Problem in Medical Imaging Kf = (p k )1 k K Magnetic resonance imaging (MRI): ˆ Kf = (f ( )) ˆ f
  • 9. Inverse Problem in Medical Imaging Kf = (p k )1 k K Magnetic resonance imaging (MRI): ˆ Kf = (f ( )) ˆ f Other examples: MEG, EEG, . . .
  • 10. Inverse Problem Regularization Noisy measurements: y = Kf0 + w. Prior model: J : RQ R assigns a score to images. 1 f argmin ||y Kf ||2 + J(f ) f RQ 2
  • 11. Inverse Problem Regularization Noisy measurements: y = Kf0 + w. Prior model: J : RQ R assigns a score to images. 1 f argmin ||y Kf ||2 + J(f ) f RQ 2 Data fidelity Regularity
  • 12. Inverse Problem Regularization Noisy measurements: y = Kf0 + w. Prior model: J : RQ R assigns a score to images. 1 f argmin ||y Kf ||2 + J(f ) f RQ 2 Data fidelity Regularity Choice of : tradeo Noise level Regularity of f0 ||w|| J(f0 )
  • 13. Inverse Problem Regularization Noisy measurements: y = Kf0 + w. Prior model: J : RQ R assigns a score to images. 1 f argmin ||y Kf ||2 + J(f ) f RQ 2 Data fidelity Regularity Choice of : tradeo Noise level Regularity of f0 ||w|| J(f0 ) No noise: 0+ , minimize f argmin J(f ) f RQ ,Kf =y
  • 14. Smooth and Cartoon Priors J(f ) = || f (x)||2 dx | f |2
  • 15. Smooth and Cartoon Priors J(f ) = || f (x)||2 dx J(f ) = || f (x)||dx J(f ) = length(Ct )dt R | f |2 | f|
  • 16. Inpainting Example Input y = Kf0 + w Sobolev Total variation
  • 17. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 18. Redundant Dictionaries Dictionary =( m )m RQ N ,N Q. Q N
  • 19. Redundant Dictionaries Dictionary =( m )m RQ N ,N Q. Fourier: m = ei ·, m frequency Q N
  • 20. Redundant Dictionaries Dictionary =( m )m RQ N ,N Q. m = (j, , n) Fourier: m =e i ·, m frequency scale position Wavelets: m = (2 j R x n) orientation =1 =2 Q N
  • 21. Redundant Dictionaries Dictionary =( m )m RQ N ,N Q. m = (j, , n) Fourier: m =e i ·, m frequency scale position Wavelets: m = (2 j R x n) orientation DCT, Curvelets, bandlets, . . . =1 =2 Q N
  • 22. Redundant Dictionaries Dictionary Ψ = (ψm)m ∈ R^(Q×N), N ≥ Q. Fourier: ψm = e^(iωm ·), m = frequency. Wavelets: ψm = ψ(2^(−j) Rθ x − n), m = (j, θ, n) = (scale, orientation, position). DCT, curvelets, bandlets, . . . Synthesis: f = Σm xm ψm = Ψx. Coefficients x, image f = Ψx.
  • 23. Sparse Priors Coefficients x, image f0. Ideal sparsity: for most m, xm = 0. J0(x) = # {m : xm ≠ 0}
  • 24. Sparse Priors Coefficients x, image f0. Ideal sparsity: for most m, xm = 0. J0(x) = # {m : xm ≠ 0}. Sparse approximation: f = Ψx where x ∈ argmin_(x ∈ R^N) ||f0 − Ψx||² + T² J0(x)
  • 25. Sparse Priors Coefficients x, image f0. Ideal sparsity: for most m, xm = 0. J0(x) = # {m : xm ≠ 0}. Sparse approximation: f = Ψx where x ∈ argmin_(x ∈ R^N) ||f0 − Ψx||² + T² J0(x). Orthogonal Ψ (Ψ*Ψ = Id_N): xm = ⟨f0, ψm⟩ if |⟨f0, ψm⟩| > T, xm = 0 otherwise, i.e. f = Ψ S_T(Ψ* f0) with S_T the hard thresholding
  • 26. Sparse Priors Coefficients x, image f0. Ideal sparsity: for most m, xm = 0. J0(x) = # {m : xm ≠ 0}. Sparse approximation: f = Ψx where x ∈ argmin_(x ∈ R^N) ||f0 − Ψx||² + T² J0(x). Orthogonal Ψ: hard thresholding. Non-orthogonal Ψ: NP-hard.
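For an orthogonal Ψ, the ℓ0 approximation above is solved exactly by hard thresholding the analysis coefficients at T. A minimal Python/numpy sketch of this step (the random orthobasis and the threshold value are illustrative assumptions, not taken from the slides):

import numpy as np

def hard_threshold_approx(f0, Psi, T):
    # Sparse approximation in an orthogonal basis Psi (columns = atoms):
    # keep only the coefficients <f0, psi_m> with magnitude above T.
    x = Psi.T @ f0                      # analysis coefficients
    x = x * (np.abs(x) > T)             # hard thresholding S_T
    return Psi @ x, x                   # approximation f = Psi x and its coefficients

# toy usage with a random orthobasis (illustrative assumption)
rng = np.random.default_rng(0)
Q = 64
Psi, _ = np.linalg.qr(rng.standard_normal((Q, Q)))
f0 = Psi[:, :5] @ rng.standard_normal(5)        # exactly 5-sparse signal
f, x = hard_threshold_approx(f0, Psi, T=1e-6)
print(np.count_nonzero(x), np.linalg.norm(f - f0))   # 5, ~0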
  • 27. Convex Relaxation: L1 Prior J0 (x) = # {m xm = 0} J0 (x) = 0 null image. Image with 2 pixels: J0 (x) = 1 sparse image. J0 (x) = 2 non-sparse image. x2 x1 q=0
  • 28. Convex Relaxation: L1 Prior J0 (x) = # {m xm = 0} J0 (x) = 0 null image. Image with 2 pixels: J0 (x) = 1 sparse image. J0 (x) = 2 non-sparse image. x2 x1 q=0 q = 1/2 q=1 q = 3/2 q=2 q priors: Jq (x) = |xm |q (convex for q 1) m
  • 29. Convex Relaxation: L1 Prior J0 (x) = # {m xm = 0} J0 (x) = 0 null image. Image with 2 pixels: J0 (x) = 1 sparse image. J0 (x) = 2 non-sparse image. x2 x1 q=0 q = 1/2 q=1 q = 3/2 q=2 q priors: Jq (x) = |xm |q (convex for q 1) m Sparse 1 prior: J1 (x) = |xm | m
  • 30. L1 Regularization x0 ∈ R^N coefficients
  • 31. L1 Regularization x0 ∈ R^N coefficients, f0 = Ψ x0 ∈ R^Q image
  • 32. L1 Regularization x0 ∈ R^N coefficients, f0 = Ψ x0 ∈ R^Q image, y = K f0 + w ∈ R^P observations
  • 33. L1 Regularization x0 ∈ R^N coefficients, f0 = Ψ x0 ∈ R^Q image, y = K f0 + w ∈ R^P observations, Φ = K Ψ ∈ R^(P×N)
  • 34. L1 Regularization x0 ∈ R^N coefficients, f0 = Ψ x0 ∈ R^Q image, y = K f0 + w ∈ R^P observations, Φ = K Ψ ∈ R^(P×N). Sparse recovery: f⋆ = Ψ x⋆ where x⋆ solves min_(x ∈ R^N) ½ ||y − Φx||² + λ||x||1 (data fidelity + regularization)
  • 35. Noiseless Sparse Regularization Noiseless measurements: y = x0 x x= y x argmin |xm | x=y m
  • 36. Noiseless Sparse Regularization Noiseless measurements: y = x0 x x x= x= y y x argmin |xm | x argmin |xm |2 x=y m x=y m
  • 37. Noiseless Sparse Regularization Noiseless measurements: y = x0 x x x= x= y y x argmin |xm | x argmin |xm |2 x=y m x=y m Convex linear program. Interior points, cf. [Chen, Donoho, Saunders] “basis pursuit”. Douglas-Rachford splitting, see [Combettes, Pesquet].
  • 38. Noisy Sparse Regularization Noisy measurements: y = x0 + w 1 x argmin ||y x||2 + ||x||1 x RQ 2 Data fidelity Regularization
  • 39. Noisy Sparse Regularization Noisy measurements: y = x0 + w 1 x argmin ||y x||2 + ||x||1 x RQ 2 Equivalence Data fidelity Regularization x argmin ||x||1 || x y|| | x= x y|
  • 40. Noisy Sparse Regularization Noisy measurements: y = x0 + w 1 x argmin ||y x||2 + ||x||1 x RQ 2 Equivalence Data fidelity Regularization x argmin ||x||1 || x y|| | x= Algorithms: x y| Iterative soft thresholding Forward-backward splitting see [Daubechies et al], [Pesquet et al], etc Nesterov multi-steps schemes.
  • 42. Image De-blurring Original f0, observations y = h ∗ f0 + w. Sobolev regularization (SNR = 22.7dB): f⋆ = argmin_(f ∈ R^N) ||f ∗ h − y||² + λ||∇f||², solved in Fourier: f̂⋆(ω) = ĥ(ω)* ŷ(ω) / (|ĥ(ω)|² + λ|ω|²)
  • 43. Image De-blurring Original f0, observations y = h ∗ f0 + w. Sobolev regularization (SNR = 22.7dB): f⋆ = argmin_(f ∈ R^N) ||f ∗ h − y||² + λ||∇f||², solved in Fourier: f̂⋆(ω) = ĥ(ω)* ŷ(ω) / (|ĥ(ω)|² + λ|ω|²). Sparsity regularization (SNR = 24.7dB): Ψ = translation invariant wavelets, f⋆ = Ψ x⋆ where x⋆ ∈ argmin_x ½ ||h ∗ (Ψx) − y||² + λ||x||1
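The Sobolev deconvolution has the closed form written above and reduces to a few FFT calls. A hedged 1-D sketch, assuming a periodic signal, a known blur kernel and an illustrative choice of λ:

import numpy as np

def sobolev_deconv(y, h, lam):
    # Solve min_f ||h * f - y||^2 + lam ||grad f||^2 for a periodic 1-D signal,
    # using the closed form in the Fourier domain.
    n = y.size
    h_hat = np.fft.fft(h)
    y_hat = np.fft.fft(y)
    omega = 2 * np.pi * np.fft.fftfreq(n)
    f_hat = np.conj(h_hat) * y_hat / (np.abs(h_hat) ** 2 + lam * omega ** 2)
    return np.real(np.fft.ifft(f_hat))

# toy usage: periodic Gaussian blur + noise (all parameters illustrative)
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
f0 = (t > 64).astype(float) - (t > 192)          # piecewise-constant signal
h = np.exp(-0.5 * (np.minimum(t, n - t) / 3.0) ** 2)
h /= h.sum()                                     # blur kernel centered at index 0
y = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(f0)))
y += 0.02 * rng.standard_normal(n)
f = sobolev_deconv(y, h, lam=0.05)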
  • 44. Inpainting Problem K 0 if x , (Kf )(x) = f (x) if x / . Measures: y = Kf0 + w
  • 45. Image Separation Model: f = f1 + f2 + w, (f1 , f2 ) components, w noise.
  • 46. Image Separation Model: f = f1 + f2 + w, (f1 , f2 ) components, w noise.
  • 47. Image Separation Model: f = f1 + f2 + w, (f1 , f2 ) components, w noise. Union dictionary: =[ 1, 2] RQ (N1 +N2 ) Recovered component: fi = i xi . 1 (x1 , x2 ) argmin ||f x||2 + ||x||1 x=(x1 ,x2 ) RN 2
  • 50. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 51. Basics of Convex Analysis Setting: G:H R ⇤ {+⇥} Here: H = RN . Problem: min G(x) x H
  • 52. Basics of Convex Analysis Setting: G:H R ⇤ {+⇥} Here: H = RN . Problem: min G(x) x H Convex: t [0, 1] x y G(tx + (1 t)y) tG(x) + (1 t)G(y)
  • 53. Basics of Convex Analysis Setting: G : H → R ∪ {+∞}. Here: H = R^N. Problem: min_(x ∈ H) G(x). Convex: ∀ t ∈ [0, 1], G(tx + (1 − t)y) ≤ t G(x) + (1 − t) G(y). Subdifferential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}. Example: G(x) = |x|, ∂G(0) = [−1, 1]
  • 54. Basics of Convex Analysis Setting: G : H → R ∪ {+∞}. Here: H = R^N. Problem: min_(x ∈ H) G(x). Convex: ∀ t ∈ [0, 1], G(tx + (1 − t)y) ≤ t G(x) + (1 − t) G(y). Subdifferential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}. Example: G(x) = |x|, ∂G(0) = [−1, 1]. Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}
  • 55. Basics of Convex Analysis Setting: G : H → R ∪ {+∞}. Here: H = R^N. Problem: min_(x ∈ H) G(x). Convex: ∀ t ∈ [0, 1], G(tx + (1 − t)y) ≤ t G(x) + (1 − t) G(y). Subdifferential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}. Example: G(x) = |x|, ∂G(0) = [−1, 1]. Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}. First-order condition: x⋆ ∈ argmin_(x ∈ H) G(x) ⟺ 0 ∈ ∂G(x⋆)
  • 56. L1 Regularization: First Order Conditions 1 x ⇥ argmin G(x) = ||y x||2 + ||x||1 x RQ 2 ⇥G(x) = ( x y) + ⇥|| · ||1 (x) sign(xi ) if xi ⇥= 0, || · ||1 (x)i = [ 1, 1] if xi = 0.
  • 57. L1 Regularization: First Order Conditions 1 x ⇥ argmin G(x) = ||y x||2 + ||x||1 x RQ 2 ⇥G(x) = ( x y) + ⇥|| · ||1 (x) sign(xi ) if xi ⇥= 0, || · ||1 (x)i = [ 1, 1] if xi = 0. xi Support of the solution: i I = {i ⇥ {0, . . . , N 1} xi ⇤= 0}
  • 58. L1 Regularization: First Order Conditions 1 x ⇥ argmin G(x) = ||y x||2 + ||x||1 x RQ 2 ⇥G(x) = ( x y) + ⇥|| · ||1 (x) sign(xi ) if xi ⇥= 0, || · ||1 (x)i = [ 1, 1] if xi = 0. xi Support of the solution: i I = {i ⇥ {0, . . . , N 1} xi ⇤= 0} Restrictions: xI = (xi )i I R|I| I = ( i )i I RP |I|
  • 59. L1 Regularization: First Order Conditions 1 xi x argmin || x y||2 + ||x||1 P (y) x RN 2 i First order condition: ( x y) + s = 0 sI = sign(xI ), where ||sI c || 1
  • 60. L1 Regularization: First Order Conditions 1 xi x argmin || x y||2 + ||x||1 P (y) x RN 2 i First order condition: ( x y) + s = 0 i, y x sI = sign(xI ), i where ||sI c || 1 1 = sI c = I c (y x )
  • 61. L1 Regularization: First Order Conditions 1 xi x argmin || x y||2 + ||x||1 P (y) x RN 2 i First order condition: ( x y) + s = 0 i, y x sI = sign(xI ), i where ||sI c || 1 1 = sI c = I c (y x ) Theorem: || Ic ( x y)|| x solution of P (y)
  • 62. L1 Regularization: First Order Conditions 1 xi x argmin || x y||2 + ||x||1 P (y) x RN 2 i First order condition: ( x y) + s = 0 i, y x sI = sign(xI ), i where ||sI c || 1 1 = sI c = I c (y x ) Theorem: || Ic ( x y)|| x solution of P (y) Theorem: If I has full rank and || I c ( x y)|| < then x is the unique solution of P (y)
  • 63. Local Behavior of the Solution 1 x argmin || x y||2 + ||x||1 x RN 2 First order condition: ( x y) + s = 0 = xI = + I y ( I I) 1 sign(xI ) (implicit equation) = x0,I + + I w ( I I) 1 sI
  • 64. Local Behavior of the Solution 1 x argmin || x y||2 + ||x||1 x RN 2 First order condition: ( x y) + s = 0 = xI = + I y ( I I) 1 sign(xI ) (implicit equation) = x0,I + + I w ( I I) 1 sI Intuition: sI = sign(xI ) = sign(x0,I ) = s0,I for small w. (unknown) (known)
  • 65. Local Behavior of the Solution 1 x argmin || x y||2 + ||x||1 x RN 2 First order condition: ( x y) + s = 0 = xI = + I y ( I I) 1 sign(xI ) (implicit equation) = x0,I + + I w ( I I) 1 sI Intuition: sI = sign(xI ) = sign(x0,I ) = s0,I for small w. (unknown) (known) To prove: xI = x0,I + ˆ + I w ( I I) 1 s0,I is the unique solution.
  • 66. Local Behavior of the Solution Candidate for the solution: xI = x0,I + ˆ + I w ( I I) 1 s0,I
  • 67. Local Behavior of the Solution Candidate for the solution: xI = x0,I + ˆ + I w ( I I) 1 s0,I To prove: || Ic ( ˆ I xI y)|| <1
  • 68. Local Behavior of the Solution Candidate for the solution: xI = x0,I + ˆ + I w ( I I) 1 s0,I To prove: || Ic ( ˆ I xI y)|| <1 1 w Ic ( ˆ I xI y) = I I (s0,I ) +, I = Ic ( I + I Id) I = Ic I
  • 69. Local Behavior of the Solution Candidate for the solution: xI = x0,I + ˆ + I w ( I I) 1 s0,I To prove: || Ic ( ˆ I xI y)|| <1 1 w Ic ( ˆ I xI y) = I I (s0,I ) can be made || · || must small when w 0 be < 1 +, I = Ic ( I + I Id) I = Ic I
  • 70. Robustness to Small Noise Identifiability criterion [Fuchs]: for s ∈ {−1, 0, +1}^N, let I = supp(s) and F(s) = || Φ_(I^c)* Φ_I^(+,*) s_I ||_∞
  • 71. Robustness to Small Noise Identifiability criterion [Fuchs]: for s ∈ {−1, 0, +1}^N, let I = supp(s) and F(s) = || Φ_(I^c)* Φ_I^(+,*) s_I ||_∞. Theorem [Fuchs 2004]: if F(sign(x0)) < 1, T = min_(i ∈ I) |x0,i|, ||w||/T is small enough and λ ∼ ||w||, then x0,I + Φ_I^+ w − λ (Φ_I* Φ_I)^(−1) sign(x0,I) is the unique solution of Pλ(y).
  • 72. Robustness to Small Noise Identifiability criterion [Fuchs]: for s ∈ {−1, 0, +1}^N, let I = supp(s) and F(s) = || Φ_(I^c)* Φ_I^(+,*) s_I ||_∞. Theorem [Fuchs 2004]: if F(sign(x0)) < 1, T = min_(i ∈ I) |x0,i|, ||w||/T is small enough and λ ∼ ||w||, then x0,I + Φ_I^+ w − λ (Φ_I* Φ_I)^(−1) sign(x0,I) is the unique solution of Pλ(y). When w = 0, F(sign(x0)) < 1 ⟹ x⋆ = x0.
  • 73. Robustness to Small Noise Identifiability criterion [Fuchs]: for s ∈ {−1, 0, +1}^N, let I = supp(s) and F(s) = || Φ_(I^c)* Φ_I^(+,*) s_I ||_∞. Theorem [Fuchs 2004]: if F(sign(x0)) < 1, T = min_(i ∈ I) |x0,i|, ||w||/T is small enough and λ ∼ ||w||, then x0,I + Φ_I^+ w − λ (Φ_I* Φ_I)^(−1) sign(x0,I) is the unique solution of Pλ(y). When w = 0, F(sign(x0)) < 1 ⟹ x⋆ = x0. Theorem [Grasmair et al. 2010]: if F(sign(x0)) < 1 and λ ∼ ||w||, then ||x⋆ − x0|| = O(||w||).
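The Fuchs certificate F(sign(x0)) is easy to evaluate numerically: form the pre-certificate Φ_I (Φ_I*Φ_I)^(−1) sign(x0,I) and measure its maximal correlation with the off-support atoms. A small numpy sketch (toy Gaussian Φ; assumes Φ_I has full column rank):

import numpy as np

def fuchs_criterion(Phi, x0):
    # F(sign(x0)) = || Phi_{I^c}^T Phi_I (Phi_I^T Phi_I)^{-1} sign(x0_I) ||_inf
    I = np.flatnonzero(x0)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    PhiI = Phi[:, I]
    dI = PhiI @ np.linalg.solve(PhiI.T @ PhiI, np.sign(x0[I]))   # pre-certificate
    return np.max(np.abs(Phi[:, Ic].T @ dI))

# F < 1 guarantees that x0 is recovered by l1 minimization for small noise
rng = np.random.default_rng(0)
P, N, k = 100, 400, 8
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
print(fuchs_criterion(Phi, x0))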
  • 74. Geometric Interpretation +, dI = sI F(s) = || I sI || = max | dI , j | I i j /I where dI defined by: dI = I( I I) 1 sI i I, dI , i = si j
  • 75. Geometric Interpretation +, dI = sI F(s) = || I sI || = max | dI , j | I i j /I where dI defined by: dI = I( I I) 1 sI i I, dI , i = si j Condition F (s) < 1: no vector j inside the cap Cs . dI j Cs i | dI , ⇥| < 1
  • 76. Geometric Interpretation +, dI = sI F(s) = || I sI || = max | dI , j | I i j /I where dI defined by: dI = I( I I) 1 sI i I, dI , i = si j Condition F (s) < 1: no vector j inside the cap Cs . dI j dI i k | dI , ⇥| < 1 j Cs i | dI , ⇥| < 1
  • 77. Robustness to Bounded Noise Exact Recovery Criterion (ERC): [Tropp] For a support I ⇥ {0, . . . , N 1} with I full rank, ERC(I) = || I || , where I = Ic +, I = || + I Ic ||1,1 = max || c + I j ||1 j I (use ||(aj )j ||1,1 = maxj ||aj ||1 ) Relation with F criterion: ERC(I) = max F(s) s,supp(s) I
  • 78. Robustness to Bounded Noise Exact Recovery Criterion (ERC): [Tropp] For a support I ⇥ {0, . . . , N 1} with I full rank, ERC(I) = || I || , where I = Ic +, I = || + I Ic ||1,1 = max || c + I j ||1 j I (use ||(aj )j ||1,1 = maxj ||aj ||1 ) Relation with F criterion: ERC(I) = max F(s) s,supp(s) I Theorem: If ERC(supp(x0 )) < 1 and ||w||, then x is unique, satisfies supp(x ) supp(x0 ), and ||x0 x || = O(||w||)
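The ERC only depends on the support and, using the ℓ1–ℓ∞ identity stated above, equals the largest ℓ1 norm of Φ_I^+ φ_j over j ∉ I. A sketch of this computation (the Gaussian matrix and sizes are illustrative):

import numpy as np

def erc(Phi, I):
    # ERC(I) = max_{j not in I} || Phi_I^+ phi_j ||_1   (Tropp)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    PhiI_pinv = np.linalg.pinv(Phi[:, I])
    return np.max(np.sum(np.abs(PhiI_pinv @ Phi[:, Ic]), axis=0))

rng = np.random.default_rng(1)
P, N = 100, 400
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
I = rng.choice(N, 8, replace=False)
print(erc(Phi, I))   # ERC(I) < 1 implies stable support recovery for small noise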
  • 79. Example: Random Matrix P = 200, N = 1000. [Plot: empirical probability that the criteria hold as a function of the sparsity ||x0||0 (from 0 to 50), comparing w-ERC < 1, F < 1, ERC < 1 and exact recovery x⋆ = x0.]
  • 80. Example: Deconvolution ⇥x = xi (· i) x0 i Increasing : reduces correlation. x0 reduces resolution. F (s) ERC(I) w-ERC(I)
  • 81. Coherence Bounds Mutual coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|. Theorem: F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ))
  • 82. Coherence Bounds Mutual coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|. Theorem: F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ)). Theorem: if ||x0||0 < (1 + 1/μ(Φ)) / 2 and λ ∼ ||w||, one has supp(x⋆) ⊂ I and ||x0 − x⋆|| = O(||w||)
  • 83. Coherence Bounds Mutual coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|. Theorem: F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I| μ(Φ) / (1 − (|I| − 1) μ(Φ)). Theorem: if ||x0||0 < (1 + 1/μ(Φ)) / 2 and λ ∼ ||w||, one has supp(x⋆) ⊂ I and ||x0 − x⋆|| = O(||w||). One has μ(Φ) ≥ sqrt((N − P) / (P (N − 1))). Optimistic setting: for Gaussian matrices, μ(Φ) ∼ sqrt(log(P N) / P), so ||x0||0 = O(√P); for convolution matrices: useless criterion.
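The mutual coherence itself is a one-line computation, and from it one gets the (pessimistic) sparsity level (1 + 1/μ)/2 up to which the previous guarantees apply. A numpy sketch assuming unit-norm atoms and an illustrative Gaussian dictionary:

import numpy as np

def mutual_coherence(Phi):
    # mu(Phi) = max_{i != j} |<phi_i, phi_j>| for unit-norm columns
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(2)
P, N = 200, 1000
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)              # normalize the atoms
mu = mutual_coherence(Phi)
k_max = int(np.floor(0.5 * (1.0 + 1.0 / mu)))   # coherence bound on ||x0||_0
print(mu, k_max)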
  • 84. Spikes and Sinusoids Separation Incoherent pair of orthobases: Diracs/Fourier 2i 1 = {k ⇤⇥ [k m]}m 2 = k N 1/2 e N mk m =[ 1, 2] RN 2N
  • 85. Spikes and Sinusoids Separation Incoherent pair of orthobases: Diracs/Fourier 2i 1 = {k ⇤⇥ [k m]}m 2 = k N 1/2 e N mk m =[ 1, 2] RN 2N 1 min ||y x||2 + ||x||1 x R2N 2 1 min ||y 1 x1 2 x2 ||2 + ||x1 ||1 + ||x2 ||1 x1 ,x2 RN 2 = +
  • 86. Spikes and Sinusoids Separation Incoherent pair of orthobases: Diracs/Fourier 2i 1 = {k ⇤⇥ [k m]}m 2 = k N 1/2 e N mk m =[ 1, 2] RN 2N 1 min ||y x||2 + ||x||1 x R2N 2 1 min ||y 1 x1 2 x2 ||2 + ||x1 ||1 + ||x2 ||1 x1 ,x2 RN 2 = + 1 µ( ) = = separates up to N /2 Diracs + sines. N
  • 87. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 88. Pointwise Sampling and Smoothness Data acquisition: f[i] = f̃(i/N) = ⟨f̃, δi⟩, sensors (δi)i (Diracs), f̃ ∈ L², f ∈ R^N. Shannon interpolation: exact if the Fourier transform of f̃ is supported in [−Nπ, Nπ]
  • 89. Pointwise Sampling and Smoothness Data acquisition: f[i] = f̃(i/N) = ⟨f̃, δi⟩, sensors (δi)i (Diracs), f̃ ∈ L², f ∈ R^N. Shannon interpolation: exact if the Fourier transform of f̃ is supported in [−Nπ, Nπ], f̃(t) = Σ_i f[i] h(Nt − i) where h(t) = sin(πt) / (πt)
  • 90. Pointwise Sampling and Smoothness Data acquisition: f[i] = f̃(i/N) = ⟨f̃, δi⟩, sensors (δi)i (Diracs), f̃ ∈ L², f ∈ R^N. Shannon interpolation: exact if the Fourier transform of f̃ is supported in [−Nπ, Nπ], f̃(t) = Σ_i f[i] h(Nt − i) where h(t) = sin(πt) / (πt). Natural images are not smooth, but they can be compressed efficiently.
  • 91. Single Pixel Camera (Rice) y[i] = f0 , i⇥
  • 92. Single Pixel Camera (Rice) y[i] = f0 , i⇥ f0 , N = 2562 f , P/N = 0.16 f , P/N = 0.02
  • 93. CS Hardware Model ˜ CS is about designing hardware: input signals f L2 (R2 ). Physical hardware resolution limit: target resolution f RN . array micro ˜ f L 2 f R N mirrors y RP resolution K CS hardware
  • 94. CS Hardware Model ˜ CS is about designing hardware: input signals f L2 (R2 ). Physical hardware resolution limit: target resolution f RN . array micro ˜ f L 2 f R N mirrors y RP resolution K CS hardware , , ... ,
  • 95. CS Hardware Model ˜ CS is about designing hardware: input signals f L2 (R2 ). Physical hardware resolution limit: target resolution f RN . array micro ˜ f L 2 f R N mirrors y RP resolution K CS hardware , Operator K , f ... ,
  • 96. Sparse CS Recovery f0 RN f0 RN sparse in ortho-basis x0 RN
  • 97. Sparse CS Recovery f0 RN f0 RN sparse in ortho-basis (Discretized) sampling acquisition: y = Kf0 + w = K (x0 ) + w = x0 RN
  • 98. Sparse CS Recovery f0 RN f0 RN sparse in ortho-basis (Discretized) sampling acquisition: y = Kf0 + w = K (x0 ) + w = K drawn from the Gaussian matrix ensemble Ki,j N (0, P 1/2 ) i.i.d. drawn from the Gaussian matrix ensemble x0 RN
  • 99. Sparse CS Recovery f0 RN f0 RN sparse in ortho-basis (Discretized) sampling acquisition: y = Kf0 + w = K (x0 ) + w = K drawn from the Gaussian matrix ensemble Ki,j N (0, P 1/2 ) i.i.d. drawn from the Gaussian matrix ensemble Sparse recovery: x0 RN ||w|| 1 min ||x||1 min || x y||2 + ||x||1 || x y|| ||w|| x 2
  • 100. CS Simulation Example Original f0 = translation invariant wavelet frame
  • 101. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 102. CS with RIP ℓ1 recovery: y = Φ x0 + w, x⋆ ∈ argmin_(||Φx − y|| ≤ ε) ||x||1 (with ε ≥ ||w||). Restricted Isometry Constants: ∀ ||x||0 ≤ k, (1 − δk) ||x||² ≤ ||Φx||² ≤ (1 + δk) ||x||²
  • 103. CS with RIP ℓ1 recovery: y = Φ x0 + w, x⋆ ∈ argmin_(||Φx − y|| ≤ ε) ||x||1 (with ε ≥ ||w||). Restricted Isometry Constants: ∀ ||x||0 ≤ k, (1 − δk) ||x||² ≤ ||Φx||² ≤ (1 + δk) ||x||². Theorem [Candès 2009]: if δ2k ≤ √2 − 1, then ||x0 − x⋆|| ≤ (C0 / √k) ||x0 − xk||1 + C1 ε, where xk is the best k-term approximation of x0.
  • 104. Singular Values Distributions Eigenvalues of Φ_I* Φ_I with |I| = k are essentially in [a, b], where a = (1 − √β)², b = (1 + √β)² and β = k/P. When k = βP with P → +∞, the eigenvalue distribution tends to f(λ) = sqrt((λ − a)+ (b − λ)+) / (2πβλ) [Marchenko-Pastur]. [Plots: empirical spectra vs. the limit density for P = 200 and k = 10, 30, 50.] Large deviation inequality [Ledoux].
  • 105. RIP for Gaussian Matrices Link with coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|, δ2 = μ(Φ) and δk ≤ (k − 1) μ(Φ)
  • 106. RIP for Gaussian Matrices Link with coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|, δ2 = μ(Φ) and δk ≤ (k − 1) μ(Φ). For Gaussian matrices: μ(Φ) ∼ sqrt(log(P N) / P)
  • 107. RIP for Gaussian Matrices Link with coherence: μ(Φ) = max_(i ≠ j) |⟨φi, φj⟩|, δ2 = μ(Φ) and δk ≤ (k − 1) μ(Φ). For Gaussian matrices: μ(Φ) ∼ sqrt(log(P N) / P). Stronger result — Theorem: if k ≤ C P / log(N/P), then δ2k ≤ √2 − 1 with high probability.
  • 108. Numerics with RIP Stability constants of A: (1 − δ1(A)) ||α||² ≤ ||Aα||² ≤ (1 + δ2(A)) ||α||², given by the smallest / largest eigenvalues of A*A.
  • 109. Numerics with RIP Stability constants of A: (1 − δ1(A)) ||α||² ≤ ||Aα||² ≤ (1 + δ2(A)) ||α||², given by the smallest / largest eigenvalues of A*A. Upper/lower RIC: δ̂ik = max_(|I| = k) δi(Φ_I), δk = max(δ̂1k, δ̂2k). Monte-Carlo estimation gives a lower bound δ̃k ≤ δk. [Plot: estimated RIC for N = 4000, P = 1000.]
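Computing δk exactly requires scanning all supports, which is intractable; sampling random supports, as in the plot, gives a Monte-Carlo lower bound. A sketch with sizes reduced from the slide's N = 4000, P = 1000 to keep it fast:

import numpy as np

def monte_carlo_ric(Phi, k, n_trials=200, rng=None):
    # Lower-bound the restricted isometry constant delta_k by drawing random supports
    # and looking at the spectrum of Phi_I^T Phi_I.
    rng = rng if rng is not None else np.random.default_rng(0)
    N = Phi.shape[1]
    delta = 0.0
    for _ in range(n_trials):
        I = rng.choice(N, k, replace=False)
        eig = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        delta = max(delta, 1.0 - eig.min(), eig.max() - 1.0)
    return delta

rng = np.random.default_rng(0)
P, N = 250, 1000
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
for k in (10, 30, 50):
    print(k, monte_carlo_ric(Phi, k, rng=rng))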
  • 110. Polytopes-based Guarantees Noiseless recovery: x argmin ||x||1 (P0 (y)) x=y = ( i )i R2 3 3 2 1 x0 x0 1 y x 3 B = {x ||x||1 } 2 (B ) = ||x0 ||1
  • 111. Polytopes-based Guarantees Noiseless recovery: x argmin ||x||1 (P0 (y)) x=y = ( i )i R2 3 3 2 1 x0 x0 1 y x 3 B = {x ||x||1 } 2 (B ) = ||x0 ||1 x0 solution of P0 ( x0 ) ⇥ x0 ⇤ (B )
  • 112. L1 Recovery in 2-D = ( i )i R2 3 C(0,1,1) 2 3 K(0,1,1) 1 y x 2-D quadrant 2-D cones Ks = ( i si )i R3 i 0 Cs = Ks
  • 113. Polytope Noiseless Recovery Counting faces of random polytopes [Donoho]: all x0 such that ||x0||0 ≤ C_all(P/N) P are identifiable; most x0 such that ||x0||0 ≤ C_most(P/N) P are identifiable. C_all(1/4) ≈ 0.065, C_most(1/4) ≈ 0.25. Sharp constants, but no noise robustness. [Plot: phase-transition curves for the RIP, All and Most bounds.]
  • 114. Polytope Noiseless Recovery Counting faces of random polytopes [Donoho]: all x0 such that ||x0||0 ≤ C_all(P/N) P are identifiable; most x0 such that ||x0||0 ≤ C_most(P/N) P are identifiable. C_all(1/4) ≈ 0.065, C_most(1/4) ≈ 0.25. Sharp constants, but no noise robustness. Computation of “pathological” signals [Dossal, P, Fadili, 2010]. [Plot: phase-transition curves for the RIP, All and Most bounds.]
  • 115. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 117. Tomography and Fourier Measures ˆ f = FFT2(f ) k Fourier slice theorem: ˆ ˆ p (⇥) = f (⇥ cos( ), ⇥ sin( )) 1D 2D Fourier R Partial Fourier measurements: {p k (t)}t 0 k<K Equivalent to: ˆ f = {f [ ]}
  • 118. Regularized Inversion Noisy measurements: ⇥ ˆ , y[ ] = f0 [ ] + w[ ]. Noise: w[⇥] N (0, ), white noise. 1 regularization: 1 ˆ f = argmin ⇥ |y[⇤] f [⇤]|2 + |⇥f, ⇥m ⇤|. f 2 m + f f Disclaimer: this is not compressed sensing.
  • 119. MRI Imaging From [Lustig et al.]
  • 120. MRI Reconstruction From [Lustig et al.] Randomization of the Fourier sub-sampling pattern: High resolution / Low resolution / Linear / Sparsity
  • 121. Compressive Fourier Measurements Sampling low frequencies helps. Pseudo inverse Sparse wavelets
  • 122. Structured Measurements Gaussian matrices: intractable for large N . Random partial orthogonal matrix: { } orthogonal basis. =( ) where | | = P drawn uniformly at random. Fast measurements: (e.g. Fourier basis) , y[ ] = f, ⇥ ˆ = f[ ]
  • 123. Structured Measurements Gaussian matrices: intractable for large N . Random partial orthogonal matrix: { } orthogonal basis. =( ) where | | = P drawn uniformly at random. Fast measurements: (e.g. Fourier basis) , ˆ y[ ] = f, ⇥ = f [ ] ⌅ ⌅ Mutual incoherence: µ = N max |⇥⇥ , m ⇤| [1, N ] ,m
  • 124. Structured Measurements Gaussian matrices: intractable for large N. Random partial orthogonal matrix: {ϕω}ω orthogonal basis, Φ = (ϕω)ω∈Ω where |Ω| = P is drawn uniformly at random. Fast measurements (e.g. Fourier basis): y[ω] = ⟨f, ϕω⟩ = f̂[ω]. Mutual incoherence: μ = √N max_(ω, m) |⟨ϕω, ψm⟩| ∈ [1, √N]. Theorem [Rudelson, Vershynin, 2006]: with high probability on Ω, if M ≤ C P / (μ² log(N)⁴), then δ2M ≤ √2 − 1. Not universal: requires incoherence.
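In practice a random partial orthogonal matrix is never stored: the measurements and their adjoint are applied with a fast transform. A sketch of the partial-Fourier operator Φ and its adjoint (the sparsity basis Ψ is omitted and the sizes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, P = 1024, 256
Omega = rng.choice(N, P, replace=False)          # random frequency subset, |Omega| = P

def Phi(f):
    # partial Fourier measurements y[omega] = fhat[omega], omega in Omega
    return np.fft.fft(f, norm="ortho")[Omega]

def Phi_adj(y):
    # adjoint: place the measured frequencies back and apply the inverse transform
    fhat = np.zeros(N, dtype=complex)
    fhat[Omega] = y
    return np.fft.ifft(fhat, norm="ortho")

f0 = rng.standard_normal(N)
y = Phi(f0)
# adjoint test: <Phi f, y> equals <f, Phi* y> up to rounding errors
print(np.allclose(np.vdot(y, Phi(f0)), np.vdot(Phi_adj(y), f0)))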
  • 125. Overview • Inverse Problems Regularization • Sparse Synthesis Regularization • Theoretical Recovery Guarantees • Compressed Sensing • RIP and Polytopes CS Theory • Fourier Measurements • Convex Optimization via Proximal Splitting
  • 126. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H
  • 127. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1]
  • 128. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1] Lower semi-continuous: lim inf G(x) G(x0 ) x x0 Proper: {x ⇥ H G(x) ⇤= + } = ⌅ ⇤
  • 129. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1] Lower semi-continuous: lim inf G(x) G(x0 ) x x0 Proper: {x ⇥ H G(x) ⇤= + } = ⌅ ⇤ 0 if x ⇥ C, Indicator: C (x) = + otherwise. (C closed and convex)
  • 130. Proximal Operators Proximal operator of G: 1 Prox G (x) = argmin ||x z||2 + G(z) z 2
  • 131. Proximal Operators Proximal operator of G: Prox_(λG)(x) = argmin_z ½ ||x − z||² + λ G(z). Examples: G(x) = ||x||1 = Σ_i |xi|; G(x) = ||x||0 = # {i : xi ≠ 0}; G(x) = Σ_i log(1 + |xi|²). [Plot: the penalties |x|, ||x||0 and log(1 + x²).]
  • 132. Proximal Operators Proximal operator of G: Prox_(λG)(x) = argmin_z ½ ||x − z||² + λ G(z). G(x) = ||x||1: Prox_(λG)(x)_i = max(0, 1 − λ/|xi|) xi (soft thresholding). G(x) = ||x||0: Prox_(λG)(x)_i = xi if |xi| ≥ √(2λ), 0 otherwise (hard thresholding). G(x) = Σ_i log(1 + |xi|²): root of a 3rd order polynomial. [Plot: the corresponding thresholding curves.]
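Both thresholding rules above are one-liners; a numpy sketch of the soft (ℓ1) and hard (ℓ0) proximal maps:

import numpy as np

def prox_l1(x, lam):
    # prox of lam*||.||_1: soft thresholding
    return np.maximum(0.0, 1.0 - lam / np.maximum(np.abs(x), 1e-15)) * x

def prox_l0(x, lam):
    # prox of lam*||.||_0: hard thresholding at sqrt(2*lam)
    return x * (np.abs(x) > np.sqrt(2.0 * lam))

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # -> [-2, 0, 0, 0, 1]
print(prox_l0(x, 1.0))   # -> [-3, 0, 0, 0, 2]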
  • 133. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn ))
  • 134. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1
  • 135. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1 Composition by tight frame: A A = Id ProxG A (x) =A ProxG A + Id A A
  • 136. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1 Composition by tight frame: A A = Id ProxG A (x) =A ProxG A + Id A A x Indicators: G(x) = C (x) C Prox G (x) = ProjC (x) ProjC (x) = argmin ||x z|| z C
  • 137. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution.
  • 138. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution. Sub-gradient descent: x( +1) = x( ) v( ) , v( ) G(x( ) ) Theorem: If 1/⇥, x( ) x a solution. Problem: slow.
  • 139. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution. Sub-gradient descent: x( +1) = x( ) v( ) , v( ) G(x( ) ) Theorem: If 1/⇥, x( ) x a solution. Problem: slow. Proximal-point algorithm: x(⇥+1) = Prox G (x(⇥) ) [implicit] Theorem: If c > 0, x( ) x a solution. Prox G hard to compute.
  • 140. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available.
  • 141. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available. Splitting: E(x) = F (x) + Gi (x) i Smooth Simple
  • 142. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available. Splitting: E(x) = F (x) + Gi (x) i Smooth Simple F (x) Iterative algorithms using: Prox Gi (x) solves Forward-Backward: F + G Douglas-Rachford: Gi Primal-Dual: Gi A Generalized FB: F+ Gi
  • 143. Smooth + Simple Splitting Inverse problem: measurements y = Kf0 + w f0 Kf0 K K : RN RP , P N Model: f0 = x0 sparse in dictionary . Sparse recovery: f = x where x solves min F (x) + G(x) x RN Smooth Simple 1 Data fidelity: F (x) = ||y x||2 =K ⇥ 2 Regularization: G(x) = ||x||1 = |xi | i
  • 144. Forward-Backward Fixed-point equation: x⋆ ∈ argmin_x F(x) + G(x) ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆) ⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆) ⟺ x⋆ = Prox_(γG)(x⋆ − γ ∇F(x⋆))
  • 145. Forward-Backward Fixed-point equation: x⋆ ∈ argmin_x F(x) + G(x) ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆) ⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆) ⟺ x⋆ = Prox_(γG)(x⋆ − γ ∇F(x⋆)). Forward-backward iteration: x^(ℓ+1) = Prox_(γG)(x^(ℓ) − γ ∇F(x^(ℓ)))
  • 146. Forward-Backward Fixed-point equation: x⋆ ∈ argmin_x F(x) + G(x) ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆) ⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆) ⟺ x⋆ = Prox_(γG)(x⋆ − γ ∇F(x⋆)). Forward-backward iteration: x^(ℓ+1) = Prox_(γG)(x^(ℓ) − γ ∇F(x^(ℓ))). Projected gradient descent: G = ι_C
  • 147. Forward-Backward Fixed-point equation: x⋆ ∈ argmin_x F(x) + G(x) ⟺ 0 ∈ ∇F(x⋆) + ∂G(x⋆) ⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆) ⟺ x⋆ = Prox_(γG)(x⋆ − γ ∇F(x⋆)). Forward-backward iteration: x^(ℓ+1) = Prox_(γG)(x^(ℓ) − γ ∇F(x^(ℓ))). Projected gradient descent: G = ι_C. Theorem: let ∇F be L-Lipschitz. If γ < 2/L, then x^(ℓ) → x⋆ a solution of (⋆)
  • 148. Example: L1 Regularization min_x ½ ||Φx − y||² + λ ||x||1 = min_x F(x) + G(x) with F(x) = ½ ||Φx − y||², ∇F(x) = Φ*(Φx − y), L = ||Φ*Φ||, and G(x) = λ ||x||1, Prox_(γG)(x)_i = max(0, 1 − γλ/|xi|) xi. Forward-backward ⟺ iterative soft thresholding
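Putting the pieces together, forward-backward applied to the ℓ1 problem is exactly iterative soft thresholding. A self-contained sketch on a toy compressed-sensing instance (the Gaussian Φ, noise level and value of λ are illustrative choices, not taken from the slides):

import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    # forward-backward / iterative soft thresholding for
    # min_x 0.5*||Phi x - y||^2 + lam*||x||_1
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    tau = 1.0 / L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)         # forward (explicit) gradient step
        x = soft(x - tau * grad, tau * lam)  # backward (implicit) proximal step
    return x

# toy compressed-sensing recovery
rng = np.random.default_rng(0)
P, N, k = 100, 400, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
x_rec = ista(Phi, y, lam=0.02)
print(np.linalg.norm(x_rec - x0) / np.linalg.norm(x0))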
  • 149. Douglas Rachford Scheme min G1 (x) + G2 (x) ( ) x Simple Simple Douglas-Rachford iterations: z (⇥+1) = 1 z (⇥) + RProx G2 RProx G1 (z (⇥) ) 2 2 x(⇥+1) = Prox G2 (z (⇥+1) ) Reflexive prox: RProx G (x) = 2Prox G (x) x
  • 150. Douglas Rachford Scheme min G1 (x) + G2 (x) ( ) x Simple Simple Douglas-Rachford iterations: z (⇥+1) = 1 z (⇥) + RProx G2 RProx G1 (z (⇥) ) 2 2 x(⇥+1) = Prox G2 (z (⇥+1) ) Reflexive prox: RProx G (x) = 2Prox G (x) x Theorem: If 0 < < 2 and ⇥ > 0, x( ) x a solution of ( )
  • 151. Example: Constrained L1 min_(Φx = y) ||x||1 ⟺ min_x G1(x) + G2(x) with G1(x) = ι_C(x), C = {x : Φx = y}, Prox_(γG1)(x) = Proj_C(x) = x + Φ*(ΦΦ*)^(−1)(y − Φx), and G2(x) = ||x||1, Prox_(γG2)(x)_i = max(0, 1 − γ/|xi|) xi. Efficient if ΦΦ* is easy to invert.
  • 152. Example: Constrained L1 min_(Φx = y) ||x||1 ⟺ min_x G1(x) + G2(x) with G1(x) = ι_C(x), C = {x : Φx = y}, Prox_(γG1)(x) = Proj_C(x) = x + Φ*(ΦΦ*)^(−1)(y − Φx), and G2(x) = ||x||1, Prox_(γG2)(x)_i = max(0, 1 − γ/|xi|) xi. Efficient if ΦΦ* is easy to invert. Example: compressed sensing, Φ ∈ R^(100×400) Gaussian matrix, y = Φx0, ||x0||0 = 17. [Plot: convergence of log10(||x^(ℓ)||1 − ||x⋆||1) for γ = 0.01, 1 and 10.]
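A corresponding sketch of Douglas-Rachford for the noiseless problem min ||x||1 subject to Φx = y, with the affine projection written as above. In this sketch the iterate is read off through the projection onto the constraint, which is the prox applied first inside the reflections; it assumes ΦΦ* invertible, and γ, μ are illustrative values:

import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, mu=1.0, n_iter=1000):
    # Douglas-Rachford for min ||x||_1 subject to Phi x = y
    PPt = Phi @ Phi.T                                                 # P x P, assumed invertible
    proj = lambda x: x + Phi.T @ np.linalg.solve(PPt, y - Phi @ x)    # projection on {Phi x = y}
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox of gamma*||.||_1
    rproj = lambda x: 2 * proj(x) - x
    rsoft = lambda x: 2 * soft(x, gamma) - x
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = (1 - mu / 2) * z + (mu / 2) * rsoft(rproj(z))
        x = proj(z)                                                   # current estimate on the constraint
    return x

rng = np.random.default_rng(0)
P, N, k = 100, 400, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x = dr_basis_pursuit(Phi, Phi @ x0)
print(np.linalg.norm(x - x0) / np.linalg.norm(x0))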
  • 153. More than 2 Functionals min G1 (x) + . . . + Gk (x) each Fi is simple x min G(x1 , . . . , xk ) + C (x1 , . . . , xk ) x G(x1 , . . . , xk ) = G1 (x1 ) + . . . + Gk (xk ) C = (x1 , . . . , xk ) Hk x1 = . . . = xk
  • 154. More than 2 Functionals min G1 (x) + . . . + Gk (x) each Fi is simple x min G(x1 , . . . , xk ) + C (x1 , . . . , xk ) x G(x1 , . . . , xk ) = G1 (x1 ) + . . . + Gk (xk ) C = (x1 , . . . , xk ) Hk x1 = . . . = xk G and C are simple: Prox G (x1 , . . . , xk ) = (Prox Gi (xi ))i 1 Prox ⇥C (x1 , . . . , xk ) = (˜, . . . , x) x ˜ where x = ˜ xi k i
  • 155. Auxiliary Variables min G1 (x) + G2 A(x) Linear map A : E H. x min G(z) + C (z) G1 , G2 simple. z⇥H E G(x, y) = G1 (x) + G2 (y) C = {(x, y) ⇥ H E Ax = y}
  • 156. Auxiliary Variables min_x G1(x) + G2(A(x)), linear map A : H → E, G1 and G2 simple ⟺ min_(z ∈ H × E) G(z) + ι_C(z) with G(x, u) = G1(x) + G2(u) and C = {(x, u) ∈ H × E : Ax = u}. Prox_(γG)(x, u) = (Prox_(γG1)(x), Prox_(γG2)(u)). Proj_C(x, u) = (x̃, Ax̃) where x̃ = (Id + A*A)^(−1)(A*u + x); it can equivalently be computed through (Id + AA*)^(−1). Efficient if Id + AA* or Id + A*A is easy to invert.
  • 157. Example: TV Regularization min_f ½ ||Kf − y||² + λ ||∇f||1 where ||u||1 = Σ_i ||ui||. Splitting over (f, u): G1(u) = λ ||u||1, Prox_(γG1)(u)_i = max(0, 1 − γλ/||ui||) ui; G2(f) = ½ ||Kf − y||², Prox_(γG2)(f) = (Id + γ K*K)^(−1)(f + γ K*y); C = {(f, u) ∈ R^N × R^(N×2) : u = ∇f}, Proj_C(f, u) = (f̃, ∇f̃)
  • 158. Example: TV Regularization min_f ½ ||Kf − y||² + λ ||∇f||1 where ||u||1 = Σ_i ||ui||. Splitting over (f, u): G1(u) = λ ||u||1, Prox_(γG1)(u)_i = max(0, 1 − γλ/||ui||) ui; G2(f) = ½ ||Kf − y||², Prox_(γG2)(f) = (Id + γ K*K)^(−1)(f + γ K*y); C = {(f, u) ∈ R^N × R^(N×2) : u = ∇f}, Proj_C(f, u) = (f̃, ∇f̃). Compute f̃ by solving (Id + ∇*∇) f̃ = f + ∇*u: O(N log N) operations using the FFT.
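The projection onto the constraint u = ∇f is the only non-trivial step here and diagonalizes in Fourier for periodic boundary conditions. A 2-D sketch with a forward-difference gradient and its adjoint, meant as an illustration of this projection rather than as the full TV solver:

import numpy as np

def grad(f):
    # forward-difference gradient with periodic boundaries, output shape (2, n, n)
    return np.stack([np.roll(f, -1, axis=0) - f, np.roll(f, -1, axis=1) - f])

def grad_adj(u):
    # adjoint of grad for this discretization (equals -div)
    return (np.roll(u[0], 1, axis=0) - u[0]) + (np.roll(u[1], 1, axis=1) - u[1])

def proj_gradient_constraint(f, u):
    # project (f, u) on C = {(f, u) : u = grad f} by solving
    # (Id + grad^T grad) ftilde = f + grad^T u with the FFT
    n = f.shape[0]
    w = np.fft.fftfreq(n)
    lap = 4 * (np.sin(np.pi * w)[:, None] ** 2 + np.sin(np.pi * w)[None, :] ** 2)
    rhs = f + grad_adj(u)
    ftilde = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (1.0 + lap)))
    return ftilde, grad(ftilde)

# sanity check: a pair that already satisfies u = grad f is left unchanged
rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))
ft, ut = proj_gradient_constraint(f, grad(f))
print(np.allclose(ft, f), np.allclose(ut, grad(f)))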
  • 159. Example: TV Regularization Original f0, observations y = Kf0 + w, recovered f⋆. [Images of the original, the observations and the recovery, with the decay of the energy along the iterations.]
  • 160. Conclusion Sparsity: approximate signals with few atoms. dictionary
  • 161. Conclusion Sparsity: approximate signals with few atoms. dictionary Compressed sensing ideas: Randomized sensors + sparse recovery. Number of measurements signal complexity. CS is about designing new hardware.
  • 162. Conclusion Sparsity: approximate signals with few atoms. dictionary Compressed sensing ideas: Randomized sensors + sparse recovery. Number of measurements signal complexity. CS is about designing new hardware. The devil is in the constants: Worse case analysis is problematic. Designing good signal models.
  • 163. Some Hot Topics Dictionary learning: learn the dictionary Ψ from exemplar patches instead of fixing it. [Figures reproduced from Mairal et al., “Sparse Representation for Color Image Restoration” (K-SVD): learned color dictionaries, denoising test set and comparison tables.]
  • 164. Some Hot Topics Dictionary learning (same illustrations). Analysis vs. synthesis: Js(f) = min_(f = Ψx) ||x||1, coefficients x, image f = Ψx.