Model Selection with Piecewise Regular Gauges

Gabriel Peyré
www.numerical-tours.com

Joint work with: Samuel Vaiter, Charles Deledalle, Jalal Fadili, Joseph Salmon
Overview
• Inverse Problems
• Gauge Decomposition and Model Selection
• L2 Stability Performances
• Model Stability Performances
Inverse Problems

Recovering x0 ∈ R^N from noisy observations y = Φ x0 + w ∈ R^P.

Examples: inpainting, super-resolution, compressed sensing.
Estimators

Observations: y = Φ x0 + w ∈ R^P.

Regularized inversion:

    x*(y) ∈ argmin_{x ∈ R^N}  1/2 ||y − Φ x||² + λ J(x)

(data fidelity + regularity)

Goal: performance analysis.
→ Criteria on (x0, ||w||, λ) to ensure:
  • L² error stability: ||x*(y) − x0|| = O(||w||).
  • Promoted subspace ("model") stability.
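For J = ||·||_1 this estimator is the Lasso and can be computed by proximal gradient descent (ISTA). A minimal sketch, with illustrative dimensions, seed and λ (a standard generic solver, not necessarily the one behind the slides):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Solve min_x 1/2||y - Phi x||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(Phi, 2) ** 2                 # Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L           # gradient step on the data fidelity
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft-thresholding prox
    return x

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50)) / np.sqrt(20)   # random measurement operator
x0 = np.zeros(50); x0[[3, 17, 40]] = [2.0, -1.5, 1.0]
y = Phi @ x0 + 0.02 * rng.standard_normal(20)       # noisy observations
x_star = ista(Phi, y, lam=0.05)
```

Each ISTA iteration decreases the composite objective, so the returned `x_star` improves on the zero initialization.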
Union of Linear Models for Data Processing

Union of models: T ∈ 𝒯 linear spaces.

• Synthesis sparsity: sparse coefficients x synthesizing the image.
• Structured sparsity: block-sparse coefficients x.
• Analysis sparsity: image x with sparse gradient D* x.
• Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·} (source rows S_{1,·}, S_{2,·}, S_{3,·}).
Gauges for Union of Linear Models

Gauge: J : R^N → R⁺ convex, with ∀ α ∈ R⁺, J(αx) = αJ(x).

    J(x) = γ_C(x) = inf { ρ > 0 | x ∈ ρC },   C = { x | J(x) ≤ 1 }   (assuming 0 ∈ C)

Piecewise regular ball ⟺ union of linear models (T)_{T ∈ 𝒯}.

Examples (model T0 = T_{x0} at a point x0):
• J(x) = ||x||_1: T = sparse vectors.
• J(x) = |x_1| + ||x_{2,3}||: T = block-sparse vectors.
• J(x) = ||x||_* : T = low-rank matrices.
• J(x) = ||x||_∞: T = anti-sparse vectors.
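The definition γ_C(x) = inf{ρ > 0 | x ∈ ρC} can be evaluated directly from a membership oracle for C. A minimal sketch (bisection on ρ, using the ℓ¹ ball to check that the gauge recovers ||·||_1; the tolerance is illustrative):

```python
import numpy as np

def gauge(x, in_C, rho_max=1e6, tol=1e-8):
    """Evaluate gamma_C(x) = inf{rho > 0 : x in rho*C} by bisection.
    Since C is convex with 0 in C, membership of x in rho*C is monotone in rho."""
    lo, hi = 0.0, rho_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_C(x / mid):        # x in mid*C  <=>  x/mid in C
            hi = mid
        else:
            lo = mid
    return hi

# C = unit l1 ball: the gauge must recover J(x) = ||x||_1
x = np.array([1.0, -2.0, 0.0, 0.5])
j = gauge(x, lambda z: np.abs(z).sum() <= 1.0)
```

Positive homogeneity J(αx) = αJ(x) follows directly from the inf formula and can be checked the same way.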
Subdifferentials and Models

∂J(x) = { η ∈ R^N | ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩ }

Definition:
    T_x = VectHull(∂J(x))^⊥   (the model subspace at x)
    e_x = Proj_{T_x}(∂J(x))

Example: J(x) = ||x||_1, with I = supp(x) = { i | x_i ≠ 0 }:
    ∂||x||_1 = { η | η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1 }
    T_x = { η | supp(η) ⊆ I },   e_x = sign(x)
Examples

ℓ¹ sparsity: J(x) = ||x||_1
    e_x = sign(x),   T_x = { z | supp(z) ⊆ supp(x) }

Structured sparsity: J(x) = Σ_b ||x_b||
    e_x = (N(x_b))_{b ∈ B} where N(a) = a/||a||,   T_x = { z | supp(z) ⊆ supp(x) }

Nuclear norm: J(x) = ||x||_*, with SVD x = U Λ V*
    e_x = U V*,   T_x = { z | U_⊥* z V_⊥ = 0 }

Anti-sparsity: J(x) = ||x||_∞, with I = { i | |x_i| = ||x||_∞ }
    e_x = |I|^{-1} sign(x),   T_x = { y | y_I ∝ sign(x_I) }
Dual Certificate and L2 Stability

Noiseless recovery:   min_{Φx = Φx0} J(x)    (P0)

Dual certificates:        D = Im(Φ*) ∩ ∂J(x0)
Tight dual certificates:  D̄ = Im(Φ*) ∩ ri(∂J(x0))

Proposition: ∃ η ∈ D ⟺ x0 is a solution of (P0).

Theorem [Fadili et al. 2013]: If ∃ η ∈ D̄ and ker(Φ) ∩ T_{x0} = {0},
then for λ ∼ ||w|| one has ||x* − x0|| = O(||w||).

Related results:
[Grasmair, Haltmeier, Scherzer 2010]: J = ||·||_1.
[Grasmair 2012]: J(x* − x0) = O(||w||).
→ The constants depend on N . . .
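The noiseless program (P0) can be solved exactly for J = ||·||_1 by recasting it as a linear program, which makes the proposition easy to probe numerically. A sketch (scipy's linprog; dimensions, seed and support are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Noiseless l1 recovery (P0): min ||x||_1 s.t. Phi x = Phi x0 ("basis pursuit"),
# recast as the LP  min 1^T(u + v)  s.t.  Phi(u - v) = Phi x0,  u, v >= 0.
rng = np.random.default_rng(1)
P, N = 15, 40
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N); x0[[5, 12, 30]] = [1.0, -2.0, 0.5]

c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=Phi @ x0, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]               # x = u - v
```

Since x0 is feasible, the LP optimum never exceeds ||x0||_1; when a tight certificate exists (typical for a generic Gaussian Φ at this sparsity level), the minimizer is x0 itself.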
Minimal-norm Certificate

We assume ker(Φ) ∩ T = {0} and J piecewise regular, where T = T_{x0}, e = e_{x0}.

    η ∈ D ⟺ η = Φ* q, η_T = e and J°(η) = 1   (J° the polar gauge)

Minimal-norm pre-certificate:

    η0 = argmin_{η = Φ* q, η_T = e} ||q||

Proposition: One has η0 = Φ* (Φ_T^+)* e.

Theorem [Vaiter et al. 2013]: If η0 ∈ D̄, ||w|| = O(ν_{x0}) and λ ∼ ||w||,
then the unique solution x* of P_λ(y) for y = Φ x0 + w satisfies

    T_{x*} = T_{x0}   and   ||x* − x0|| = O(||w||).

Prior works:
[Fuchs 2004]: J = ||·||_1.
[Bach 2008]: J = ||·||_{1,2} and J = ||·||_*.
[Vaiter et al. 2011]: J = ||D* ·||_1.
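For J = ||·||_1 the proposition gives a closed form that is cheap to evaluate: η0 interpolates sign(x0) on the support with the least-norm q. A sketch on a hypothetical random instance (the certificate condition is the classical test ||η0,I^c||_∞ < 1):

```python
import numpy as np

# Minimal-norm pre-certificate for J = ||.||_1: with I = supp(x0) and e = sign(x0_I),
# eta0 = Phi* q0 where q0 is the least-norm q such that (Phi* q)_I = e.
rng = np.random.default_rng(2)
P, N = 20, 40
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[[2, 9, 25]] = [1.0, -1.0, 2.0]
I = np.flatnonzero(x0)

Phi_I = Phi[:, I]                                # Phi restricted to the model T
q0 = np.linalg.pinv(Phi_I).T @ np.sign(x0[I])    # q0 = (Phi_I^+)* e
eta0 = Phi.T @ q0                                # eta0 = Phi* q0

# eta0 is a tight certificate iff it stays strictly below 1 off the support
crit = np.max(np.abs(np.delete(eta0, I)))
```

By construction eta0 equals sign(x0) exactly on I (since Phi_I has full column rank here); `crit < 1` is then the support-stability criterion of the theorem.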
Example: 1-D Sparse Deconvolution

    Φx = Σ_i x_i φ(· − Δ i),    J(x) = ||x||_1

Increasing the spacing Δ:
  – reduces correlation;
  – reduces resolution.

With I = { j | x0(j) ≠ 0 }:

    η0 ∈ D̄(x0) ⟺ ||η0,I^c||_∞ < 1

(Figure: ||η0,I^c||_∞ as a function of Δ; support recovery when the criterion stays below 1.)
Example: 1-D TV Denoising

    J(x) = ||∇x||_1,   (∇x)_i = x_i − x_{i−1},   Φ = Id

With I = { i | (∇x0)_i ≠ 0 }:   η0 = div(α0) where ∀ j ∉ I, (Δα0)_j = 0.

||α0,I^c||_∞ < 1  ⟹ support stability.
||α0,I^c||_∞ = 1  ⟹ ℓ² stability only.

(Figures: α0 evolving between the levels +1 and −1 on I^c; staircase signals x0 illustrating each case.)
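The support-stability behaviour can be observed numerically. A sketch (Φ = Id; the solver is a generic projected-gradient scheme on the dual of 1-D TV, chosen for brevity and not the method behind the slides' figures; signal sizes, seed and λ are illustrative):

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=5000, tau=0.2):
    """min_x 1/2||x - y||^2 + lam*||grad x||_1, solved via projected gradient
    on the dual problem  min_{||a||_inf <= lam} 1/2||y - grad^T a||^2."""
    G  = lambda x: x[1:] - x[:-1]                                      # forward differences
    Gt = lambda a: np.concatenate([[-a[0]], a[:-1] - a[1:], [a[-1]]])  # its adjoint
    a = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        a = np.clip(a + tau * G(y - Gt(a)), -lam, lam)                 # step + projection
    return y - Gt(a)

# Piecewise-constant signal + small noise: the jump set (model) should be preserved
rng = np.random.default_rng(3)
x0 = np.concatenate([np.zeros(20), 2.0 * np.ones(20), np.ones(20)])
y = x0 + 0.05 * rng.standard_normal(60)
x_star = tv_denoise_1d(y, lam=0.5)
jumps = np.flatnonzero(np.abs(np.diff(x_star)) > 0.5)
```

With these parameters the two true jumps (after indices 19 and 39) should survive denoising while the in-segment noise is flattened, illustrating T_{x*} = T_{x0}.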
Conclusion

Gauges: encode linear models as singular points.
Tight dual certificates: enable L² stability.
Piecewise smooth gauges: enable model recovery T_{x*} = T_{x0}.

Open problems:
– Approximate model recovery T_{x*} ≈ T_{x0}.
– Infinite-dimensional problems (measures, TV, etc.).