This document describes work on developing spectrum-based regularization approaches for linear inverse problems. The author proposes using a learned distribution of singular values to build regularization models that are better suited for recovering signals correlated with medium frequencies, not just low frequencies as in traditional models. Algorithms are presented for learning the singular value profile from training data and for solving the resulting regularization models. Experimental results demonstrate that the proposed spectrum-learning regularization and SLR-TV hybrid models can provide improved reconstruction accuracy over total variation and Tikhonov regularization.
1. A Spectrum-based Regularization Approach to Linear Inverse Problems: Models, Learned Parameters and Algorithms
Jorge A. Castañón
Advisor: Dr. Yin Zhang
November 7, 2014
This work is partially funded by NSF Grants DMS-0811188 and DMS-1115950
and by CONACYT Grant 212611/307450.
2. Outline
• Linear Inverse Problems and Regularization
• Motivation
• Algorithms to Learn and Solve the Models
• Results
• Summary of Contributions
3. Medical Imaging
Image taken from www.siemens.co.uk
• Goal: minimize the time patients spend inside the
machine while getting a high quality image
10. Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
11. Tikhonov (1963)
Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
R(x) = ||x||_2^2: "smooth" x*
12. Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
Tikhonov (1963): R(x) = ||x||_2^2: "smooth" x*
R(x) = ||x||_1: sparse x*, e.g., Santosa-Symes (1986)
14. Motivation
• Improve Total Variation (TV) and Tikhonov models for some practical situations
• Approach: look at finite difference matrices differently
15. Why Derivatives of Images? Sparse Gradients
• Let D1 be a first order finite difference matrix:

D1 = [ −1   1   0  ⋯   0   0
        0  −1   1  ⋯   0   0
        ⋮                  ⋮
        0   0   0  ⋯  −1   1 ]
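As a quick illustration (a minimal NumPy sketch, not the author's code), D1 can be built explicitly, and applying it to a piecewise constant signal shows why image gradients are sparse:

```python
import numpy as np

def first_diff(n):
    """(n-1) x n first order finite difference matrix,
    rows of the form [..., -1, 1, ...]."""
    D1 = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D1[idx, idx] = -1.0
    D1[idx, idx + 1] = 1.0
    return D1

# A piecewise constant signal has a sparse gradient:
x = np.concatenate([np.ones(5), 3 * np.ones(5)])
g = first_diff(10) @ x
print(np.count_nonzero(g))  # 1: the only nonzero is at the jump
```

This is exactly the sparsity that the ℓ1-based TV model exploits.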
16. Total Variation
• Rudin, Osher and Fatemi (1992) proposed to use TV for image denoising
• TV has been widely used for a variety of imaging problems
17. Total Variation
• A form of the discrete TV model:
min TV(x) s.t. Ax = b
• where
• TV(x) = ||D1 x||_1 (anisotropic)
• D1 = 2D first order finite difference matrix
• x is the vectorized image
19. A Deeper Look into Derivatives
D1 = U Σ V^T
• V: n columns of a DCT matrix
• Σ: diagonal
• U: n − 1 columns of a DST matrix
21. A Deeper Look into Derivatives
D1 x = U Σ (V^T x)
• V^T x: inner products with cosines (n columns of V, a DCT matrix)
• Σ: diagonal
• U: expanded in a basis of sines (n − 1 columns of U, a DST matrix)
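A small numerical check of this factorization (a sketch under the slide's setup, not the thesis code): applying D1 really is "project onto the columns of V, scale by the singular values, expand in the columns of U".

```python
import numpy as np

n = 64
# (n-1) x n first order finite difference matrix
D1 = np.zeros((n - 1, n))
i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0

# Full SVD: U is (n-1)x(n-1), V is n x n, S holds the n-1 singular values
U, S, Vt = np.linalg.svd(D1, full_matrices=True)

x = np.random.default_rng(0).standard_normal(n)
coeffs = Vt @ x                       # inner products with the columns of V
Dx = U @ (S * coeffs[: n - 1])        # scale by sigma_j, re-expand in U
print(np.allclose(Dx, D1 @ x))        # True
```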
22. The Derivative "likes" Low Frequencies
[Figure: Σ, the singular values σ_j of D1, plotted against the index j on a log scale; the corresponding right singular vectors are the n columns of V (a DCT basis)]
23. A New Idea…
We can use a different distribution of the singular values to build a model that is better suited to recovering signals that are not only correlated with low frequencies.
[Figure: singular value profiles σ_j vs. j (log scale): the TV profile versus the proposed profile, which emphasizes medium frequencies]
24. Model and Learning
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b, with a learned profile Σ̂
[Figure: singular value profiles σ_j vs. j (log scale): the TV profile versus the proposed profile, which emphasizes medium frequencies]
25. 1. How do we Estimate the Profile Σ̂?
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b
26. 2. How do we Solve the Model?
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b
27. Methods!
1. How to estimate the profile Σ̂
2. How to solve the regularization models
Both algorithms exploit the following properties of the matrices:
• Fast multiplication
• Storage-free
28. Learning Model
• Randomly pick some rows or columns {p_1, p_2, ⋯, p_k} ⊂ R^n
• Solve the learning model (q = 1, 2):
Σ̂ = arg min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_q^q
• where σ = diag(Σ) and, to avoid a trivial solution,
Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
29. Sparse Learning
• Suppose that Σ̂ makes U Σ̂ V^T p_j sparse ∀j
• Solve the learning model
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_1
• where σ = diag(Σ) and Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
• Can be solved as a Linear Program
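After the standard absolute-value split the objective is linear in (σ, t), so any LP solver applies. A sketch with scipy.optimize.linprog (the piecewise constant training vectors and the problem sizes here are illustrative assumptions, not the thesis setup):

```python
import numpy as np
from scipy.optimize import linprog

n = 16
D1 = np.zeros((n - 1, n)); i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0
U, S, Vt = np.linalg.svd(D1)

# Piecewise constant training vectors p_1, ..., p_k (toy data)
rng = np.random.default_rng(1)
P = [np.repeat(rng.standard_normal(4), n // 4) for _ in range(3)]

# sum_j ||U diag((V^T p_j)) sigma||_1 is linear in (sigma, t):
# minimize 1^T t  s.t.  -t <= M sigma <= t,  sum(sigma) = 1,  sigma >= 0
M = np.vstack([U @ np.diag((Vt @ p)[: n - 1]) for p in P])
K, m = M.shape                      # m = n - 1 learned singular values
c = np.concatenate([np.zeros(m), np.ones(K)])
A_ub = np.block([[M, -np.eye(K)], [-M, -np.eye(K)]])
b_ub = np.zeros(2 * K)
A_eq = np.concatenate([np.ones(m), np.zeros(K)])[None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)] * K)
sigma_hat = res.x[:m]               # learned profile, sums to 1
print(res.status)                   # 0 = optimal
```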
30. An Example!
• Given a set of piecewise constant training vectors
• Estimate the singular values of the first order finite difference matrix:
σ_j* = √(2 − 2 cos(πj/n)), j = 1, …, n − 1
[Figure: true vs. estimated singular values for training set sizes k = 1, 2, 3, 4, with relative errors RE = 2.05, 1.38, 1.47e-12 and 9.77e-11, respectively; with k ≥ 3 training vectors the profile is recovered to machine precision]
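The target profile can be checked numerically; the expression below is the standard closed form for the singular values of this (n−1) × n difference matrix (the slide's exact formula was garbled in extraction, so treat this reconstruction as an assumption):

```python
import numpy as np

n = 50
D1 = np.zeros((n - 1, n)); i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0

svals = np.sort(np.linalg.svd(D1, compute_uv=False))
j = np.arange(1, n)
closed_form = np.sqrt(2 - 2 * np.cos(np.pi * j / n))  # = 2 sin(pi*j/(2n))
print(np.allclose(svals, closed_form))                # True
```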
31. ℓ2-Learning
• Suppose that Σ̂ makes U Σ̂ V^T p_j "smooth" ∀j
• Solve the learning model:
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_2^2
• where σ = diag(Σ) and Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
• Quadratic objective with a linear constraint
32. ℓ2-Learning
• Note that
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_2^2 ⟺ min_{σ ∈ Ω} σ^T S σ
• where S = diag{(V^T p_1).^2 + ⋯ + (V^T p_k).^2}
• Thus, the solution is given by
σ* = S^{−1} e / (e^T S^{−1} e)
• where e^T = (1, 1, ⋯, 1) ∈ R^n
This is the training method used for our proposed models.
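The closed form makes ℓ2-learning essentially free: one pass over the training vectors and an elementwise division. A minimal sketch (the orthogonal V and the random training vectors are stand-ins for illustration):

```python
import numpy as np

def l2_learn(V, P):
    """Closed-form l2-learning: sigma* = S^{-1} e / (e^T S^{-1} e),
    where S = diag(sum_j (V^T p_j).^2)."""
    s = sum((V.T @ p) ** 2 for p in P)   # diagonal of S
    s = np.maximum(s, 1e-12)             # guard against division by zero
    inv = 1.0 / s                        # S^{-1} e, elementwise
    return inv / inv.sum()               # normalize: sums to 1

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((8, 8)))[0]   # any orthogonal basis
P = [rng.standard_normal(8) for _ in range(3)]
sigma = l2_learn(V, P)
print(np.isclose(sigma.sum(), 1.0), (sigma >= 0).all())  # True True
```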
33. 2D Problems
The 2D training reduces to two independent 1D training problems!
• Signal x* ∈ R^N is vectorized, where N = n^2
• Sampling scheme A ∈ R^{M×N}, where M ≪ N
• Regularization matrix is U Σ̂ V^T, where
1. V = (V ⊗ V) ∈ R^{N×N} and
2. Σ̂ = [ Σ̂_row ⊗ I_n ; I_n ⊗ Σ̂_col ] ∈ R^{2N×N}
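The Kronecker structure never needs to be formed in the actual solver (fast multiplication, storage-free), but it is easy to sanity-check at toy sizes; the two diagonal profiles below are illustrative stand-ins for learned Σ̂_row and Σ̂_col:

```python
import numpy as np

n = 4; N = n * n
# Toy 1D profiles learned independently for rows and columns
sig_row = np.diag(np.linspace(0.1, 1.0, n))
sig_col = np.diag(np.linspace(0.2, 1.0, n))
I = np.eye(n)

# Stacked 2D profile matrix: [ Sig_row (x) I ; I (x) Sig_col ]
Sigma_hat = np.vstack([np.kron(sig_row, I), np.kron(I, sig_col)])
print(Sigma_hat.shape)  # (2N, N) = (32, 16)

# 2D profile visualization: diag(Sigma^T Sigma), reshaped to n x n
profile = np.diag(Sigma_hat.T @ Sigma_hat).reshape(n, n)
print(profile.shape)    # (4, 4)
```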
34. Regularization Models
• Spectrum-Learning Regularization model (SLR):
min_x (1/2) ||Σ̂ V^T x||_2^2 s.t. Ax = b
• SLR-TV hybrid model:
min_x (1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1 s.t. Ax = b
• where α > 0 and D1 is the 2D finite difference matrix
35. SLR Model
• Given a trained profile Σ̂
• The Spectrum-Learning Regularization (SLR) model is
min_x (1/2) ||Σ̂ V^T x||_2^2 s.t. Ax = b
• By the standard KKT conditions, find (x*, λ*) such that
x* = V (Σ̂^T Σ̂)^{−1} (AV)^T λ*
AV (Σ̂^T Σ̂)^{−1} (AV)^T λ* = b
• The latter is an M × M symmetric positive definite system of equations: solve with CG
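A matrix-free sketch of this KKT solve (random A, a stand-in orthogonal V, and a toy diagonal profile are assumptions for illustration, not the thesis code): build the M × M SPD operator and hand it to CG, then recover x* from the multiplier.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, m = 64, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
V = np.linalg.qr(rng.standard_normal((n, n)))[0]   # stand-in orthogonal basis
sig = np.linspace(0.1, 1.0, n)                     # toy learned profile

AV = A @ V
inv_ss = 1.0 / sig**2                              # (Sigma^T Sigma)^{-1}, diagonal

# M x M SPD system  AV (Sigma^T Sigma)^{-1} (AV)^T lam = b, applied matrix-free
def matvec(lam):
    return AV @ (inv_ss * (AV.T @ lam))

C = LinearOperator((m, m), matvec=matvec)
lam, info = cg(C, b)
x = V @ (inv_ss * (AV.T @ lam))                    # recover x* from the KKT system
print(info, np.allclose(A @ x, b, atol=1e-3))      # 0 True
```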
36. Accelerated CG
• Let C_γ = AV (Σ̂_γ^T Σ̂_γ)^{−1} V^T A^T, where γ > 0 and
Σ̂_γ = [ (Σ̂_row + γ I_n) ⊗ I_n ; I_n ⊗ (Σ̂_col + γ I_n) ]
• The convergence rate of CG for solving C_γ λ = b depends on the condition number κ(C_γ)
• The larger the γ > 0, the smaller the κ(C_γ)
[Figure: κ(Σ̂^T Σ̂), κ(C_0), κ(C_{10^{−4}}), κ(C_{10^{−2}}) and κ(C_{10^{−1}}) versus the dimension M of the linear system]
37. Accelerated CG ~ 80% Faster
Algorithm:
Given an initial guess λ^0 and γ_1 > γ_2 > ⋯ > γ_l ≥ 0
For k = 1 : l
1. Form C_{γ_k}
2. λ^k ← CG(C_{γ_k}, λ^{k−1})
End
[Figure: CPU time (s) of regular CG versus accelerated CG]
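The continuation idea amounts to a warm-started loop over decreasing γ; a toy sketch (the operator C_γ below is an illustrative stand-in, not the 2D thesis operator):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
m, n = 20, 64
AV = rng.standard_normal((m, n))
b = rng.standard_normal(m)
sig = np.linspace(0.05, 1.0, n)          # toy learned profile

def C_gamma(gamma):
    """Toy C_gamma = AV ((Sigma+gamma I)^T (Sigma+gamma I))^{-1} (AV)^T."""
    inv = 1.0 / (sig + gamma) ** 2
    return LinearOperator((m, m), matvec=lambda lam: AV @ (inv * (AV.T @ lam)))

lam = np.zeros(m)
for gamma in [1e-1, 1e-2, 1e-4, 0.0]:    # gamma_1 > ... > gamma_l = 0
    lam, info = cg(C_gamma(gamma), b, x0=lam)  # warm start from previous lam
print(info)                              # 0 = converged on the target system
```

Each inner CG solve faces a better-conditioned system than the final one, and its solution seeds the next solve, which is where the speedup comes from.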
38. Motivation of the SLR-TV Hybrid Model
Absolute Error Maps: |X − X*| ∈ R^{n×n}
TV: 30.4 dB, Tikhonov: 27 dB, SLR: 33.1 dB
The error is measured with the Peak Signal to Noise Ratio, PSNR (dB): the higher, the better.
[Figure: absolute error maps for TV, Tikhonov and SLR]
39. SLR-TV Hybrid Model
• Given by
min_x (1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1 s.t. Ax = b,
where α > 0 and D1 is a 2D first order finite difference matrix
• Let y = D1 x be a splitting variable, then
min_{x,y} (1/2) ||Σ̂ V^T x||_2^2 + α ||y||_1 s.t. [ A ; D1 ] x − [ 0 ; I ] y = [ b ; 0 ]
• Two separable convex blocks with linear constraints
• Suitable for the Alternating Direction Method (ADM)
40. ADM
• Proposed by Glowinski and Marocco (1975) and Gabay and Mercier (1976) to solve
min_{x,y} f(x) + g(y) s.t. Dx + Ey = c
• Form the augmented Lagrangian function
L_A(x, y, λ) = f(x) + g(y) − λ^T (Dx + Ey − c) + (β/2) ||Dx + Ey − c||_2^2,
where β > 0
• Given x^0, y^0 and λ^0, the iteration is
1. x^{k+1} = arg min_x L_A(x, y^k, λ^k)
2. y^{k+1} = arg min_y L_A(x^{k+1}, y, λ^k)
3. λ^{k+1} = λ^k − ρ (Dx^{k+1} + Ey^{k+1} − c),
where ρ ∈ (0, (1 + √5)/2)
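The three-step iteration is short in code. A toy instance (assumption for illustration: f(x) = ½||x − a||², g(y) = α||y||₁, D = I, E = −I, c = 0, multiplier step βρ with ρ = 1), whose minimizer is known in closed form:

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding: exact solution of min_y t*||y||_1 + 0.5*||y - z||^2."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# ADM on: min 0.5*||x - a||^2 + alpha*||y||_1  s.t.  x - y = 0
a = np.array([3.0, -0.5, 0.2, -2.0])
alpha, beta = 1.0, 1.0
x = y = lam = np.zeros_like(a)
for _ in range(200):
    x = (a + lam + beta * y) / (1.0 + beta)    # x-subproblem (quadratic)
    y = shrink(x - lam / beta, alpha / beta)   # y-subproblem (shrinkage)
    lam = lam - beta * (x - y)                 # multiplier update (rho = 1)

# The limit is the soft-thresholded data: shrink(a, alpha)
print(np.allclose(x, shrink(a, alpha), atol=1e-3))  # True
```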
41. ADM for SLR-TV
• The ADM applied to SLR-TV is given by
1. f(x) = (1/2) ||Σ̂ V^T x||_2^2 and g(y) = α ||y||_1
2. D = [ A ; D1 ], E = −[ 0 ; I ] and c = [ b ; 0 ]
• The x-subproblem is equivalent to solving an N × N linear system
1. Sherman-Morrison to reduce the dimension of the system to solve from N to M (M ≪ N)
2. CG to solve the M × M system
• The y-subproblem is solved exactly (shrinkage formula)
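The dimension reduction in step 1 can be sketched on a toy system (a diagonal B stands in for the easy-to-invert part of the actual x-subproblem matrix; this is an illustrative assumption, not the thesis code): the Sherman-Morrison-Woodbury identity turns an N × N solve into an M × M one.

```python
import numpy as np

# Solve (B + A^T A) x = r with B cheap to invert, via an M x M system only:
# (B + A^T A)^{-1} = B^{-1} - B^{-1} A^T (I + A B^{-1} A^T)^{-1} A B^{-1}
rng = np.random.default_rng(2)
N, M = 200, 15
A = rng.standard_normal((M, N))
d = rng.uniform(1.0, 2.0, N)          # B = diag(d), trivially invertible
r = rng.standard_normal(N)

Binv_r = r / d
Binv_At = A.T / d[:, None]            # B^{-1} A^T, N x M
small = np.eye(M) + A @ Binv_At       # M x M system instead of N x N
x = Binv_r - Binv_At @ np.linalg.solve(small, A @ Binv_r)

# Check against the direct N x N solve
x_direct = np.linalg.solve(np.diag(d) + A.T @ A, r)
print(np.allclose(x, x_direct))       # True
```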
42. Parameter-Free ADM
• Objective function of the SLR-TV:
(1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1
• The balancing parameter α > 0 is hard to choose
• Provide a diagonal matrix W to replace α > 0:
(1/2) ||W Σ̂ V^T x||_2^2 + ||D1 x||_1
• The matrix of weights is
W = [ Σ̂_row^{−1/2} ⊗ I_n , 0_{N×N} ; 0_{N×N} , I_n ⊗ Σ̂_col^{−1/2} ]
43. Results
• Visualization of the 2D profiles
• Learning does not need a lot of prior information (we saw earlier that it is cheap!)
• The proposed models enhance the quality of the recovered images
44. Interpretation of the 2D Profiles
1. Σ̂ = [ Σ̂_row ⊗ I_n ; I_n ⊗ Σ̂_col ] ∈ R^{2N×N}
2. diag(Σ̂^T Σ̂) ∈ R^N
3. Reshape diag(Σ̂^T Σ̂) into R^{n×n}
[Figure: the reshaped 2D profile, with the low-frequency and high-frequency regions indicated]
47. How Many Training Vectors? Just a Few…
• Training data size versus quality
• Quality does not improve significantly when increasing the size of the training set
[Figure: PSNR (dB) versus training data size k for the heart, shoulder, brain and thorax images]
Training Data: randomly chose k rows and k columns from each of the images that are not being recovered.
48. Quality Enhancement
knee4 — Original: 1024 × 1024, Sampling @ 10%
• TV: PSNR = 33.27 dB, RelErr = 8.46%, CPU = 100.63 s
• Tikhonov: PSNR = 33.60 dB, RelErr = 8.15%, CPU = 54.93 s
• SLR: PSNR = 43.36 dB, RelErr = 2.65%, CPU = 151.21 s
• The quality of SLR is about 10 dB higher than TV and Tikhonov
• SLR recovers bone structures better (zoom in next slide)
49. Quality Enhancement
knee7 — Original: 1024 × 1024, Sampling @ 10%
• TV: PSNR = 28.88 dB, RelErr = 8.53%, CPU = 101.28 s
• Tikhonov: PSNR = 26.31 dB, RelErr = 11.47%, CPU = 54.72 s
• SLR: PSNR = 28.94 dB, RelErr = 8.48%, CPU = 153.67 s
54. TV versus SLR
• Piecewise constant signals can be recovered exactly with the TV model*
• Other, more complex signals can be recovered with higher accuracy using the SLR model
*Candès, Romberg and Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information", 2006
55. Summary of Major Contributions
• Proposed Spectrum-Learning Regularization (SLR) and hybrid models that only require a few parameters to learn
• Developed computationally inexpensive training strategies to estimate the profile of a signal of interest
• Designed convergent algorithms to solve the proposed regularization models
• The quality of the images recovered by SLR and SLR-TV is considerably enhanced
56. Remarks
• SLR improves recovery accuracy in scenarios where compressive sensing theory does not hold
• SLR methods do not rely on the choice of the sampling matrix
• The DCT basis was used for SLR; nonetheless, a different choice of basis may be better suited to other applications
• A preconditioner for CG could potentially improve the performance of the proposed algorithms