This document describes work on developing spectrum-based regularization approaches for linear inverse problems. The author proposes using a learned distribution of singular values to build regularization models that are better suited for recovering signals correlated with medium frequencies, not just low frequencies as in traditional models. Algorithms are presented for learning the singular value profile from training data and for solving the resulting regularization models. Experimental results demonstrate that the proposed spectrum-learning regularization and SLR-TV hybrid models can provide improved reconstruction accuracy over total variation and Tikhonov regularization.
1. A Spectrum-based Regularization Approach to Linear Inverse Problems: Models, Learned Parameters and Algorithms
Jorge A. Castañón
Advisor: Dr. Yin Zhang
November 7, 2014
This work is partially funded by NSF Grants DMS-0811188 and DMS-1115950
and by CONACYT Grant 212611/307450.
2. Outline
• Linear Inverse Problems and Regularization
• Motivation
• Algorithms to Learn and Solve the Models
• Results
• Summary of Contributions
3. Medical Imaging
Image taken from www.siemens.co.uk
• Goal: minimize the time patients spend inside the
machine while getting a high quality image
10. Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
11. Tikhonov (1963)
Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
R(x) = ||x||_2^2: "smooth" x*
12. Goal: estimate x* given A, b, where x ∈ R^n and m ≪ n
Find a desirable x s.t. Ax = b
Linear Regularization: min R(x) s.t. Ax = b
Tikhonov (1963): R(x) = ||x||_2^2: "smooth" x*
R(x) = ||x||_1: sparse x*, e.g., Santosa-Symes (1986)
14. Motivation
• Improve Total Variation (TV) and Tikhonov models for some practical situations
• Approach: look at finite difference matrices differently
15. Why Derivatives of Images? Sparse Gradients
• Let D1 be a first order finite difference matrix:

D1 = [ −1   1   0  ⋯   0   0
        0  −1   1  ⋯   0   0
        ⋮                  ⋮
        0   0   0  ⋯  −1   1 ]
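As a quick illustration (a minimal NumPy sketch, not the author's code), D1 can be built explicitly, and applying it to a piecewise constant signal shows why image gradients are sparse:

```python
import numpy as np

def first_diff(n):
    """(n-1) x n first order finite difference matrix,
    rows of the form [..., -1, 1, ...]."""
    D1 = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D1[idx, idx] = -1.0
    D1[idx, idx + 1] = 1.0
    return D1

# A piecewise constant signal has a sparse gradient:
x = np.concatenate([np.ones(5), 3 * np.ones(5)])
g = first_diff(10) @ x
print(np.count_nonzero(g))  # 1: the only nonzero is at the jump
```

This is exactly the sparsity that the ℓ1-based TV model exploits.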
16. Total Variation
• Rudin, Osher and Fatemi (1992) proposed to use TV for image denoising
• TV has been widely used for a variety of imaging problems
17. Total Variation
• A form of the discrete TV model:
min TV(x) s.t. Ax = b
• where
• TV(x) = ||D1 x||_1 (anisotropic)
• D1 = 2D first order finite difference matrix
• x is the vectorized image
19. A Deeper Look into Derivatives
D1 = U Σ V^T
• V: n columns of a DCT matrix
• Σ: diagonal
• U: n − 1 columns of a DST matrix
21. A Deeper Look into Derivatives
D1 x = U Σ (V^T x)
• V^T x: inner products with cosines (n columns of V, a DCT matrix)
• Σ: diagonal
• U: expanded in a basis of sines (n − 1 columns of U, a DST matrix)
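A small numerical check of this factorization (a sketch under the slide's setup, not the thesis code): applying D1 really is "project onto the columns of V, scale by the singular values, expand in the columns of U".

```python
import numpy as np

n = 64
# (n-1) x n first order finite difference matrix
D1 = np.zeros((n - 1, n))
i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0

# Full SVD: U is (n-1)x(n-1), V is n x n, S holds the n-1 singular values
U, S, Vt = np.linalg.svd(D1, full_matrices=True)

x = np.random.default_rng(0).standard_normal(n)
coeffs = Vt @ x                       # inner products with the columns of V
Dx = U @ (S * coeffs[: n - 1])        # scale by sigma_j, re-expand in U
print(np.allclose(Dx, D1 @ x))        # True
```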
22. The Derivative "likes" Low Frequencies
[Figure: Σ, the singular values σ_j of D1, plotted against the index j on a log scale; the corresponding right singular vectors are the n columns of V (a DCT basis)]
23. A New Idea…
We can use a different distribution of the singular values to build a model that is better suited to recovering signals that are not only correlated with low frequencies.
[Figure: singular value profiles σ_j vs. j (log scale): the TV profile versus the proposed profile, which emphasizes medium frequencies]
24. Model and Learning
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b, with a learned profile Σ̂
[Figure: singular value profiles σ_j vs. j (log scale): the TV profile versus the proposed profile, which emphasizes medium frequencies]
25. 1. How do we Estimate the Profile Σ̂?
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b
26. 2. How do we Solve the Model?
min_x ||U Σ̂ V^T x||_q^q s.t. Ax = b
27. Methods!
1. How to estimate the profile Σ̂
2. How to solve the regularization models
Both algorithms exploit the following properties of the matrices:
• Fast multiplication
• Storage-free
28. Learning Model
• Randomly pick some rows or columns {p_1, p_2, ⋯, p_k} ⊂ R^n
• Solve the learning model (q = 1, 2):
Σ̂ = arg min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_q^q
• where σ = diag(Σ) and, to avoid a trivial solution,
Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
29. Sparse Learning
• Suppose that Σ̂ makes U Σ̂ V^T p_j sparse ∀j
• Solve the learning model
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_1
• where σ = diag(Σ) and Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
• Can be solved as a Linear Program
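After the standard absolute-value split the objective is linear in (σ, t), so any LP solver applies. A sketch with scipy.optimize.linprog (the piecewise constant training vectors and the problem sizes here are illustrative assumptions, not the thesis setup):

```python
import numpy as np
from scipy.optimize import linprog

n = 16
D1 = np.zeros((n - 1, n)); i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0
U, S, Vt = np.linalg.svd(D1)

# Piecewise constant training vectors p_1, ..., p_k (toy data)
rng = np.random.default_rng(1)
P = [np.repeat(rng.standard_normal(4), n // 4) for _ in range(3)]

# sum_j ||U diag((V^T p_j)) sigma||_1 is linear in (sigma, t):
# minimize 1^T t  s.t.  -t <= M sigma <= t,  sum(sigma) = 1,  sigma >= 0
M = np.vstack([U @ np.diag((Vt @ p)[: n - 1]) for p in P])
K, m = M.shape                      # m = n - 1 learned singular values
c = np.concatenate([np.zeros(m), np.ones(K)])
A_ub = np.block([[M, -np.eye(K)], [-M, -np.eye(K)]])
b_ub = np.zeros(2 * K)
A_eq = np.concatenate([np.ones(m), np.zeros(K)])[None, :]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)] * K)
sigma_hat = res.x[:m]               # learned profile, sums to 1
print(res.status)                   # 0 = optimal
```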
30. An Example!
• Given a set of piecewise constant training vectors
• Estimate the singular values of the first order finite difference matrix:
σ_j* = √(2 − 2 cos(πj/n)), j = 1, …, n − 1
[Figure: true vs. estimated singular values for training set sizes k = 1, 2, 3, 4, with relative errors RE = 2.05, 1.38, 1.47e-12 and 9.77e-11, respectively; with k ≥ 3 training vectors the profile is recovered to machine precision]
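The target profile can be checked numerically; the expression below is the standard closed form for the singular values of this (n−1) × n difference matrix (the slide's exact formula was garbled in extraction, so treat this reconstruction as an assumption):

```python
import numpy as np

n = 50
D1 = np.zeros((n - 1, n)); i = np.arange(n - 1)
D1[i, i], D1[i, i + 1] = -1.0, 1.0

svals = np.sort(np.linalg.svd(D1, compute_uv=False))
j = np.arange(1, n)
closed_form = np.sqrt(2 - 2 * np.cos(np.pi * j / n))  # = 2 sin(pi*j/(2n))
print(np.allclose(svals, closed_form))                # True
```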
31. ℓ2-Learning
• Suppose that Σ̂ makes U Σ̂ V^T p_j "smooth" ∀j
• Solve the learning model:
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_2^2
• where σ = diag(Σ) and Ω = {v ∈ R^n | ∑ v_j = 1, v ≥ 0}
• Quadratic objective with a linear constraint
32. ℓ2-Learning
• Note that
min_{σ ∈ Ω} ∑_{j=1}^k ||U Σ V^T p_j||_2^2 ⟺ min_{σ ∈ Ω} σ^T S σ
• where S = diag{(V^T p_1).^2 + ⋯ + (V^T p_k).^2}
• Thus, the solution is given by
σ* = S^{−1} e / (e^T S^{−1} e)
• where e^T = (1, 1, ⋯, 1) ∈ R^n
This is the training method used for our proposed models.
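The closed form makes ℓ2-learning essentially free: one pass over the training vectors and an elementwise division. A minimal sketch (the orthogonal V and the random training vectors are stand-ins for illustration):

```python
import numpy as np

def l2_learn(V, P):
    """Closed-form l2-learning: sigma* = S^{-1} e / (e^T S^{-1} e),
    where S = diag(sum_j (V^T p_j).^2)."""
    s = sum((V.T @ p) ** 2 for p in P)   # diagonal of S
    s = np.maximum(s, 1e-12)             # guard against division by zero
    inv = 1.0 / s                        # S^{-1} e, elementwise
    return inv / inv.sum()               # normalize: sums to 1

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((8, 8)))[0]   # any orthogonal basis
P = [rng.standard_normal(8) for _ in range(3)]
sigma = l2_learn(V, P)
print(np.isclose(sigma.sum(), 1.0), (sigma >= 0).all())  # True True
```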
33. 2D Problems
The 2D training reduces to two independent 1D training problems!
• Signal x* ∈ R^N is vectorized, where N = n^2
• Sampling scheme A ∈ R^{M×N}, where M ≪ N
• Regularization matrix is U Σ̂ V^T, where
1. V = (V ⊗ V) ∈ R^{N×N} and
2. Σ̂ = [ Σ̂_row ⊗ I_n ; I_n ⊗ Σ̂_col ] ∈ R^{2N×N}
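The Kronecker structure never needs to be formed in the actual solver (fast multiplication, storage-free), but it is easy to sanity-check at toy sizes; the two diagonal profiles below are illustrative stand-ins for learned Σ̂_row and Σ̂_col:

```python
import numpy as np

n = 4; N = n * n
# Toy 1D profiles learned independently for rows and columns
sig_row = np.diag(np.linspace(0.1, 1.0, n))
sig_col = np.diag(np.linspace(0.2, 1.0, n))
I = np.eye(n)

# Stacked 2D profile matrix: [ Sig_row (x) I ; I (x) Sig_col ]
Sigma_hat = np.vstack([np.kron(sig_row, I), np.kron(I, sig_col)])
print(Sigma_hat.shape)  # (2N, N) = (32, 16)

# 2D profile visualization: diag(Sigma^T Sigma), reshaped to n x n
profile = np.diag(Sigma_hat.T @ Sigma_hat).reshape(n, n)
print(profile.shape)    # (4, 4)
```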
34. Regularization Models
• Spectrum-Learning Regularization model (SLR):
min_x (1/2) ||Σ̂ V^T x||_2^2 s.t. Ax = b
• SLR-TV hybrid model:
min_x (1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1 s.t. Ax = b
• where α > 0 and D1 is the 2D finite difference matrix
35. SLR Model
• Given a trained profile Σ̂
• The Spectrum-Learning Regularization (SLR) model is
min_x (1/2) ||Σ̂ V^T x||_2^2 s.t. Ax = b
• By the standard KKT conditions, find (x*, λ*) such that
x* = V (Σ̂^T Σ̂)^{−1} (AV)^T λ*
AV (Σ̂^T Σ̂)^{−1} (AV)^T λ* = b
• The latter is an M × M symmetric positive definite system of equations: solve with CG
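A matrix-free sketch of this KKT solve (random A, a stand-in orthogonal V, and a toy diagonal profile are assumptions for illustration, not the thesis code): build the M × M SPD operator and hand it to CG, then recover x* from the multiplier.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, m = 64, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
V = np.linalg.qr(rng.standard_normal((n, n)))[0]   # stand-in orthogonal basis
sig = np.linspace(0.1, 1.0, n)                     # toy learned profile

AV = A @ V
inv_ss = 1.0 / sig**2                              # (Sigma^T Sigma)^{-1}, diagonal

# M x M SPD system  AV (Sigma^T Sigma)^{-1} (AV)^T lam = b, applied matrix-free
def matvec(lam):
    return AV @ (inv_ss * (AV.T @ lam))

C = LinearOperator((m, m), matvec=matvec)
lam, info = cg(C, b)
x = V @ (inv_ss * (AV.T @ lam))                    # recover x* from the KKT system
print(info, np.allclose(A @ x, b, atol=1e-3))      # 0 True
```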
36. Accelerated CG
• Let C_γ = AV (Σ̂_γ^T Σ̂_γ)^{−1} V^T A^T, where γ > 0 and
Σ̂_γ = [ (Σ̂_row + γ I_n) ⊗ I_n ; I_n ⊗ (Σ̂_col + γ I_n) ]
• The convergence rate of CG for solving C_γ λ = b depends on the condition number κ(C_γ)
• The larger the γ > 0, the smaller the κ(C_γ)
[Figure: κ(Σ̂^T Σ̂), κ(C_0), κ(C_{10^{−4}}), κ(C_{10^{−2}}) and κ(C_{10^{−1}}) versus the dimension M of the linear system]
37. Accelerated CG ~ 80% Faster
Algorithm:
Given an initial guess λ^0 and γ_1 > γ_2 > ⋯ > γ_l ≥ 0
For k = 1 : l
1. Form C_{γ_k}
2. λ^k ← CG(C_{γ_k}, λ^{k−1})
End
[Figure: CPU time (s) of regular CG versus accelerated CG]
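The continuation idea amounts to a warm-started loop over decreasing γ; a toy sketch (the operator C_γ below is an illustrative stand-in, not the 2D thesis operator):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
m, n = 20, 64
AV = rng.standard_normal((m, n))
b = rng.standard_normal(m)
sig = np.linspace(0.05, 1.0, n)          # toy learned profile

def C_gamma(gamma):
    """Toy C_gamma = AV ((Sigma+gamma I)^T (Sigma+gamma I))^{-1} (AV)^T."""
    inv = 1.0 / (sig + gamma) ** 2
    return LinearOperator((m, m), matvec=lambda lam: AV @ (inv * (AV.T @ lam)))

lam = np.zeros(m)
for gamma in [1e-1, 1e-2, 1e-4, 0.0]:    # gamma_1 > ... > gamma_l = 0
    lam, info = cg(C_gamma(gamma), b, x0=lam)  # warm start from previous lam
print(info)                              # 0 = converged on the target system
```

Each inner CG solve faces a better-conditioned system than the final one, and its solution seeds the next solve, which is where the speedup comes from.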
38. Motivation of the SLR-TV Hybrid Model
Absolute Error Maps: |X − X*| ∈ R^{n×n}
TV: 30.4 dB, Tikhonov: 27 dB, SLR: 33.1 dB
The error is measured with the Peak Signal to Noise Ratio, PSNR (dB): the higher, the better.
[Figure: absolute error maps for TV, Tikhonov and SLR]
39. SLR-TV Hybrid Model
• Given by
min_x (1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1 s.t. Ax = b,
where α > 0 and D1 is a 2D first order finite difference matrix
• Let y = D1 x be a splitting variable, then
min_{x,y} (1/2) ||Σ̂ V^T x||_2^2 + α ||y||_1 s.t. [ A ; D1 ] x − [ 0 ; I ] y = [ b ; 0 ]
• Two separable convex blocks with linear constraints
• Suitable for the Alternating Direction Method (ADM)
40. ADM
• Proposed by Glowinski and Marocco (1975) and Gabay and Mercier (1976) to solve
min_{x,y} f(x) + g(y) s.t. Dx + Ey = c
• Form the augmented Lagrangian function
L_A(x, y, λ) = f(x) + g(y) − λ^T (Dx + Ey − c) + (β/2) ||Dx + Ey − c||_2^2,
where β > 0
• Given x^0, y^0 and λ^0, the iteration is
1. x^{k+1} = arg min_x L_A(x, y^k, λ^k)
2. y^{k+1} = arg min_y L_A(x^{k+1}, y, λ^k)
3. λ^{k+1} = λ^k − ρ (Dx^{k+1} + Ey^{k+1} − c),
where ρ ∈ (0, (1 + √5)/2)
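The three-step iteration is short in code. A toy instance (assumption for illustration: f(x) = ½||x − a||², g(y) = α||y||₁, D = I, E = −I, c = 0, multiplier step βρ with ρ = 1), whose minimizer is known in closed form:

```python
import numpy as np

def shrink(z, t):
    """Soft-thresholding: exact solution of min_y t*||y||_1 + 0.5*||y - z||^2."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# ADM on: min 0.5*||x - a||^2 + alpha*||y||_1  s.t.  x - y = 0
a = np.array([3.0, -0.5, 0.2, -2.0])
alpha, beta = 1.0, 1.0
x = y = lam = np.zeros_like(a)
for _ in range(200):
    x = (a + lam + beta * y) / (1.0 + beta)    # x-subproblem (quadratic)
    y = shrink(x - lam / beta, alpha / beta)   # y-subproblem (shrinkage)
    lam = lam - beta * (x - y)                 # multiplier update (rho = 1)

# The limit is the soft-thresholded data: shrink(a, alpha)
print(np.allclose(x, shrink(a, alpha), atol=1e-3))  # True
```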
41. ADM for SLR-TV
• The ADM applied to SLR-TV is given by
1. f(x) = (1/2) ||Σ̂ V^T x||_2^2 and g(y) = α ||y||_1
2. D = [ A ; D1 ], E = −[ 0 ; I ] and c = [ b ; 0 ]
• The x-subproblem is equivalent to solving an N × N linear system
1. Sherman-Morrison to reduce the dimension of the system to solve from N to M (M ≪ N)
2. CG to solve the M × M system
• The y-subproblem is solved exactly (shrinkage formula)
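The dimension reduction in step 1 can be sketched on a toy system (a diagonal B stands in for the easy-to-invert part of the actual x-subproblem matrix; this is an illustrative assumption, not the thesis code): the Sherman-Morrison-Woodbury identity turns an N × N solve into an M × M one.

```python
import numpy as np

# Solve (B + A^T A) x = r with B cheap to invert, via an M x M system only:
# (B + A^T A)^{-1} = B^{-1} - B^{-1} A^T (I + A B^{-1} A^T)^{-1} A B^{-1}
rng = np.random.default_rng(2)
N, M = 200, 15
A = rng.standard_normal((M, N))
d = rng.uniform(1.0, 2.0, N)          # B = diag(d), trivially invertible
r = rng.standard_normal(N)

Binv_r = r / d
Binv_At = A.T / d[:, None]            # B^{-1} A^T, N x M
small = np.eye(M) + A @ Binv_At       # M x M system instead of N x N
x = Binv_r - Binv_At @ np.linalg.solve(small, A @ Binv_r)

# Check against the direct N x N solve
x_direct = np.linalg.solve(np.diag(d) + A.T @ A, r)
print(np.allclose(x, x_direct))       # True
```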
42. Parameter-Free ADM
• Objective function of the SLR-TV:
(1/2) ||Σ̂ V^T x||_2^2 + α ||D1 x||_1
• The balancing parameter α > 0 is hard to choose
• Provide a diagonal matrix W to replace α > 0:
(1/2) ||W Σ̂ V^T x||_2^2 + ||D1 x||_1
• The matrix of weights is
W = [ Σ̂_row^{−1/2} ⊗ I_n , 0_{N×N} ; 0_{N×N} , I_n ⊗ Σ̂_col^{−1/2} ]
43. Results
• Visualization of the 2D profiles
• Learning does not need a lot of prior information (we saw earlier that it is cheap!)
• The proposed models enhance the quality of the recovered images
44. Interpretation of the 2D Profiles
1. Σ̂ = [ Σ̂_row ⊗ I_n ; I_n ⊗ Σ̂_col ] ∈ R^{2N×N}
2. diag(Σ̂^T Σ̂) ∈ R^N
3. Reshape diag(Σ̂^T Σ̂) into R^{n×n}
[Figure: the reshaped 2D profile, with the low-frequency and high-frequency regions indicated]
47. How Many Training Vectors? Just a Few…
• Training data size versus quality
• Quality does not improve significantly when increasing the size of the training set
[Figure: PSNR (dB) versus training data size k for the heart, shoulder, brain and thorax images]
Training Data: randomly chose k rows and k columns from each of the images that are not being recovered.
48. Quality Enhancement
knee4 — Original: 1024 × 1024, Sampling @ 10%
• TV: PSNR = 33.27 dB, RelErr = 8.46%, CPU = 100.63 s
• Tikhonov: PSNR = 33.60 dB, RelErr = 8.15%, CPU = 54.93 s
• SLR: PSNR = 43.36 dB, RelErr = 2.65%, CPU = 151.21 s
• The quality of SLR is about 10 dB higher than TV and Tikhonov
• SLR recovers bone structures better (zoom in next slide)
49. Quality Enhancement
knee7 — Original: 1024 × 1024, Sampling @ 10%
• TV: PSNR = 28.88 dB, RelErr = 8.53%, CPU = 101.28 s
• Tikhonov: PSNR = 26.31 dB, RelErr = 11.47%, CPU = 54.72 s
• SLR: PSNR = 28.94 dB, RelErr = 8.48%, CPU = 153.67 s
54. TV versus SLR
• Piecewise constant signals can be recovered exactly with the TV model*
• Other, more complex signals can be recovered with higher accuracy using the SLR model
*Candès, Romberg and Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information", 2006
55. Summary of Major Contributions
• Proposed Spectrum-Learning Regularization (SLR) and hybrid models that only require a few parameters to learn
• Developed computationally inexpensive training strategies to estimate the profile of a signal of interest
• Designed convergent algorithms to solve the proposed regularization models
• The quality of the images recovered by SLR and SLR-TV is considerably enhanced
56. Remarks
• SLR improves recovery accuracy in scenarios where compressive sensing theory does not hold
• SLR methods do not rely on the choice of the sampling matrix
• The DCT basis was used for SLR; nonetheless, a different choice of basis may be better suited to other applications
• A preconditioner for CG could potentially improve the performance of the proposed algorithms