Decomposition and Denoising for moment sequences
using convex optimization
Badri Narayan Bhaskar
High Dimensional Inference (n ≪ p)
Statistics: microarray data, hyperspectral imaging, predicting ratings.

Sparse Vector Estimation (Gene Expressions)
An n × p matrix X of gene expressions (n samples, p genes) with a k-sparse coefficient vector: only a few active genes drive disease susceptibility (the response).
Low Rank Matrices: The Netflix Problem
The matrix of movie ratings (thousands of movies, millions of subscribers) admits a low rank decomposition: each rating is the inner product of a movie profile and a user profile, an "Action" score plus a handful of other features...
Low Rank Matrices: Topic Models
The term-document matrix (thousands of documents, millions of words) decomposes into a degree of topicality times a topic profile: "Politics" plus a handful of other features...
Simple objects

Objects              Notion of Simplicity   Atoms
Vectors              Sparsity               Canonical unit vectors
Matrices             Rank                   Rank-1 matrices
Bandlimited signals  No. of frequencies     Complex sinusoids
Linear systems       Number of poles        Single-pole systems
Every simple object decomposes over an atomic set A:

    x = Σ_{a ∈ A} c_a a

where the atoms a ∈ A play the role of "features" and the c_a are nonnegative weights.
Universal Regularizer
Chandrasekaran, Recht, Parrilo, Willsky (2010)

Objects              Notion of Simplicity   Atomic Norm
Vectors              Sparsity               ℓ1 norm
Matrices             Rank                   Nuclear norm
Bandlimited signals  No. of frequencies     ?
Linear systems       McMillan degree        ?

The atomic norm is the gauge of the convex hull of the atoms:

    ‖x‖_A = inf { t ≥ 0 : x ∈ t conv(A) }
          = inf { Σ_a c_a : x = Σ_a c_a a, c_a ≥ 0 }
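The first two rows of the table can be checked numerically; a minimal sketch (the example vector and matrix are mine, not from the slides):

```python
import numpy as np

# Two concrete instances of the atomic norm:
# - atoms = signed canonical unit vectors  ->  atomic norm = l1 norm
# - atoms = unit-norm rank-1 matrices      ->  atomic norm = nuclear norm
x = np.array([3.0, -4.0, 0.0, 1.0])
l1 = np.abs(x).sum()                    # weights of the decomposition over +-e_i

M = np.outer([1.0, 2.0], [2.0, 1.0])    # a rank-1 matrix
nuclear = np.linalg.svd(M, compute_uv=False).sum()

print(l1, nuclear)
```

For the rank-1 matrix M the nuclear norm is just its single singular value, ‖[1,2]‖·‖[2,1]‖ = 5.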
Moment Problems
When the atoms are indexed by a measure space, the decomposition x = Σ_a c_a a expresses x as the moments of a discrete measure: atomic moments.
Two running examples:

System Identification
[plot]

Line Spectral Estimation
[plot]
Decomposition
How many atoms do you need to represent x?
- If the composing atoms are on an exposed face, the decomposition is easy to find.
- Find a supporting hyperplane to certify it.
- Concentrate on recovering "good" objects with atoms on exposed faces.
Dual Certificate
[plot: gradient and subgradients of the atomic norm]

The dual problem is the semi-infinite program

    maximize_q  ⟨q, x⋆⟩   subject to  ‖q‖*_A ≤ 1

Its optimum value is the atomic norm; its solutions are subgradients at x.

Dual Characterization of the atomic norm

    ‖q‖*_A = sup_{a ∈ A} ⟨q, a⟩   (under mild conditions, Bonsall)

A dual optimal q certifies the support:
    ⟨q, a⟩ = 1 for a in the support,
    ⟨q, a⟩ < 1 otherwise.
Denoising
Observe y = x + w ∈ C^n with noise w ∼ N(0, σ² I_n).

AST (Atomic Soft Thresholding):

    minimize_x  ½‖y − x‖₂² + τ‖x‖_A

with tradeoff parameter τ.

Dual AST:

    maximize_q  τ⟨q, y⟩ − (τ²/2)‖q‖₂²   subject to  ‖q‖*_A ≤ 1
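For the sparse-vector atomic set, where ‖·‖_A is the ℓ1 norm, AST has the familiar closed form: entrywise soft thresholding. A minimal sketch of that special case (the input values are illustrative):

```python
import numpy as np

def soft_threshold(y, tau):
    """Closed-form AST solution when the atoms are signed canonical unit
    vectors, i.e. argmin_x 0.5*||y - x||_2^2 + tau*||x||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

y = np.array([2.5, -0.3, 0.9, -4.0])
x_hat = soft_threshold(y, 1.0)
print(x_hat)  # entries shrink toward zero by tau; small ones vanish
```

Entries with |y_i| ≤ τ are set to zero, which is how the estimator selects the composing atoms.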
AST Results
Optimality conditions:

    τq̂ + x̂ = y,   ‖q̂‖*_A ≤ 1,   ⟨q̂, x̂⟩ = ‖x̂‖_A

The last two mean q̂ ∈ ∂‖x̂‖_A: we can find the composing atoms!
The first suggests τq̂ ≈ w, i.e. choose τ ≈ E‖w‖*_A.

Tradeoff = Gaussian Width
No cross validation needed: estimate the Gaussian width. The Gaussian width is also important to characterize convergence rates.
AST Convergence Rates
Compare the Lasso for an s-sparse signal:

    Fast rate:  (1/n)‖Xβ̂ − Xβ⋆‖₂² ≤ C σ² s log(p) / n
    Slow rate:  (1/n)‖Xβ̂ − Xβ⋆‖₂² ≲ σ ‖β⋆‖₁ √(log(p)/n)

Choose the appropriate tradeoff τ = η E(‖w‖*_A), η ≥ 1. With minimal assumptions on the atoms we get a universal, but slow, rate:

    (1/n)‖x̂ − x⋆‖₂² ≤ (τ/n)‖x⋆‖_A

With a "cone condition" (weaker than RIP, coherence, RSC, RE), faster rates of order τ²/n are possible.
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
Time samples of a line spectrum:

    x_k = Σ_{j=1}^s c_j e^{i2π f_j k},   k = 0, …, n−1

Extrapolate the remaining moments! The samples are moments of the discrete measure μ = Σ_j c_j δ_{f_j}.
Super resolution
Time frequency dual
Extrapolate high frequency information from
low frequency samples
Just exchange time and frequency: same as line
spectral estimation
Classical Line Spectrum Estimation
(a crash course)

Nonlinear parameter estimation: x_k = Σ_{j=1}^s c_j e^{2πi f_j k}.

Define, for any vector x, the Hermitian Toeplitz matrix

    T_n(x) = [ x₁      x₂        …  x_n
               x₂*     x₁        …  x_{n−1}
               ⋮                 ⋱  ⋮
               x_n*    x_{n−1}*  …  x₁ ]

FACT (low rank and Toeplitz):

    T_n(x⋆) = Σ_{j=1}^s c_j v(f_j) v(f_j)*,   v(f) = (1, e^{i2πf}, …, e^{i2π(n−1)f})ᵀ

Classical techniques:
- PRONY: estimate s, root finding
- CADZOW: alternating projections
- MUSIC: low rank, plot the pseudospectrum
Sensitive to model order; no rigorous theory; only optimal asymptotically.
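The low-rank Toeplitz FACT is easy to verify numerically; a sketch with illustrative frequencies and amplitudes of my choosing:

```python
import numpy as np

n, s = 32, 3
freqs = np.array([0.1, 0.35, 0.72])
amps = np.array([1.0, 2.0, 0.5])

k = np.arange(n)
x = sum(c * np.exp(2j * np.pi * f * k) for c, f in zip(amps, freqs))

# Hermitian Toeplitz T_n(x): first row x_0, ..., x_{n-1} (0-indexed here),
# first column its conjugate, matching the slide's definition.
T = np.array([[x[j - i] if j >= i else np.conj(x[i - j]) for j in range(n)]
              for i in range(n)])

rank = np.linalg.matrix_rank(T, tol=1e-8)
print(rank)  # equals s, the number of frequencies
```

Since the frequencies are distinct and the amplitudes nonzero, the Vandermonde factors have full column rank and T_n(x) has rank exactly s.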
Convex Perspective
[plot: noisy time samples]

The line spectrum signal is a sparse combination of atoms:

    x = Σ_{j=1}^s c_j a(f_j)    (amplitudes c_j, frequencies f_j)

    x_k = Σ_{j=1}^s c_j e^{i2π f_j k},   a(f) = (1, …, e^{i2π f k}, …, e^{i2π f (n−1)})ᵀ

Atoms for the line spectrum:
    Positive amplitudes:  A₊ = { a(f) : f ∈ [0, 1] }
    Complex:              A  = { a(f) e^{iφ} : f ∈ [0, 1], φ ∈ [0, 2π] }
Convergence rates
For a signal with n samples of the form

    y_k = Σ_{j=1}^s c_j e^{2πi f_j k} + w_k

choose the tradeoff as τ ≈ σ√(n log n): with high probability, for n > 4, E‖w‖*_A is of order σ√(n log n), by Dudley's inequality and Bernstein's polynomial theorem. Using the global AST guarantee, we get

    E (1/n)‖x̂ − x⋆‖₂² ≤ C σ √(log(n)/n) Σ_{j=1}^s |c_j|
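The quantity E‖w‖*_A = E sup_f |⟨w, a(f)⟩| behind the choice of τ can be estimated by Monte Carlo, approximating the supremum over frequencies with a zero-padded FFT (a sketch; the grid size, trial count, and σ = 1 are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, grid, trials = 64, 4096, 50

# ||w||_A^* = sup_f |<w, a(f)>| with a(f)_k = exp(i 2 pi f k); approximate
# the sup over f in [0,1) by a dense frequency grid via a zero-padded FFT.
sups = []
for _ in range(trials):
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    sups.append(np.abs(np.fft.fft(w, grid)).max())
est = np.mean(sups)

print(est, np.sqrt(n * np.log(n)))  # same order of magnitude
```

This is the "estimate the Gaussian width" recipe: no cross validation, just simulate noise and average the dual norm.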
Fast Rates
If every two frequencies are far enough apart (minimum separation condition)

    min_{p ≠ q} d(f_p, f_q) ≥ 4/n

and the tradeoff parameter is chosen correctly, τ = η σ√(n log n) with η > 1 large enough, then for s frequencies and n samples the MSE is

    E (1/n)‖x̂ − x⋆‖₂² ≤ C σ² s log(n)/n

This is not a global condition on the dictionary; it is local to the signal. The proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernandez-Granda.
Can't do much better
Our rate

    E (1/n)‖x̂ − x⋆‖₂² ≤ C σ² s log(n)/n

is only a logarithmic factor from the "oracle rate" C σ² s/n, and nearly matches the minimax bound C σ² s log(n/4s)/n on the best prediction rate for signals with 4/n minimum separation.
Minimum Separation and Exact Recovery
Result of Candès and Fernandez-Granda: with 4/n separation one can construct an explicit dual certificate of exact recovery; the convex hull of far enough vertices forms an exposed face. (Prony does not have this limitation. Then why bother? Robustness guarantees.)

Minimum separation is necessary, except for the positive case: there the convex hull of every collection of n/2 vertices is an exposed face, an infinite dimensional version of the cyclic polytope, which is neighbourly. No separation condition is needed.
Localizing Frequencies
[plots: dual polynomial value vs frequency, with a zoom near the support]

Dual Polynomial
Given the AST solution y = x̂ + τq̂, form the dual polynomial

    Q̂(f) = Σ_{j=0}^{n−1} q̂_j e^{−i2π j f}

It certifies the support:
    ⟨q̂, a(f)⟩ = 1 for a(f) in the support,
    ⟨q̂, a(f)⟩ < 1 otherwise.
Localization Guarantees
[plot: dual polynomial magnitude with true and estimated frequencies; near regions N_j of width 0.16/n around each true frequency f_j, the rest is the far region F]

Three localization metrics are controlled, with explicit constants C₁, C₂, C₃:
- Spurious amplitudes: the total amplitude Σ_{l: f̂_l ∈ F} |ĉ_l| placed in the far region is small.
- Weighted frequency deviation: Σ_{l: f̂_l ∈ N_j} |ĉ_l| d(f_j, f̂_l)² is small for each true frequency f_j.
- Near region approximation: |c_j − Σ_{l: f̂_l ∈ N_j} ĉ_l| is small.
How to actually solve AST

    minimize_x  ½‖y − x‖₂² + τ‖x‖_A

Semidefinite Characterizations of conv(A₊) and conv(A):

Positive amplitudes, A₊ = { a(f) : f ∈ [0,1] } (Carathéodory-Toeplitz):

    ‖x‖_{A₊} = x₀ if T_n(x) ⪰ 0, and +∞ otherwise.

Complex, A = { a(f) e^{iφ} : f ∈ [0,1], φ ∈ [0, 2π] } (Bounded Real Lemma):

    ‖x‖_A = inf { (1/2n) tr(T_n(u)) + t/2 :  [ T_n(u)  x ; x*  t ] ⪰ 0 }
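The positive-amplitude (Carathéodory-Toeplitz) characterization can be checked directly: for a positive combination of atoms, T_n(x) is PSD and ‖x‖_{A₊} = x₀ is the sum of the amplitudes. A sketch with frequencies and amplitudes of my choosing:

```python
import numpy as np

n = 16
freqs, amps = [0.12, 0.4, 0.83], [1.0, 0.5, 2.0]   # positive amplitudes
k = np.arange(n)
x = sum(c * np.exp(2j * np.pi * f * k) for c, f in zip(amps, freqs))

# Hermitian Toeplitz T_n(x) built from the moment sequence x_0, ..., x_{n-1}.
T = np.array([[x[j - i] if j >= i else np.conj(x[i - j]) for j in range(n)]
              for i in range(n)])

eigs = np.linalg.eigvalsh(T)
print(eigs.min())   # >= 0 up to rounding: T_n(x) is PSD
print(x[0].real)    # x_0 = sum of amplitudes = atomic norm in A+
```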
AST SDP Formulation
Solve with the Alternating Direction Method of Multipliers (ADMM):

    minimize_{t,u,x,Z}  ½‖x − y‖₂² + (τ/2)(t + u₁)
    subject to          Z = [ T(u)  x ; x*  t ],  Z ⪰ 0.

Alternating Direction Iterations

    (t^{l+1}, u^{l+1}, x^{l+1}) ← argmin_{t,u,x} L_ρ(t, u, x, Z^l, Λ^l)        (quadratic objective)
    Z^{l+1} ← argmin_{Z ⪰ 0} L_ρ(t^{l+1}, u^{l+1}, x^{l+1}, Z, Λ^l)            (projection onto the psd cone)
    Λ^{l+1} ← Λ^l + ρ ( Z^{l+1} − [ T(u^{l+1})  x^{l+1} ; x^{l+1*}  t^{l+1} ] )  (linear dual update)
Form the Augmented Lagrangian (ρ is the momentum parameter):

    L_ρ(t, u, x, Z, Λ) = ½‖x − y‖₂² + (τ/2)(t + u₁)
                         + ⟨Λ, Z − [ T(u)  x ; x*  t ]⟩           (Lagrangian term)
                         + (ρ/2) ‖ Z − [ T(u)  x ; x*  t ] ‖_F²   (augmented term)
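A minimal real-valued sketch of these iterations (symmetric Toeplitz instead of Hermitian, and the values of ρ, τ, the test signal, and the iteration count are my choices, not from the slides):

```python
import numpy as np

def toeplitz_sym(u):
    """Symmetric Toeplitz matrix with first column u."""
    n = len(u)
    return np.array([[u[abs(i - j)] for j in range(n)] for i in range(n)])

def admm_ast(y, tau, rho=1.0, iters=300):
    """ADMM for min 0.5||x-y||^2 + (tau/2)(t + u_1)
       s.t. Z = [[T(u), x], [x^T, t]], Z psd (real symmetric sketch)."""
    n = len(y)
    Z = np.zeros((n + 1, n + 1))
    Lam = np.zeros_like(Z)
    for _ in range(iters):
        S = Z + Lam / rho   # complete the square in the augmented Lagrangian
        # (t, u, x)-update: separable quadratics with closed forms.
        x = (y + 2 * rho * S[:n, n]) / (1 + 2 * rho)
        t = S[n, n] - tau / (2 * rho)
        u = np.array([np.trace(S[:n, :n], offset=d) / (n - d) for d in range(n)])
        u[0] -= tau / (2 * rho * n)
        M = np.block([[toeplitz_sym(u), x[:, None]],
                      [x[None, :], np.array([[t]])]])
        # Z-update: project M - Lam/rho onto the psd cone via eigendecomposition.
        w, V = np.linalg.eigh(M - Lam / rho)
        Z = (V * np.maximum(w, 0.0)) @ V.T
        # Linear dual update.
        Lam = Lam + rho * (Z - M)
    return x, Z, M

rng = np.random.default_rng(0)
y = np.cos(2 * np.pi * 0.1 * np.arange(8)) + 0.1 * rng.standard_normal(8)
x_hat, Z, M = admm_ast(y, tau=0.5)
residual = np.linalg.norm(Z - M)   # primal feasibility gap, shrinks toward 0
```

The u-update averages the diagonals of S (each offset d appears in n − d positions), with the τ/(2ρn) correction on the main diagonal coming from the (τ/2)u₁ term.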
Alternative Discretization Method
Can use a grid  A_N = { a(f, φ) : f = j/N, j = 1, …, N, φ ∈ [0, 2π) }.

We show

    (1 − 2πn/N) ‖x‖_{A_N} ≤ ‖x‖_A ≤ ‖x‖_{A_N}

via Bernstein's polynomial theorem or Fritz John; this works for general epsilon nets.

AST is then approximated by a Lasso: y = Fc⋆ + w, where the columns of F are the n atomic moment vectors at the N grid frequencies and c⋆ holds the amplitudes.
- Fast shrinkage algorithms
- Fourier projections are cheap
- Coherence is irrelevant
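Because the gridded dictionary and its adjoint are FFTs, proximal gradient (ISTA) iterations are cheap. A sketch with my own choices of grid size, tradeoff, and test signal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, mu = 64, 512, 1.0     # samples, grid size, tradeoff (illustrative)
k = np.arange(n)
y = (2 * np.exp(2j * np.pi * 0.1 * k) + np.exp(2j * np.pi * 0.33 * k)
     + 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)))

# Gridded dictionary F[k, j] = exp(i 2 pi (j/N) k). F and F* are FFTs,
# and F F* = N I, so the gradient's Lipschitz constant is N.
def F(c):  return (N * np.fft.ifft(c))[:n]
def Fh(r): return np.fft.fft(np.concatenate([r, np.zeros(N - n)]))

def soft(c, thr):            # complex soft threshold (prox of the l1 norm)
    mag = np.maximum(np.abs(c), 1e-12)
    return np.where(mag > thr, (1 - thr / mag) * c, 0.0)

c = np.zeros(N, dtype=complex)
obj = []
for _ in range(200):         # ISTA with step size 1/N
    r = F(c) - y
    c = soft(c - Fh(r) / N, mu / N)
    obj.append(0.5 * np.linalg.norm(F(c) - y) ** 2 + mu * np.abs(c).sum())

peak = int(np.argmax(np.abs(c)))   # grid index of the dominant atom
```

With these choices the dominant recovered grid frequency lands next to the true f = 0.1 (grid index ≈ 51), up to the basis mismatch of the grid.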
Experimental setup
n=64 or 128 or 256 samples
no. of frequencies: n/16, n/8, n/4
SNR: -5 dB, 0 dB, ... , 20 dB
(Root) MUSIC, Matrix pencil, Cadzow: All fed true model
order.
Empirically estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error
metrics, averaged over 20 trials.
Mean Squared Error: MSE vs SNR
[plot: MSE (dB) vs SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies]
Mean Squared Error: Performance Profiles
[plot: P_s(β) vs β for AST, Lasso 2^15, MUSIC, Cadzow, and Matrix Pencil]

P(β) = fraction of experiments with MSE less than β × the minimum MSE; average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = −5, 0, …, 20 dB.
Mean Squared Error: Effect of Gridding
[plot: performance profiles for Lasso with grid sizes 2^10 through 2^15]
Frequency Localization
[plots: performance profiles for AST, MUSIC, and Cadzow, and the three localization metrics m₁ (spurious amplitudes), m₂ (weighted frequency deviation), m₃ (near region approximation) vs SNR (dB) at k = 32]
Simple LTI Systems
Find a simple linear system from time series data u(t) → y(t) (e.g. vibration measurements):

    x[t + 1] = A x[t] + B u[t]
    y[t]     = C x[t] + D u[t]      (state space)

Existing approaches: nonlinear Kalman filter based estimation; parametric methods based on Hankel low rank formulations.
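The state-space recursion above is a one-line simulation loop; a minimal sketch with a toy 2-state system (the matrices are illustrative, not from the slides):

```python
import numpy as np

# x[t+1] = A x[t] + B u[t],  y[t] = C x[t] + D u[t]
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.0]])

def simulate(u):
    x = np.zeros((2, 1))
    ys = []
    for ut in u:
        ys.append((C @ x + D * ut).item())
        x = A @ x + B * ut
    return np.array(ys)

y = simulate(np.r_[1.0, np.zeros(9)])   # impulse response
print(y[:3])
```

By hand: y[0] = D = 0, then x becomes B = (1, 0.5)ᵀ so y[1] = C B = 0.5, and y[2] = C A B = 0.75.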
Subspace Methods
With a little computation, the block Hankel matrix of outputs factors as

    Y = O X + T U

where

    Y = [ y₁ y₂ ⋯ y_k ; y₂ y₃ ⋯ y_{k+1} ; y₃ y₄ ⋯ y_{k+2} ; ⋮ ; y_ℓ y_{ℓ+1} ⋯ y_{ℓ+k−1} ]   (Output)
    O = [ C ; CA ; CA² ; ⋮ ; CA^{ℓ−1} ]                                                      (Observability)
    X = [ x₁ x₂ … x_k ]                                                                      (State)
    T = [ D 0 … 0 ; CB D 0 … 0 ; CAB CB D … 0 ; ⋮ ; CA^{ℓ−2}B … D ]                          (System Matrix)
    U = [ u₁ u₂ ⋯ u_k ; u₂ u₃ ⋯ u_{k+1} ; u₃ u₄ ⋯ u_{k+2} ; ⋮ ; u_ℓ u_{ℓ+1} ⋯ u_{ℓ+k−1} ]   (Input)

Projecting out the inputs, Y U⊥ = O X U⊥, and the product O X is low rank:
- SVD: subspace ID (n4sid)
- Nuclear norm (Vandenberghe et al.)
SISO
The Hankel operator G maps past inputs to future outputs:

    G : ℓ₂(−∞, 0) → ℓ₂[0, ∞)

Fact: rank of the Hankel operator = McMillan degree.
Relax and use the Hankel nuclear norm? Not clear how to compute...
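The rank = McMillan degree fact is easy to see numerically: form the Hankel matrix of the impulse response of a finite-dimensional system and check its rank (poles and residues below are my choices):

```python
import numpy as np

# Impulse response of a 2-pole SISO system: h_k = sum_p r_p * pole_p^k.
poles = [0.7, -0.4]
res = [1.0, 2.0]
k = np.arange(1, 41)
h = sum(r * p ** k for r, p in zip(res, poles))

m = 20
H = np.array([[h[i + j] for j in range(m)] for i in range(m)])  # Hankel matrix

print(np.linalg.matrix_rank(H, tol=1e-10))  # 2 == McMillan degree
```

Each pole contributes one rank-1 term p^{i+j} = p^i p^j to the Hankel matrix, so the rank counts the poles.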
Convex Perspective: SISO Systems
Example:

    G(z) = 2(1 − 0.7²)/(z − 0.7) + 5(1 − 0.5²)/(z − (0.3 + 0.4i)) + 5(1 − 0.5²)/(z − (0.3 − 0.4i))

[plots: poles in the unit disk and samples of the frequency response]

Atoms are single pole systems over a disk D_ρ of radius ρ < 1:

    A = { (1 − |w|²)/(z − w) : w ∈ D_ρ }

Observe linear measurements of the transfer function, y = L(G(z)) + w, and regularize with atomic norms:

    minimize_G  ‖y − L(G)‖₂² + μ‖G‖_A

Theorem:  (π/8) ‖G‖_A ≤ ‖Γ_G‖₁ ≤ ‖G‖_A,  where ‖Γ_G‖₁ is the Hankel nuclear norm.
Atomic Norm Regularization
How to solve

    minimize_G  ‖y − L(G)‖₂² + μ‖G‖_A ?

Equivalent to

    minimize_{c,x}  ‖y − x‖₂² + μ Σ_{w ∈ D_ρ} |c_w| / (1 − |w|²)
    subject to      x = Σ_{w ∈ D_ρ} c_w L( 1/(z − w) )

Discretize and set Φ_{kℓ} := L_k( (1 − |w_ℓ|²)/(z − w_ℓ) ) to get a Lasso:

    minimize_c  ½‖y − Φc‖₂² + μ‖c‖₁
DAST Analysis
Measure uniformly spaced samples of the frequency response:

    Φ_{kℓ} = (1 − |w_ℓ|²) / (e^{2πik/n} − w_ℓ)

Solve, over a net on the disk,

    x̂ = argmin_x  ½‖y − x‖₂² + μ‖x‖_{A_ε}

and set Ĝ(z) = Σ_ℓ ĉ_ℓ (1 − |w_ℓ|²)/(z − w_ℓ), where the coefficients ĉ_ℓ correspond to the support of x̂. Then we have

    ‖G − Ĝ‖²_{H₂} ≤ C₀ ‖Γ_G‖₁ / ((1 − ρ) √n)

with a constant C₀, the Hankel nuclear norm ‖Γ_G‖₁, the stability radius ρ, and n measurements.

Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30 gives H₂ error ≈ 2e-2.
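Building the DAST dictionary Φ over a net on the disk is a one-liner with broadcasting; a sketch (the net resolution is my choice, not from the slides):

```python
import numpy as np

n, rho = 30, 0.95
z = np.exp(2j * np.pi * np.arange(n) / n)   # uniform frequency samples

# Net on the disk D_rho: a few radii crossed with a few angles.
radii = np.linspace(0.1, rho, 8)
angles = 2 * np.pi * np.arange(16) / 16
net = np.array([r * np.exp(1j * a) for r in radii for a in angles])

# Phi[k, l] = (1 - |w_l|^2) / (e^{2 pi i k / n} - w_l)
Phi = (1 - np.abs(net) ** 2)[None, :] / (z[:, None] - net[None, :])
print(Phi.shape)   # (n, net size)
```

Feeding Φ and the measured frequency response into any Lasso solver gives the DAST estimate.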
Summary
Optimal bounds for denoising using convex
methods.
Better empirical performance than classical
approaches
Local conditions on signals instead of global
dictionary conditions
Discretization works!
Acknowledgements: Results derived in
collaboration with Tang, Shah and Prof. Recht.
Future Work
• Better convergence results for discretization.
• Why does weighted mean heuristic work so well for
discretization?
• Better algorithms for System Identification
• Greedy techniques
• More atomic sets. General localization guarantees.
Referenced Publications
B, Recht. "Atomic Norm Denoising for Line Spectral Estimation", 49th Allerton Conference on Controls and Systems, 2011.
B, Tang, Recht. "Atomic Norm Denoising for Line Spectral Estimation", submitted to IEEE Transactions on Signal Processing.
Tang, B, Recht. "Near Minimax Line Spectral Estimation", to be submitted.
Shah, B, Tang, Recht. "Linear System Identification using Atomic Norm Minimization", Control and Decision Conference, 2012.

Other Publications
Tang, B, Shah, Recht. "Compressed Sensing off the Grid", submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, B, Nowak. "Covariance Sketching", 50th Allerton Conference on Controls and Systems, 2012.
Gubner, B, Hao. "Multipath-Cluster Channel Models", IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, B, Lau. "ei.RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model", in Zelig: Everyone's Statistical Software.
Tang, B, Recht. "Sparse Recovery over Continuous Dictionaries: Just Discretize", submitted to Asilomar Conference, 2013.
Cone Condition
Define the inflated descent cone, for γ ∈ (0, 1),

    C_γ(x⋆) = { z : ‖x⋆ + z‖_A ≤ ‖x⋆‖_A + γ‖z‖_A }

The cone condition requires

    inf { ‖z‖₂ / ‖z‖_A : z ∈ C_γ(x⋆), z ≠ 0 } > 0

i.e. no direction in C_γ(x⋆) has vanishing ℓ₂ norm relative to its atomic norm.
Minimum separation is necessary
Using two sinusoids:

    ‖a(f₁) − a(f₂)‖_A ≤ n^{1/2} ‖a(f₁) − a(f₂)‖₂ ≤ 2πn² |f₁ − f₂|

since ‖a(f₁) − a(f₂)‖₂ ≈ 2πn^{3/2} |f₁ − f₂|. So when the separation is lower than 1/(4n²), the atomic norm can't recover it. One can empirically show failure with less than 1/n separation and an adversarial signal.
More Related Content

What's hot

Speech signal time frequency representation
Speech signal time frequency representationSpeech signal time frequency representation
Speech signal time frequency representationNikolay Karpov
 
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
Leo Asselborn
 
2014.10.dartmouth
2014.10.dartmouth2014.10.dartmouth
2014.10.dartmouth
Qiqi Wang
 
Dynamic Compression Transmission for Energy Harvesting Networks
Dynamic Compression Transmission for Energy Harvesting NetworksDynamic Compression Transmission for Energy Harvesting Networks
Dynamic Compression Transmission for Energy Harvesting Networks
Cristiano Tapparello
 
Fulltext
FulltextFulltext
Fulltext
PrabirDas55
 
Presentation M2 internship rare-earth nickelates
Presentation M2 internship rare-earth nickelatesPresentation M2 internship rare-earth nickelates
Presentation M2 internship rare-earth nickelatesYiteng Dang
 
Vibration energy harvesting under uncertainty
Vibration energy harvesting under uncertaintyVibration energy harvesting under uncertainty
Vibration energy harvesting under uncertaintyUniversity of Glasgow
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimationjachno
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Taiji Suzuki
 
Time-Frequency Representation of Microseismic Signals using the SST
Time-Frequency Representation of Microseismic Signals using the SSTTime-Frequency Representation of Microseismic Signals using the SST
Time-Frequency Representation of Microseismic Signals using the SST
UT Technology
 
Pdpta11 G-Ensemble for Meteorological Prediction Enhancement
Pdpta11 G-Ensemble for Meteorological Prediction EnhancementPdpta11 G-Ensemble for Meteorological Prediction Enhancement
Pdpta11 G-Ensemble for Meteorological Prediction Enhancement
Hisham Ihshaish
 
Introduction to Quantum Monte Carlo
Introduction to Quantum Monte CarloIntroduction to Quantum Monte Carlo
Introduction to Quantum Monte Carlo
Claudio Attaccalite
 
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsDSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
Amr E. Mohamed
 
Ilhabela
IlhabelaIlhabela
Ilhabela
Biagio Lucini
 
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips FiltrationSOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
Don Sheehy
 
Chester Nov08 Terry Lynch
Chester Nov08 Terry LynchChester Nov08 Terry Lynch
Chester Nov08 Terry LynchTerry Lynch
 
Transformations of continuous-variable entangled states of light
Transformations of continuous-variable entangled states of lightTransformations of continuous-variable entangled states of light
Transformations of continuous-variable entangled states of light
Ondrej Cernotik
 
Wave diffraction
Wave diffractionWave diffraction
Wave diffraction
eli priyatna laidan
 

What's hot (20)

Speech signal time frequency representation
Speech signal time frequency representationSpeech signal time frequency representation
Speech signal time frequency representation
 
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
Control of Discrete-Time Piecewise Affine Probabilistic Systems using Reachab...
 
8 lti psd
8 lti psd8 lti psd
8 lti psd
 
2014.10.dartmouth
2014.10.dartmouth2014.10.dartmouth
2014.10.dartmouth
 
Dynamic Compression Transmission for Energy Harvesting Networks
Dynamic Compression Transmission for Energy Harvesting NetworksDynamic Compression Transmission for Energy Harvesting Networks
Dynamic Compression Transmission for Energy Harvesting Networks
 
Fulltext
FulltextFulltext
Fulltext
 
Presentation M2 internship rare-earth nickelates
Presentation M2 internship rare-earth nickelatesPresentation M2 internship rare-earth nickelates
Presentation M2 internship rare-earth nickelates
 
Vibration energy harvesting under uncertainty
Vibration energy harvesting under uncertaintyVibration energy harvesting under uncertainty
Vibration energy harvesting under uncertainty
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
 
20130722
2013072220130722
20130722
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
 
Time-Frequency Representation of Microseismic Signals using the SST
Time-Frequency Representation of Microseismic Signals using the SSTTime-Frequency Representation of Microseismic Signals using the SST
Time-Frequency Representation of Microseismic Signals using the SST
 
Pdpta11 G-Ensemble for Meteorological Prediction Enhancement
Pdpta11 G-Ensemble for Meteorological Prediction EnhancementPdpta11 G-Ensemble for Meteorological Prediction Enhancement
Pdpta11 G-Ensemble for Meteorological Prediction Enhancement
 
Introduction to Quantum Monte Carlo
Introduction to Quantum Monte CarloIntroduction to Quantum Monte Carlo
Introduction to Quantum Monte Carlo
 
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsDSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
 
Ilhabela
IlhabelaIlhabela
Ilhabela
 
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips FiltrationSOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
SOCG: Linear-Size Approximations to the Vietoris-Rips Filtration
 
Chester Nov08 Terry Lynch
Chester Nov08 Terry LynchChester Nov08 Terry Lynch
Chester Nov08 Terry Lynch
 
Transformations of continuous-variable entangled states of light
Transformations of continuous-variable entangled states of lightTransformations of continuous-variable entangled states of light
Transformations of continuous-variable entangled states of light
 
Wave diffraction
Wave diffractionWave diffraction
Wave diffraction
 

Similar to Decomposition and Denoising for moment sequences using convex optimization

MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
Elvis DOHMATOB
 
EC8553 Discrete time signal processing
EC8553 Discrete time signal processing EC8553 Discrete time signal processing
EC8553 Discrete time signal processing
ssuser2797e4
 
Lecture 3 sapienza 2017
Lecture 3 sapienza 2017Lecture 3 sapienza 2017
Lecture 3 sapienza 2017
Franco Bontempi Org Didattica
 
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
Pioneer Natural Resources
 
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG dataSpectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
Robert Oostenveld
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
Christian Robert
 
KAUST_talk_short.pdf
KAUST_talk_short.pdfKAUST_talk_short.pdf
KAUST_talk_short.pdf
Chiheb Ben Hammouda
 
CS-ChapterI.pdf
CS-ChapterI.pdfCS-ChapterI.pdf
CS-ChapterI.pdf
ssuser5e58621
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
Fabian Pedregosa
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
Akira Tanimoto
 
kactl.pdf
kactl.pdfkactl.pdf
kactl.pdf
Rayhan331
 
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
Yandex
 
Input analysis
Input analysisInput analysis
Input analysis
Bhavik A Shah
 
Bayesian modelling and computation for Raman spectroscopy
Bayesian modelling and computation for Raman spectroscopyBayesian modelling and computation for Raman spectroscopy
Bayesian modelling and computation for Raman spectroscopy
Matt Moores
 
129966863283913778[1]
129966863283913778[1]129966863283913778[1]
129966863283913778[1]威華 王
 
Dereverberation in the stft and log mel frequency feature domains
Dereverberation in the stft and log mel frequency feature domainsDereverberation in the stft and log mel frequency feature domains
Dereverberation in the stft and log mel frequency feature domains
Takuya Yoshioka
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
Christian Robert
 
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Cynthia Freeman
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive models
Akira Tanimoto
 

Similar to Decomposition and Denoising for moment sequences using convex optimization (20)

MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
EC8553 Discrete time signal processing
EC8553 Discrete time signal processing EC8553 Discrete time signal processing
EC8553 Discrete time signal processing
 
Lecture 3 sapienza 2017
Lecture 3 sapienza 2017Lecture 3 sapienza 2017
Lecture 3 sapienza 2017
 
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
OPTIMIZED RATE ALLOCATION OF HYPERSPECTRAL IMAGES IN COMPRESSED DOMAIN USING ...
 
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG dataSpectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
 
Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
 
KAUST_talk_short.pdf
KAUST_talk_short.pdfKAUST_talk_short.pdf
KAUST_talk_short.pdf
 
CS-ChapterI.pdf
CS-ChapterI.pdfCS-ChapterI.pdf
CS-ChapterI.pdf
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
kactl.pdf
kactl.pdfkactl.pdf
kactl.pdf
 
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
Subgradient Methods for Huge-Scale Optimization Problems - Юрий Нестеров, Cat...
 
Input analysis
Input analysisInput analysis
Input analysis
 
Bayesian modelling and computation for Raman spectroscopy
Bayesian modelling and computation for Raman spectroscopyBayesian modelling and computation for Raman spectroscopy
Bayesian modelling and computation for Raman spectroscopy
 
129966863283913778[1]
129966863283913778[1]129966863283913778[1]
129966863283913778[1]
 
Lecture1
Lecture1Lecture1
Lecture1
 
Dereverberation in the stft and log mel frequency feature domains
Dereverberation in the stft and log mel frequency feature domainsDereverberation in the stft and log mel frequency feature domains
Dereverberation in the stft and log mel frequency feature domains
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive models
 

Recently uploaded

Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Comparative structure of adrenal gland in vertebrates
Decomposition and Denoising for moment sequences using convex optimization

  • 1. Decomposition and Denoising for moment sequences using convex optimization Badri Narayan Bhaskar
  • 2. High Dimensional Inference: n ≪ p statistics. Examples: microarray data, hyperspectral imaging, predicting ratings.
  • 3. Sparse Vector Estimation. Gene expressions: the response (disease susceptibility) equals Xβ, where the n × p matrix X holds p gene expressions for n samples and the coefficient vector β is k-sparse — only a few active genes.
  • 4. Low Rank Matrices: The Netflix Problem. The movie-ratings matrix (millions of subscribers × thousands of movies) decomposes into a user profile times a movie profile per feature (e.g., Action), plus a handful of other features.
  • 5. Low Rank Matrices: Topic Models. The term-document matrix (millions of words × thousands of documents) decomposes into a topic profile times a degree of topicality per topic (e.g., Politics), plus a handful of other features.
  • 6. Simple objects. Decomposition: x = Σ_{a∈A} c_a a, where the atomic set A collects the atoms ("features") and the weights c_a are nonnegative.
  Objects | Notion of simplicity | Atoms
  Vectors | Sparsity | Canonical unit vectors
  Matrices | Rank | Rank-1 matrices
  Bandlimited signals | No. of frequencies | Complex sinusoids
  Linear systems | Number of poles | Single-pole systems
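As a toy illustration of x = Σ_{a∈A} c_a a (a hypothetical sketch, not from the talk): a 2-sparse vector in R^5 written as a nonnegative combination of signed canonical unit vectors, the atoms for sparsity.

```python
def unit_vector(i, sign, p):
    """Signed canonical unit vector (sign * e_i) in R^p: an atom for sparsity."""
    v = [0.0] * p
    v[i] = float(sign)
    return v

p = 5
# x = 3 * e_1 + 2 * (-e_3): two atoms with nonnegative weights c_a = 3 and 2
weighted_atoms = [(3.0, unit_vector(1, +1, p)), (2.0, unit_vector(3, -1, p))]
x = [sum(c * a[j] for c, a in weighted_atoms) for j in range(p)]

print(x)                                  # [0.0, 3.0, 0.0, -2.0, 0.0]
print(sum(c for c, _ in weighted_atoms))  # 5.0 — the l1 norm of x
```

The sum of the weights, 5.0, is exactly the ℓ₁ norm of x, previewing the next slide's atomic norm.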
  • 7. Universal Regularizer (Chandrasekaran, Recht, Parrilo, Willsky, 2010). The atomic norm is the gauge of conv(A):
  ‖x‖_A = inf{t ≥ 0 : x ∈ t·conv(A)} = inf{Σ_a c_a : x = Σ_{a∈A} c_a a, c_a ≥ 0}.
  Objects | Notion of simplicity | Atomic norm
  Vectors | Sparsity | ℓ₁ norm
  Matrices | Rank | nuclear norm
  Bandlimited signals | No. of frequencies | ?
  Linear systems | McMillan degree | ?
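To see the table's matrix row concretely — rank-1 atoms give the nuclear norm — here is a small numerical check with hypothetical weights (numpy assumed). With orthonormal factors the atomic decomposition is tight, so the sum of the weights equals the sum of the singular values.

```python
import numpy as np

# X built from two rank-1 atoms u_i v_i^T (unit-norm factors) with weights 3 and 2.
U = np.eye(4)[:, :2]     # orthonormal u_1, u_2
V = np.eye(3)[:, :2]     # orthonormal v_1, v_2
X = 3.0 * np.outer(U[:, 0], V[:, 0]) + 2.0 * np.outer(U[:, 1], V[:, 1])

# Nuclear norm = sum of singular values = 3 + 2 = 5 = the atomic norm here.
nuclear_norm = np.linalg.svd(X, compute_uv=False).sum()
print(nuclear_norm)   # 5.0
```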
  • 8. Moment Problems. Here the atoms live in a measure space, and x is a nonnegative combination of atomic moments — the moments of a discrete measure. Motivating applications: system identification and line spectral estimation (illustrative plots omitted).
  • 9. Decomposition. How many atoms do you need to represent x? When the composing atoms lie on an exposed face of conv(A), the decomposition is easy to find: a supporting hyperplane certifies it. We therefore concentrate on recovering "good" objects whose atoms lie on exposed faces.
  • 10. Dual Certificate. Dual characterization of the atomic norm — a semi-infinite problem:
  maximize_q ⟨q, x⋆⟩ subject to ‖q‖*_A ≤ 1,
  where ‖q‖*_A = sup_{a∈A} ⟨q, a⟩ (under mild conditions; Bonsall). The optimal value is the atomic norm, and the solutions are subgradients of ‖·‖_A at x⋆. The certificate: ⟨q, a⟩ = 1 for a in the support, ⟨q, a⟩ < 1 otherwise.
  • 11. Denoising. Observe y = x⋆ + w ∈ ℂⁿ with w ~ N(0, σ²Iₙ). AST (Atomic norm Soft Thresholding):
  minimize_x ½‖y − x‖₂² + τ‖x‖_A, with tradeoff parameter τ.
  Dual AST: maximize_q τ⟨q, y⟩ − (τ²/2)‖q‖₂² subject to ‖q‖*_A ≤ 1.
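In the simplest instance — signed unit-vector atoms, where ‖·‖_A is the ℓ₁ norm — AST has a closed form: coordinate-wise soft thresholding, hence the name. An illustrative sketch (not the talk's code):

```python
def soft_threshold(y, tau):
    """Solve argmin_x 0.5*(v - x)^2 + tau*|x| independently per coordinate of y."""
    out = []
    for v in y:
        if v > tau:
            out.append(v - tau)    # shrink toward zero from above
        elif v < -tau:
            out.append(v + tau)    # shrink toward zero from below
        else:
            out.append(0.0)        # small coordinates are set exactly to zero
    return out

x_hat = soft_threshold([3.0, -0.2, 1.5], tau=1.0)
print(x_hat)   # [2.0, 0.0, 0.5]
```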
  • 12. AST Results. Optimality conditions: τq̂ + x̂ = y, ‖q̂‖*_A ≤ 1, ⟨q̂, x̂⟩ = ‖x̂‖_A — which means q̂ ∈ ∂‖x̂‖_A, so we can find the composing atoms! They also suggest τq̂ ≈ w, hence τ ≈ E‖w‖*_A: the right tradeoff is a Gaussian width. No cross-validation needed — just estimate the Gaussian width, which is also the key quantity in characterizing convergence rates.
  • 13. AST Convergence Rates. Comparison with the Lasso: for an s-sparse signal, the fast rate is (1/n)‖X(β̂ − β⋆)‖₂² ≤ Cσ² s log(p)/n, while the slow rate is of order σ√(log(p)/n) times the ℓ₁ norm of the signal. For AST, choosing the appropriate tradeoff τ = E‖w‖*_A gives a universal but slow rate under minimal assumptions on the atoms: (1/n)‖x̂ − x⋆‖₂² ≤ (τ/n)‖x⋆‖_A. With a "cone condition" — weaker than RIP, coherence, RSC, or RE — faster rates are possible.
  • 14. Summary. A notion of simple models; the atomic norm unifies the treatment of many models; recovers tight published bounds; bounds depend on the Gaussian width; faster rates with a local cone condition.
  • 15. Line Spectral Estimation. Time samples x_k = Σ_{j=1}^s c_j e^{i2πf_j k}, k = 0, …, n−1, with line spectrum μ = Σ_j c_j δ_{f_j}. Extrapolate the remaining moments!
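A minimal generator for such samples (hypothetical frequencies and amplitudes, standard library only):

```python
import cmath

def line_spectral_samples(freqs, amps, n):
    """x_k = sum_j c_j * exp(i*2*pi*f_j*k): moments of mu = sum_j c_j delta_{f_j}."""
    return [sum(c * cmath.exp(2j * cmath.pi * f * k) for f, c in zip(freqs, amps))
            for k in range(n)]

x = line_spectral_samples(freqs=[0.1, 0.35], amps=[1.0, 0.5], n=8)
print(len(x))      # 8
print(abs(x[0]))   # 1.5 — x_0 is the total mass c_1 + c_2
```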
  • 16. Super-resolution. The time-frequency dual: extrapolate high-frequency information from low-frequency samples. Just exchange time and frequency — it is the same problem as line spectral estimation.
  • 17. Classical Line Spectrum Estimation (a crash course). Nonlinear parameter estimation: x_k = Σ_{j=1}^s c_j e^{2πi f_j k}. For any vector x, define the Hermitian Toeplitz matrix T_n(x) whose first row is (x₁, x₂, …, x_n) and whose entries below the diagonal are the conjugates of those above it. FACT: T_n(x⋆) = Σ_{j=1}^s c_j v(f_j) v(f_j)*, where v(f) = (1, e^{i2πf}, …, e^{i2π(n−1)f})ᵀ — low rank and Toeplitz.
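The FACT is easy to verify numerically — a sketch with s = 2 hypothetical frequencies (numpy assumed): the Toeplitz matrix built from noiseless samples has rank exactly s.

```python
import numpy as np

n, freqs, amps = 6, [0.12, 0.4], [1.0, 0.7]
k = np.arange(n)
x = sum(c * np.exp(2j * np.pi * f * k) for f, c in zip(freqs, amps))

# Hermitian Toeplitz T_n(x): entry (i, j) is x[i-j] on/below the diagonal,
# and the conjugate of x[j-i] above it.
T = np.empty((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        T[i, j] = x[i - j] if i >= j else np.conj(x[j - i])

print(np.linalg.matrix_rank(T, tol=1e-8))   # 2 = number of frequencies
```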
  • 18. Classical techniques exploit the low-rank Toeplitz FACT above. PRONY: estimate s, then root finding. CADZOW: alternating projections. MUSIC: low-rank decomposition, plot the pseudospectrum. Drawbacks: sensitive to model order, no rigorous theory, optimal only asymptotically.
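A bare-bones MUSIC sketch on noiseless data, with the true model order fed in (as the slide notes these methods require; toy values, numpy assumed). Eigenvectors of T_n(x) outside the signal subspace are orthogonal to every a(f_j), so the pseudospectrum 1/‖E_noiseᴴ a(f)‖ peaks at the true frequencies.

```python
import numpy as np

n, s = 16, 2
true_f, amps = np.array([0.15, 0.4]), np.array([1.0, 0.8])
k = np.arange(n)
x = (amps * np.exp(2j * np.pi * np.outer(k, true_f))).sum(axis=1)

# Hermitian Toeplitz matrix of the samples (PSD, rank s).
T = np.empty((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        T[i, j] = x[i - j] if i >= j else np.conj(x[j - i])

# Noise subspace: eigenvectors for the n - s smallest eigenvalues.
_, V = np.linalg.eigh(T)
E_noise = V[:, : n - s]

grid = np.linspace(0.0, 1.0, 2000, endpoint=False)
A = np.exp(2j * np.pi * np.outer(k, grid))                   # a(f) on a fine grid
pseudo = 1.0 / np.linalg.norm(E_noise.conj().T @ A, axis=0)  # MUSIC pseudospectrum

est = np.sort(grid[np.argsort(pseudo)[-s:]])                 # the s largest peaks
print(est)   # close to [0.15, 0.4]
```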
  • 19. Convex Perspective. Line spectrum x = Σ_{j=1}^s c_j a(f_j), with amplitudes c_j, frequencies f_j, and atoms a(f) = (1, e^{i2πf}, …, e^{i2πf(n−1)})ᵀ. Positive amplitudes: A₊ = {a(f) : f ∈ [0, 1]}. Complex: A = {a(f)e^{iφ} : f ∈ [0, 1], φ ∈ [0, 2π]}.
  • 20. Convergence rates. For a signal with n samples of the form y_k = Σ_{j=1}^s c_j e^{2πi f_j k} + w_k, choose the tradeoff as τ ≈ σ√(n log n): with high probability, for n > 4, Dudley's inequality and Bernstein's polynomial theorem bound E‖w‖*_A at this level. The global AST guarantee then gives E(1/n)‖x̂ − x⋆‖₂² ≤ C σ √(log(n)/n) Σ_{j=1}^s |c_j|.
  • 21. Fast Rates. If every two frequencies are far enough apart — the minimum separation condition min_{p≠q} d(u_p, u_q) ≥ 4/n — and the tradeoff parameter is chosen correctly, τ = η σ√(n log n) with η > 1 large enough, then for s frequencies and n samples the MSE satisfies E(1/n)‖x̂ − x⋆‖₂² ≤ C σ² s log(n)/n. This is not a global condition on the dictionary; it is local to the signal. The proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernández-Granda.
  • 22. Can’t do much better. Our rate E(1/n)‖x̂ − x⋆‖₂² ≤ C σ² s log(n)/n is only a logarithmic factor from the "oracle rate" C σ² s/n, and nearly matches the minimax lower bound on the best prediction rate for signals with 4/n minimum separation: E(1/n)‖x̂ − x⋆‖₂² ≥ C σ² s log(n/4s)/n.
  • 23. Minimum Separation and Exact Recovery. Result of Candès and Fernández-Granda: 4/n separation means one can construct an explicit dual certificate of exact recovery — the convex hulls of far-enough vertices are exposed faces. Some minimum separation is necessary. Prony does not have this limitation. (Then why bother? Robustness guarantees.)
  • 24. …except in the positive case: the convex hull of every collection of n/2 vertices is an exposed face. This is an infinite-dimensional version of the cyclic polytope, which is neighbourly. No separation condition is needed.
  • 25. Localizing Frequencies. From the dual solution, y = x̂ + τq̂, form the dual polynomial Q̂(f) = Σ_{j=0}^{n−1} q̂_j e^{−i2πjf}. Since ⟨q̂, a⟩ = 1 for atoms in the support and ⟨q̂, a⟩ < 1 otherwise, the frequencies are localized where |Q̂(f)| reaches 1 (dual polynomial plots omitted).
  • 26. Localization Guarantees. Three localization metrics, each controlled at the level k²(n)/n (for constants C₁, C₂, C₃): spurious amplitudes — the mass placed on frequencies far (more than 0.16/n) from every true one, Σ_{l : f̂_l ∈ F} |ĉ_l| ≤ C₁ k²(n)/n; weighted frequency deviation — Σ_{l : f̂_l ∈ N_j} |ĉ_l| min_{f_j ∈ T} d(f_j, f̂_l)² ≤ C₂ k²(n)/n; and near-region approximation — |c_j − Σ_{l : f̂_l ∈ N_j} ĉ_l| ≤ C₃ k²(n)/n.
  • 27. How to actually solve AST: semidefinite characterizations of conv(A₊) and conv(A) for minimize_x ½‖y − x‖₂² + τ‖x‖_A. Positive amplitudes (Carathéodory–Toeplitz): ‖x‖_{A₊} = x₀ if T_n(x) ⪰ 0, and +∞ otherwise. Complex (Bounded Real Lemma): ‖x‖_A = inf_{u,t} { (1/2n) tr(T_n(u)) + t/2 : [ T_n(u), x ; x*, t ] ⪰ 0 }.
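The positive-amplitudes characterization can be sanity-checked numerically (toy measure with positive weights, numpy assumed): samples of a positive discrete measure give a PSD Toeplitz matrix, and x₀ is the total mass — the atomic norm.

```python
import numpy as np

n, freqs, amps = 8, [0.2, 0.7], [1.5, 0.5]      # positive amplitudes
k = np.arange(n)
x = sum(c * np.exp(2j * np.pi * f * k) for f, c in zip(freqs, amps))

# Hermitian Toeplitz matrix of the samples.
T = np.empty((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        T[i, j] = x[i - j] if i >= j else np.conj(x[j - i])

print(np.linalg.eigvalsh(T).min() >= -1e-9)     # True: T_n(x) is PSD
print(abs(x[0].real - sum(amps)) < 1e-12)       # True: x_0 = total mass = 2.0
```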
  • 28. Alternating Direction Method of Multipliers. AST SDP formulation:
  minimize_{t,u,x,Z} ½‖x − y‖₂² + (τ/2)(t + u₁) subject to Z = [ T(u), x ; x*, t ], Z ⪰ 0.
  Form the augmented Lagrangian with momentum parameter ρ — the Lagrangian term plus a quadratic penalty:
  L_ρ(t, u, x, Z, Λ) = ½‖x − y‖₂² + (τ/2)(t + u₁) + ⟨Λ, Z − [T(u), x; x*, t]⟩ + (ρ/2)‖Z − [T(u), x; x*, t]‖_F².
  Alternating direction iterations: (t^{l+1}, u^{l+1}, x^{l+1}) ← argmin L_ρ (a quadratic objective); Z^{l+1} ← argmin_{Z⪰0} L_ρ (a projection onto the PSD cone); Λ^{l+1} ← Λ^l + ρ(Z^{l+1} − [T(u^{l+1}), x^{l+1}; x^{l+1}*, t^{l+1}]) (a linear dual update).
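The same three-step ADMM pattern — quadratic step, prox/projection step, linear dual update — is easiest to see on the smallest atomic-norm instance, ℓ₁ denoising, where a closed form is available to check against. This is a hypothetical sketch, not the talk's SDP solver (numpy assumed):

```python
import numpy as np

def admm_l1_denoise(y, tau, rho=1.0, iters=200):
    """ADMM for min 0.5*||x - y||^2 + tau*||z||_1  s.t.  x = z (scaled dual u)."""
    x = np.zeros_like(y)
    z = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(iters):
        x = (y + rho * (z - u)) / (1.0 + rho)                    # quadratic step
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - tau / rho, 0.0)  # prox of tau*||.||_1
        u = u + x - z                                            # linear dual update
    return z

y = np.array([3.0, -0.2, 1.5])
x_hat = admm_l1_denoise(y, tau=1.0)
print(np.allclose(x_hat, [2.0, 0.0, 0.5], atol=1e-6))   # True: matches soft threshold
```

The iterates converge to the soft-thresholded solution, which is the known closed form for this problem.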
  • 29. Alternative Discretization Method. Can use a grid: A_N = {a(f, φ) : f = j/N, j = 1, …, N, φ ∈ [0, 2π)}. We show (1 − 2πn/N)‖x‖_{A_N} ≤ ‖x‖_A ≤ ‖x‖_{A_N} (via Bernstein's polynomial theorem or Fritz John; works for general epsilon nets). AST is then approximated by a Lasso: n samples, N gridded frequencies, y = (atomic moments)·c⋆ + w. Fast shrinkage algorithms apply, Fourier projections are cheap, and coherence is irrelevant.
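The sandwich says little is lost on a fine grid. Relatedly, the nearest grid atom approximates an off-grid atom better and better as N grows — a quick numerical check with a hypothetical off-grid frequency (numpy assumed):

```python
import numpy as np

n, f0 = 32, 1.0 / 3.0            # an off-grid frequency
k = np.arange(n)
atom = lambda f: np.exp(2j * np.pi * f * k)

errs = []
for N in (64, 256, 1024):        # finer and finer grids
    f_grid = np.round(f0 * N) / N                       # nearest grid frequency
    errs.append(np.linalg.norm(atom(f0) - atom(f_grid)) / np.sqrt(n))

print(errs[0] > errs[1] > errs[2])   # True: error shrinks as the grid refines
```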
  • 30. Experimental setup. n = 64, 128, or 256 samples; number of frequencies n/16, n/8, n/4; SNR = −5 dB, 0 dB, …, 20 dB. (Root) MUSIC, matrix pencil, and Cadzow are all fed the true model order; AST uses an empirically estimated (not fed) noise variance. Compared mean-squared error and localization-error metrics, averaged over 20 trials.
  • 31. Mean Squared Error: MSE (dB) vs. SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies (plot omitted).
  • 32. Mean Squared Error: performance profiles. P_s(β) = fraction of experiments with MSE less than β × the minimum MSE; average MSE over 20 trials, n = 64, 128, 256, s = n/16, n/8, n/4, SNR = −5, 0, …, 20 dB. Methods: AST, Lasso 2¹⁵, MUSIC, Cadzow, matrix pencil (plot omitted).
  • 33. Mean Squared Error: effect of gridding. Performance profiles for Lasso with grid sizes 2¹⁰ through 2¹⁵ (plot omitted).
  • 34. Frequency Localization: performance profiles and SNR sweeps (k = 32) for the three localization metrics — spurious amplitudes (m₁), weighted frequency deviation (m₂), and near-region approximation (m₃) — comparing AST, MUSIC, and Cadzow (plots omitted).
  • 35. Simple LTI Systems. Find a simple linear system from time-series data (e.g., vibration: input u(t), output y(t)). State space: x[t+1] = Ax[t] + Bu[t], y[t] = Cx[t] + Du[t]. Existing approaches: nonlinear Kalman-filter-based estimation, parametric methods based on Hankel matrices, low-rank formulations.
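For reference, the state-space recursion in code — a scalar single-pole toy system with hypothetical values a = 0.7, b = 1, c = 2, d = 0, driven by an impulse:

```python
# y[t] = c*x[t] + d*u[t],  x[t+1] = a*x[t] + b*u[t]
a, b, c, d = 0.7, 1.0, 2.0, 0.0
u = [1.0] + [0.0] * 5            # impulse input
state, y = 0.0, []
for t in range(len(u)):
    y.append(c * state + d * u[t])
    state = a * state + b * u[t]

print(y)   # impulse response: 0 at t=0, then 2 * 0.7^(t-1)
```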
  • 36. Subspace Methods. With a little computation, the block-Hankel matrix of outputs factors as Y = OX + TU: Y stacks shifted output rows, O = [C; CA; CA²; …; CA^{ℓ−1}] is the extended observability matrix, X = [x₁ x₂ … x_k] collects the states, T is the block lower-triangular Toeplitz matrix of Markov parameters (D; CB, D; CAB, CB, D; …), and U stacks shifted input rows. Projecting out the inputs, Y U^⊥ = O X U^⊥ is low rank: an SVD yields subspace ID (n4sid); there are also nuclear-norm formulations (Vandenberghe et al.).
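The low-rank fact behind these methods — Hankel matrix rank equals McMillan degree — in a few lines, with hypothetical one- and two-pole impulse responses (numpy assumed):

```python
import numpy as np

def hankel_rank(h, m):
    """Rank of the m x m Hankel matrix H[i, j] = h[i + j] of Markov parameters."""
    H = np.array([[h[i + j] for j in range(m)] for i in range(m)])
    return np.linalg.matrix_rank(H, tol=1e-8)

h1 = [2.0 * 0.7 ** t for t in range(12)]                       # one pole at 0.7
h2 = [2.0 * 0.7 ** t + 1.0 * (-0.5) ** t for t in range(12)]   # poles at 0.7, -0.5

print(hankel_rank(h1, 6), hankel_rank(h2, 6))   # 1 2
```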
  • 37. SISO. The Hankel operator Γ_G : ℓ₂(−∞, 0] → ℓ₂[0, ∞) maps past inputs to future outputs. Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm? Not clear how to compute…
  • 38. Convex Perspective on SISO Systems. Example: G(z) = 2(1 − 0.7²)/(z − 0.7) + 5(1 − 0.5²)/(z − (0.3 + 0.4i)) + 5(1 − 0.5²)/(z − (0.3 − 0.4i)). Atoms: single-pole systems, A = { (1 − |w|²)/(z − w) : w ∈ D_ρ ⊂ D }. Observe y = L(G(z)) + w and regularize with the atomic norm: minimize_G ‖y − L(G)‖₂² + μ‖G‖_A. Theorem: (π/8)‖G‖_A ≤ ‖Γ_G‖₁ ≤ ‖G‖_A.
  • 39. Atomic Norm Regularization. How to solve minimize_G ‖y − L(G)‖₂² + μ‖G‖_A? Equivalent to minimize_{c,x} ‖y − x‖₂² + μ Σ_{w∈D_ρ} |c_w|/(1 − |w|²) subject to x = Σ_{w∈D_ρ} c_w L(1/(z − w)). Discretize and set Φ_{kℓ} := L_k((1 − |w_ℓ|²)/(z − w_ℓ)); this becomes a Lasso: minimize_c ½‖y − Φc‖₂² + μ‖c‖₁.
  • 40. DAST Analysis. Measure n uniformly spaced samples of the frequency response, Φ_{kℓ} = (1 − |w_ℓ|²)/(e^{2πik/n} − w_ℓ), over a net {w_ℓ} on the disk, and solve x̂ = argmin_x ½‖y − x‖₂² + μ‖x‖_{A_ε}. Set Ĝ(z) = Σ_ℓ ĉ_ℓ (1 − |w_ℓ|²)/(z − w_ℓ), where the coefficients correspond to the support of x̂. Then ‖G − Ĝ‖²_{H2} ≤ γ₀ ‖Γ_G‖₁ / ((1 − ρ)√n), with constant γ₀, Hankel nuclear norm ‖Γ_G‖₁, stability radius ρ, and number of measurements n.
  • 41. Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30; H2 error = 2e-2 (plot omitted).
  • 42. Summary. Optimal bounds for denoising using convex methods; better empirical performance than classical approaches; local conditions on signals instead of global dictionary conditions; discretization works! Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.
  • 44. Future Work • Better convergence results for discretization. • Why does weighted mean heuristic work so well for discretization? • Better algorithms for System Identification • Greedy techniques • More atomic sets. General localization guarantees.
  • 45. Referenced Publications. B, Recht, “Atomic Norm Denoising for Line Spectral Estimation,” 49th Allerton Conference on Controls and Systems, 2011. B, Tang, Recht, “Atomic Norm Denoising for Line Spectral Estimation,” submitted to IEEE Transactions on Signal Processing. Tang, B, Recht, “Near Minimax Line Spectral Estimation,” to be submitted. Shah, B, Tang, Recht, “Linear System Identification using Atomic Norm Minimization,” Control and Decision Conference, 2012.
  • 46. Other Publications. Tang, B, Shah, Recht, “Compressed Sensing off the Grid,” submitted to IEEE Transactions on Information Theory. Dasarathy, Shah, B, Nowak, “Covariance Sketching,” 50th Allerton Conference on Controls and Systems, 2012. Gubner, B, Hao, “Multipath-Cluster Channel Models,” IEEE Conference on Ultrawideband Systems, 2012. Wittenberg, Alimadhi, B, Lau, “ei.RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model,” in Zelig: Everyone’s Statistical Software. Tang, B, Recht, “Sparse Recovery over Continuous Dictionaries: Just Discretize,” submitted to Asilomar Conference, 2013.
  • 47. Cone Condition. Inflated descent cone: C_γ(x⋆) = {z : ‖x⋆ + z‖_A ≤ ‖x⋆‖_A + γ‖z‖_A}, with γ ∈ (0, 1); C₀ is the usual descent cone. The condition: no direction in C_γ has zero atomic norm.
  • 48. Minimum separation is necessary. Using two sinusoids: since ‖a(f₁) − a(f₂)‖₂ ≈ 2πn^{3/2}|f₁ − f₂|, we get ‖a(f₁) − a(f₂)‖_A ≤ n^{1/2}‖a(f₁) − a(f₂)‖₂ ≤ 2πn²|f₁ − f₂|. So when the separation is below 1/(4n²), the atomic norm cannot recover the pair. One can empirically show failure with less than 1/n separation and an adversarial signal.
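The ℓ₂ distance in this argument is easy to verify numerically — with a hypothetical n and spacing (numpy assumed), ‖a(f₁) − a(f₂)‖₂ for nearby frequencies is of order n^{3/2}|f₁ − f₂|, with a constant below 2π:

```python
import numpy as np

n, f1, df = 64, 0.3, 1e-4
k = np.arange(n)
a = lambda f: np.exp(2j * np.pi * f * k)

dist = np.linalg.norm(a(f1) - a(f1 + df))
bound = 2 * np.pi * n ** 1.5 * df     # the slide's 2*pi*n^{3/2}*|f1 - f2| scale
print(bound / 4 <= dist <= bound)     # True: the distance is of this order
```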