
# Tro07 sparse-solutions-talk

Uploaded on May 19, 2011

A tutorial presentation on sparse representations

## Presentation Transcript

### Sparse Representations

Joel A. Tropp
Department of Mathematics, The University of Michigan
jtropp@umich.edu

Research supported in part by NSF and DARPA.
### Introduction

Sparse Representations (Numerical Analysis Seminar, NYU, 20 April 2007)
### Systems of Linear Equations

We consider linear systems of the form Φx = b, where Φ is a d × N matrix and b is a d-dimensional vector. Assume that:

• Φ has dimensions d × N with N ≥ d
• Φ has full rank
• The columns of Φ have unit ℓ₂ norm
### The Trichotomy Theorem

**Theorem 1.** For a linear system Φx = b, exactly one of the following situations obtains:

1. No solution exists.
2. The equation has a unique solution.
3. The solutions form an affine subspace of positive dimension.
### Minimum-Energy Solutions

Classical approach to underdetermined systems:

    min ‖x‖₂ subject to Φx = b

Advantages:

• Analytically tractable
• Physical interpretation as minimum energy
• Principled way to pick a unique solution

Disadvantages:

• Solution is typically nonzero in every component
• The wrong principle for most applications
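Since Φ has full row rank, the minimizer is given by the pseudoinverse, x = Φ⁺b = Φᵀ(ΦΦᵀ)⁻¹b. A minimal NumPy sketch on an arbitrary random system (the dimensions and seed are illustrative assumptions); it also exhibits the first disadvantage above, a solution that is dense:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 8
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)       # unit ℓ₂-norm columns, as assumed above
b = rng.standard_normal(d)

# Minimum-energy solution: x = pinv(Phi) @ b = Phi^T (Phi Phi^T)^{-1} b
x_min_energy = np.linalg.pinv(Phi) @ b

print(np.allclose(Phi @ x_min_energy, b))              # it does solve the system
print(np.count_nonzero(np.abs(x_min_energy) > 1e-12))  # but typically every component is nonzero
```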
### Regularization via Sparsity

Another approach to underdetermined systems:

    min ‖x‖₀ subject to Φx = b    (P0)

where ‖x‖₀ = #{j : x_j ≠ 0}.

Advantages:

• Principled way to choose a solution
• A good principle for many applications

Disadvantages:

• In general, computationally intractable
### Sparse Approximation

• In practice, we solve a noise-aware variant, such as

    min ‖x‖₀ subject to ‖Φx − b‖₂ ≤ ε

• This is called a sparse approximation problem
• The noiseless problem (P0) corresponds to ε = 0
• The ε = 0 case is called the sparse representation problem
### Applications
### Variable Selection in Regression

• The oldest application of sparse approximation is linear regression
• The columns of Φ are explanatory variables
• The right-hand side b is the response variable
• Φx is a linear predictor of the response
• Want to use few explanatory variables:
  • Reduces variance of the estimator
  • Limits sensitivity to noise

Reference: [Miller 2002]
### Seismic Imaging

"In deconvolving any observed seismic trace, it is rather disappointing to discover that there is a nonzero spike at every point in time regardless of the data sampling rate. One might hope to find spikes only where real geologic discontinuities take place."

Reference: [Claerbout–Muir 1973]
### Transform Coding

• Transform coding can be viewed as a sparse approximation problem

[Figure: an image passes through the DCT and is reconstructed with the IDCT]

Reference: [Daubechies–DeVore–Donoho–Vetterli 1998]
### Algorithms
### Sparse Representation is Hard

**Theorem 2.** [Davis 1994, Natarajan 1995] Any algorithm that can solve the sparse representation problem for every matrix and right-hand side must solve an NP-hard problem.
### But...

Many interesting instances of the sparse representation problem are tractable!

Basic example: Φ is orthogonal.
### Algorithms for Sparse Representation

• Greedy methods make a sequence of locally optimal choices in hope of determining a globally optimal solution
• Convex relaxation methods replace the combinatorial sparse approximation problem with a related convex program in hope that the solutions coincide
• Other approaches include brute force, nonlinear programming, Bayesian methods, dynamic programming, algebraic techniques...

Refs: [Baraniuk, Barron, Bresler, Candès, DeVore, Donoho, Efron, Fuchs, Gilbert, Golub, Hastie, Huo, Indyk, Jones, Mallat, Muthukrishnan, Rao, Romberg, Stewart, Strauss, Tao, Temlyakov, Tewfik, Tibshirani, Willsky...]
### Orthogonal Matching Pursuit (OMP)

Input: the matrix Φ, the right-hand side b, and the sparsity level m

Initialize the residual r_0 = b. For t = 1, ..., m:

A. Find a column most correlated with the residual:

    ω_t = arg max_{j=1,...,N} |⟨r_{t−1}, φ_j⟩|

B. Update the residual by solving a least-squares problem:

    y_t = arg min_y ‖b − Φ_t y‖₂
    r_t = b − Φ_t y_t

where Φ_t = [φ_{ω_1} ... φ_{ω_t}].

Output: the estimate x̂ with x̂(ω_j) = y_m(j)
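The pseudocode above maps directly to NumPy. A sketch, exercised on a small random test problem (the dimensions, seed, and Gaussian test matrix are illustrative assumptions, not from the talk):

```python
import numpy as np

def omp(Phi, b, m):
    """Orthogonal Matching Pursuit: greedily build an m-sparse solution of Phi x ≈ b."""
    d, N = Phi.shape
    residual = b.copy()
    support = []                                   # indices ω_1, ..., ω_t
    for _ in range(m):
        # Step A: column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # Step B: least-squares fit on the chosen columns, then update the residual
        y, *_ = np.linalg.lstsq(Phi[:, support], b, rcond=None)
        residual = b - Phi[:, support] @ y
    x = np.zeros(N)
    x[support] = y
    return x

rng = np.random.default_rng(1)
d, N, m = 50, 100, 4
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(N)
x_true[rng.choice(N, size=m, replace=False)] = rng.standard_normal(m)
x_hat = omp(Phi, Phi @ x_true, m)
print(np.max(np.abs(x_hat - x_true)))   # near machine precision when OMP finds the support
```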
### ℓ₁ Minimization

Sparse representation as a combinatorial problem:

    min ‖x‖₀ subject to Φx = b    (P0)

Relax to a convex program:

    min ‖x‖₁ subject to Φx = b    (P1)

• Any numerical method can be used to perform the minimization
• Projected gradient and interior-point methods seem to work best

References: [Donoho et al. 1999, Figueiredo et al. 2007]
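(P1) is equivalent to a linear program: write x = u − v with u, v ≥ 0, so that ‖x‖₁ = Σ(u + v). A sketch using SciPy's LP solver (the solver choice and the random test instance are illustrative assumptions; the talk itself mentions projected-gradient and interior-point methods):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, b):
    """Solve min ||x||_1 subject to Phi x = b via the LP split x = u - v, u, v >= 0."""
    d, N = Phi.shape
    c = np.ones(2 * N)                      # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])           # constraint: Phi (u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(2)
d, N, m = 50, 100, 4
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(N)
x_true[rng.choice(N, size=m, replace=False)] = rng.standard_normal(m)
x_hat = basis_pursuit(Phi, Phi @ x_true)
print(np.max(np.abs(x_hat - x_true)))       # small when (P1) recovers the sparse solution
```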
### Why an ℓ₁ objective?

[Figure: unit balls of the ℓ₀ quasi-norm, the ℓ₁ norm, and the ℓ₂ norm]
### Relative Merits

|                        | OMP | (P1) |
|------------------------|-----|------|
| Computational cost     |  X  |      |
| Ease of implementation |  X  |      |
| Effectiveness          |     |  X   |
### When do the algorithms work?
### Key Insight

Sparse representation is tractable when the matrix Φ is sufficiently nice.

(More precisely, column submatrices of the matrix should be well conditioned.)
### Quantifying Niceness

• We say Φ is incoherent when

    max_{j≠k} |⟨φ_j, φ_k⟩| ≤ 1/√d

• Incoherent matrices appear often in signal processing applications
• We call Φ a tight frame when

    ΦΦᵀ = (N/d) I

• Tight frames have minimal spectral norm among conformal matrices

Note: both conditions can be relaxed substantially.
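Both conditions are straightforward to check numerically. A sketch (for a complex matrix, the conjugate transpose replaces the transpose in the tight-frame condition), using the Identity + Fourier example from the next slide:

```python
import numpy as np

def coherence(Phi):
    """Largest inner product (in magnitude) between distinct columns."""
    G = Phi.conj().T @ Phi
    np.fill_diagonal(G, 0.0)
    return np.max(np.abs(G))

def is_tight_frame(Phi, tol=1e-10):
    """Check Phi Phi* = (N/d) I, with the conjugate transpose for complex Phi."""
    d, N = Phi.shape
    return np.allclose(Phi @ Phi.conj().T, (N / d) * np.eye(d), atol=tol)

d = 16
n = np.arange(d)
F = np.exp(-2j * np.pi * np.outer(n, n) / d) / np.sqrt(d)   # unitary DFT matrix
Phi = np.hstack([np.eye(d), F])                             # Identity + Fourier, N = 2d

print(coherence(Phi))       # equals 1/sqrt(d) = 0.25, meeting the incoherence bound
print(is_tight_frame(Phi))
```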
### Example: Identity + Fourier

[Figure: the d × 2d matrix [I | F], impulses alongside complex exponentials, with entries of magnitude 1 and 1/√d]

An incoherent tight frame.
### Finding Sparse Solutions

**Theorem 3.** [T 2004] Let Φ be incoherent. Suppose that the linear system Φx = b has a solution x that satisfies

    ‖x‖₀ < (√d + 1) / 2.

Then the vector x is

1. the unique minimal ℓ₀ solution to the linear system, and
2. the output of both OMP and ℓ₁ minimization.

References: [Donoho–Huo 2001, Greed is Good, Just Relax]
### The Square-Root Threshold

• Sparse representations are not necessarily unique past the √d threshold

Example: the Dirac comb

• Consider the Identity + Fourier matrix with d = p²
• There is a vector b that can be written as either p spikes or p sines
• By the Poisson summation formula,

    b(t) = Σ_{j=0}^{p−1} δ_{pj}(t) = (1/√d) Σ_{j=0}^{p−1} e^{−2πi pjt/d}   for t = 0, 1, ..., d − 1
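The two expansions of the Dirac comb can be verified numerically. A short check, with p = 5 as an arbitrary choice:

```python
import numpy as np

p = 5
d = p * p                       # Identity + Fourier with d = p^2
t = np.arange(d)

# p spikes: impulses at t = 0, p, 2p, ...
spikes = np.zeros(d)
spikes[::p] = 1.0

# p sines: (1/sqrt(d)) * sum_j exp(-2*pi*i*p*j*t/d), j = 0, ..., p-1
sines = sum(np.exp(-2j * np.pi * p * j * t / d) for j in range(p)) / np.sqrt(d)

# Poisson summation: the two expansions give the same vector b
print(np.allclose(sines.imag, 0, atol=1e-9))
print(np.allclose(sines.real, spikes))
```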
### Enter Probability

Insight: the bad vectors are atypical.

• It is usually possible to identify random sparse vectors
• The next theorem is the first step toward quantifying this intuition
### Conditioning of Random Submatrices

**Theorem 4.** [T 2006] Let Φ be an incoherent tight frame with at least twice as many columns as rows. Suppose that

    m ≤ cd / log d.

If A is a random m-column submatrix of Φ, then

    Prob{ ‖A*A − I‖ < 1/2 } ≥ 99.44%.

The number c is a positive absolute constant.

Reference: [Random Subdictionaries]
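A quick empirical look at this phenomenon, using the Identity + Fourier frame (the theorem's constant c is unspecified, so the value of m below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 64
n = np.arange(d)
F = np.exp(-2j * np.pi * np.outer(n, n) / d) / np.sqrt(d)
Phi = np.hstack([np.eye(d), F])          # incoherent tight frame with N = 2d columns

m = int(d / np.log(d))                   # m on the order of d / log d (taking c = 1)
norms = []
for _ in range(200):
    cols = rng.choice(Phi.shape[1], size=m, replace=False)
    A = Phi[:, cols]
    norms.append(np.linalg.norm(A.conj().T @ A - np.eye(m), 2))

# Random submatrices of this frame are well conditioned: ||A*A - I|| stays below 1
print(np.median(norms), max(norms))
```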
### Recovering Random Sparse Vectors

Model (M) for b = Φx:

• The matrix Φ is an incoherent tight frame
• The nonzero entries of x:
  • number m ≤ cd / log N
  • have uniformly random positions
  • are independent, zero-mean Gaussian random variables

**Theorem 5.** [T 2006] Let b = Φx be a random vector drawn according to Model (M). Then x is

1. the unique minimal ℓ₀ solution with probability at least 99.44%, and
2. the unique minimal ℓ₁ solution with probability at least 99.44%.

Reference: [Random Subdictionaries]
### Methods of Proof

• Functional success criterion for OMP
• Duality results for convex optimization
• Banach algebra techniques for estimating matrix norms
• Concentration of measure inequalities
• Banach space methods for studying spectra of random matrices:
  • Decoupling of dependent random variables
  • Symmetrization of random subset sums
  • Noncommutative Khintchine inequalities
  • Bounds for suprema of empirical processes
### Compressive Sampling
### Compressive Sampling I

• In many applications, signals of interest have sparse representations
• Traditional methods acquire the entire signal, then extract information
• Sparsity can be exploited when acquiring these signals
• Want the number of samples proportional to the amount of information
• Approach: introduce randomness in the sampling procedure
• Assumption: each random sample has unit cost
### Compressive Sampling II

[Figure: a sparse signal x passes through a linear measurement process to produce the data b = Φx]

• Given the data b = Φx, we must identify the sparse signal x
• This is a sparse representation problem with a random matrix

References: [Candès–Romberg–Tao 2004, Donoho 2004]
### Compressive Sampling and OMP

**Theorem 6.** [T–Gilbert 2005] Assume that

• x is a vector in R^N with m nonzeros, and
• Φ is a d × N Gaussian matrix with d ≥ Cm log N.

Execute OMP with b = Φx to obtain the estimate x̂. Then x̂ equals the vector x with probability at least 99.44%.

Reference: [Signal Recovery via OMP]
### Compressive Sampling with ℓ₁ Minimization

**Theorem 7.** [Various] Assume that

• Φ is a d × N Gaussian matrix with d ≥ Cm log(N/m).

With probability 99.44%, the following statement holds:

• Let x be a vector in R^N with m nonzeros
• Execute ℓ₁ minimization with b = Φx to obtain the estimate x̂
• Then x̂ equals the vector x

References: [Candès et al. 2004–2006], [Donoho et al. 2004–2006], [Rudelson–Vershynin 2006]
### Related Directions
### Sublinear Compressive Sampling

• There are algorithms that can recover sparse signals from random measurements in time proportional to the number of measurements
• This is an exponential speedup over OMP and ℓ₁ minimization
• The cost is a logarithmic number of additional measurements

References: [Algorithmic Dimension Reduction, One Sketch for All]
Joint with Gilbert, Strauss, Vershynin
### Simultaneous Sparsity

• In some applications, one seeks solutions to the matrix equation ΦX = B where X has a minimal number of nonzero rows
• We have studied algorithms for this problem

References: [Simultaneous Sparse Approximation I and II]
Joint with Gilbert, Strauss
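A sketch of one greedy approach: a simultaneous variant of OMP that scores each column by its total correlation with all residuals at once. The selection rule follows the greedy template from earlier in the talk; the test problem and parameters are illustrative assumptions:

```python
import numpy as np

def simultaneous_omp(Phi, B, m):
    """Greedy row-sparse recovery: pick m columns of Phi to explain every column of B."""
    d, N = Phi.shape
    K = B.shape[1]
    R = B.copy()
    support = []
    for _ in range(m):
        # Score each column by its summed correlation across all right-hand sides
        j = int(np.argmax(np.sum(np.abs(Phi.T @ R), axis=1)))
        support.append(j)
        # Joint least-squares refit on the chosen columns, then update all residuals
        Y, *_ = np.linalg.lstsq(Phi[:, support], B, rcond=None)
        R = B - Phi[:, support] @ Y
    X = np.zeros((N, K))
    X[support] = Y
    return X

rng = np.random.default_rng(4)
d, N, m, K = 40, 80, 3, 5                 # K signals sharing one m-row support
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)
rows = rng.choice(N, size=m, replace=False)
X_true = np.zeros((N, K))
X_true[rows] = rng.standard_normal((m, K))
X_hat = simultaneous_omp(Phi, Phi @ X_true, m)
print(np.max(np.abs(X_hat - X_true)))     # near zero when the common support is found
```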
### Projective Packings

• The coherence statistic plays an important role in sparse representation
• What can we say about matrices Φ with minimal coherence?
• The question is equivalent to studying packing in projective space
• We have theory about when optimal packings can exist
• We have numerical algorithms for constructing packings

References: [Existence of ETFs, Constructing Structured TFs, ...]
Joint with Dhillon, Heath, Sra, Strohmer, Sustik
### To learn more...

Web: http://www.umich.edu/~jtropp
E-mail: jtropp@umich.edu

Partial list of papers:

• "Greed is good," Trans. IT, 2004
• "Constructing structured tight frames," Trans. IT, 2005
• "Just relax," Trans. IT, 2006
• "Simultaneous sparse approximation I and II," J. Signal Process., 2006
• "One sketch for all," to appear, STOC 2007
• "Existence of equiangular tight frames," submitted, 2004
• "Signal recovery from random measurements via OMP," submitted, 2005
• "Algorithmic dimension reduction," submitted, 2006
• "Random subdictionaries," submitted, 2006
• "Constructing packings in Grassmannian manifolds," submitted, 2006

Coauthors: Dhillon, Gilbert, Heath, Muthukrishnan, Rice DSP, Sra, Strauss, Strohmer, Sustik, Vershynin