Learning Sparse Representations

Slides of the keynote presentation at the conference ADA7, Cargese, France, 14-18 May 2012.


1. Learning Sparse Representations. Gabriel Peyré, www.numerical-tours.com
5. Image Priors
   Mathematical image priors: compression, denoising, super-resolution, ...
   Smooth images: Sobolev prior $\|\nabla f\|^2$ (low-pass Fourier coefficients).
   Piecewise smooth images: total variation prior $\|\nabla f\|_1$ (sparse wavelet coefficients).
   Learning the prior from exemplars?
6. Overview
   - Sparsity and Redundancy
   - Dictionary Learning
   - Extensions
   - Task-driven Learning
   - Texture Synthesis
10. Image Representation
   Dictionary $D = \{d_m\}_{m=0}^{Q-1}$ of atoms $d_m \in \mathbb{R}^N$.
   Image decomposition: $f = \sum_{m=0}^{Q-1} x_m d_m = Dx$.
   Image approximation: $f \approx Dx$.
   Orthogonal dictionary: $N = Q$, $x_m = \langle f, d_m \rangle$.
   Redundant dictionary: $N \leq Q$. Examples: TI wavelets, curvelets, ... $x$ is not unique.
13. Sparsity
   Decomposition: $f = \sum_{m=0}^{Q-1} x_m d_m = Dx$.
   Sparsity: most $x_m$ are small. Example: wavelet transform (image $f$, coefficients $x$).
   Ideal sparsity: most $x_m$ are zero: $J_0(x) = |\{m : x_m \neq 0\}|$.
   Approximate sparsity (compressibility): $\|f - Dx\|$ is small with $J_0(x) \leq M$.
17. Sparse Coding
   Redundant dictionary $D = \{d_m\}_{m=0}^{Q-1}$, $Q \geq N$: non-unique representation $f = Dx$.
   Sparsest decomposition: $\min_{f = Dx} J_0(x)$.
   Sparsest approximation: $\min_x \frac{1}{2}\|f - Dx\|^2 + \lambda J_0(x)$.
   Equivalence: $\min_{J_0(x) \leq M} \|f - Dx\|$ $\Leftrightarrow$ $\min_{\|f - Dx\| \leq \varepsilon} J_0(x)$.
   Ortho-basis $D$: $x_m = \langle f, d_m \rangle$ if $|\langle f, d_m \rangle|^2 \geq 2\lambda$, $x_m = 0$ otherwise, i.e. pick the $M$ largest coefficients in $\{\langle f, d_m \rangle\}_m$.
   General redundant dictionary: NP-hard.
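In an orthonormal basis the sparse approximation above is explicit: compute the coefficients, keep the $M$ largest in magnitude, and zero the rest. A minimal NumPy sketch (the function and variable names are mine, not from the slides):

```python
import numpy as np

def best_m_term(f, D, M):
    """Best M-term approximation of f in an orthonormal dictionary D.

    The columns of D are orthonormal atoms d_m, so the coefficients are
    x_m = <f, d_m> = D.T @ f; keeping the M largest-magnitude ones solves
    min ||f - Dx|| subject to J0(x) <= M.
    """
    x = D.T @ f                         # analysis: x_m = <f, d_m>
    small = np.argsort(np.abs(x))[:-M]  # indices of all but the M largest
    x[small] = 0.0                      # hard thresholding
    return D @ x                        # synthesis: f_M = D x

# Example in the canonical basis (D = Id): keep the 2 largest entries.
f = np.array([3.0, -0.5, 2.0, 0.1])
fM = best_m_term(f, np.eye(4), M=2)     # -> [3., 0., 2., 0.]
```

For a redundant dictionary no such closed form exists, which is exactly the NP-hardness noted on the slide.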
20. Convex Relaxation: L1 Prior
   $J_0(x) = |\{m : x_m \neq 0\}|$. Image with 2 pixels: $J_0(x) = 0$ (null image), $J_0(x) = 1$ (sparse image), $J_0(x) = 2$ (non-sparse image).
   $\ell^q$ priors: $J_q(x) = \sum_m |x_m|^q$ (convex for $q \geq 1$), illustrated for $q = 0, 1/2, 1, 3/2, 2$.
   Sparse $\ell^1$ prior: $J_1(x) = \|x\|_1 = \sum_m |x_m|$.
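The $\ell^1$ norm is the smallest $q$ for which $J_q$ is convex, yet it still promotes sparsity; its proximal operator is the soft threshold used by the solvers on the following slides. A small sketch (names are mine):

```python
import numpy as np

def Jq(x, q):
    """l^q prior J_q(x) = sum_m |x_m|^q (convex for q >= 1)."""
    return np.sum(np.abs(x) ** q)

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1: shrink every entry toward 0 by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([2.0, -0.3, 0.0, 1.0])
J1 = Jq(x, 1)                   # l1 norm, about 3.3
xs = soft_threshold(x, 0.5)     # -> [1.5, 0., 0., 0.5]
```

Note how soft thresholding sets small coefficients exactly to zero, which is why the $\ell^1$ penalty yields sparse solutions while $q = 2$ does not.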
22. Inverse Problems
   Observations $y = \Phi f_0 + w$. Denoising/approximation: $\Phi = \mathrm{Id}$.
   Examples: inpainting, super-resolution, compressed sensing.
25. Regularized Inversion
   Denoising/compression: $y = f_0 + w \in \mathbb{R}^N$. Sparse approximation: $f = Dx^\star$ where
   $x^\star \in \operatorname{argmin}_x \frac{1}{2}\|y - Dx\|^2 + \lambda \|x\|_1$ (fidelity + regularization).
   Inverse problems: $y = \Phi f_0 + w \in \mathbb{R}^P$; replace $D$ by $\Phi D$:
   $x^\star \in \operatorname{argmin}_x \frac{1}{2}\|y - \Phi D x\|^2 + \lambda \|x\|_1$.
   Numerical solvers: proximal splitting schemes. www.numerical-tours.com
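One such proximal scheme is iterative soft thresholding (ISTA), alternating a gradient step on the fidelity term with the $\ell^1$ prox. A hedged sketch with $A = \Phi D$ and a toy masking operator $\Phi$ (the names and toy setup are mine, not from the slides):

```python
import numpy as np

def ista(y, A, lam, step, n_iter=200):
    """Iterative soft thresholding for min_x 0.5||y - Ax||^2 + lam||x||_1.

    A plays the role of Phi D on the slide; `step` should satisfy
    step <= 1/||A||^2 (squared spectral norm) for convergence.
    """
    soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)               # gradient of the fidelity term
        x = soft(x - step * grad, step * lam)  # prox of step*lam*||.||_1
    return x

# Toy inpainting: Phi masks some samples, D = Id, sparse ground truth.
mask = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)
Phi = np.diag(mask)
x0 = np.zeros(8); x0[2] = 1.5
y = Phi @ x0
x = ista(y, Phi, lam=0.05, step=1.0, n_iter=100)  # x[2] ~ 1.45, rest 0
```

The recovered coefficient is slightly shrunk (by $\lambda$), the usual bias of the $\ell^1$ penalty.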
26. Inpainting Results
27. Overview
   - Sparsity and Redundancy
   - Dictionary Learning
   - Extensions
   - Task-driven Learning
   - Texture Synthesis
32. Dictionary Learning: MAP Energy
   Set of (noisy) exemplars $\{y_k\}_k$.
   Sparse approximation: $\min_{x_k} \frac{1}{2}\|y_k - Dx_k\|^2 + \lambda \|x_k\|_1$.
   Dictionary learning: $\min_{D \in C} \sum_k \min_{x_k} \frac{1}{2}\|y_k - Dx_k\|^2 + \lambda \|x_k\|_1$.
   Constraint: $C = \{D = (d_m)_m : \forall m, \|d_m\| \leq 1\}$; otherwise $D \to +\infty$, $X \to 0$.
   Matrix formulation: $\min_{X \in \mathbb{R}^{Q \times K},\ D \in C \subset \mathbb{R}^{N \times Q}} f(X, D) = \frac{1}{2}\|Y - DX\|^2 + \lambda \|X\|_1$.
   Convex with respect to $X$; convex with respect to $D$; non-convex with respect to $(X, D)$: local minima.
36. Dictionary Learning: Algorithm
   Step 1: $\forall k$, minimization over $x_k$: $\min_{x_k} \frac{1}{2}\|y_k - Dx_k\|^2 + \lambda \|x_k\|_1$ (convex sparse coding).
   Step 2: minimization over $D$: $\min_{D \in C} \|Y - DX\|^2$ (convex constrained minimization), by projected gradient descent:
   $D^{(\ell+1)} = \mathrm{Proj}_C\big( D^{(\ell)} - \tau (D^{(\ell)} X - Y) X^* \big)$.
   Convergence: toward a stationary point of $f(X, D)$. (Illustrated: $D$ at initialization and at convergence.)
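The two-step alternation can be sketched end to end: ISTA for the sparse-coding step, then a projected gradient step on $D$, where $\mathrm{Proj}_C$ simply renormalizes any atom with norm greater than 1. A sketch under my own naming, not the exact code behind the slides:

```python
import numpy as np

def soft(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def learn_dictionary(Y, Q, lam=0.1, n_outer=20, n_ista=30):
    """Alternating minimization of 0.5||Y - DX||^2 + lam||X||_1 over X and D in C."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], Q))
    D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)      # start inside C
    X = np.zeros((Q, Y.shape[1]))
    for _ in range(n_outer):
        # Step 1: sparse coding of all exemplars at once by ISTA (convex in X).
        L = np.linalg.norm(D, 2) ** 2 + 1e-10            # Lipschitz constant of the gradient
        for _ in range(n_ista):
            X = soft(X - D.T @ (D @ X - Y) / L, lam / L)
        # Step 2: one projected gradient step on D (convex in D).
        tau = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-10)
        D = D - tau * (D @ X - Y) @ X.T
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # Proj_C: force ||d_m|| <= 1
    return D, X

Y = np.random.default_rng(1).standard_normal((8, 20))
D, X = learn_dictionary(Y, Q=12, n_outer=5, n_ista=10)   # atoms stay inside C
```

As the slides note, the alternation only guarantees convergence to a stationary point of the joint energy, so the result depends on the initialization of $D$.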
38. Patch-based Learning
   Learning $D$ from exemplar patches $y_k$ [Olshausen, Field 1997]. State-of-the-art denoising [Elad et al. 2006]. Sparse texture synthesis, inpainting [Peyré 2008].
40. Comparison with PCA
   PCA dimensionality reduction: $\forall k$, $\min_D \|Y - D^{(k)} X\|$ with $D^{(k)} = (d_m)_{m=0}^{k-1}$.
   Linear (PCA): Fourier-like atoms. Sparse (learning): Gabor-like atoms.
   [Figure: 12x12 DCT atoms and the first 40 KLT (PCA) atoms trained on 12x12 patches from Lena, after Rubinstein et al.; a Gabor atom compared with a learned atom.]
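For contrast, the linear (PCA) dictionary of this slide is just the top left singular vectors of the patch matrix. A sketch (the centering step is my addition, as is standard for PCA; the slide's formula omits it):

```python
import numpy as np

def pca_dictionary(Y, k):
    """First k PCA atoms D^(k) of the exemplars Y (one exemplar per column):
    the best rank-k linear model is spanned by the top left singular vectors."""
    Yc = Y - Y.mean(axis=1, keepdims=True)   # center the exemplars
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]

Y = np.random.default_rng(0).standard_normal((16, 100))
Dk = pca_dictionary(Y, k=4)                  # 4 orthonormal atoms
```

On natural image patches these atoms come out Fourier-like, whereas sparse dictionary learning produces localized, oriented, Gabor-like atoms, which is the point of the comparison.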
43. Patch-based Denoising [Aharon & Elad 2006]
   Noisy image: $f = f_0 + w$.
   Step 1: extract patches $y_k(\cdot) = f(z_k + \cdot)$.
   Step 2: dictionary learning: $\min_{D, (x_k)_k} \sum_k \frac{1}{2}\|y_k - Dx_k\|^2 + \lambda \|x_k\|_1$.
   Step 3: patch averaging: $\tilde y_k = D x_k$, $\tilde f(\cdot) \propto \sum_k \tilde y_k(\cdot - z_k)$.
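Steps 1 and 3 of this pipeline, patch extraction and overlap averaging, can be sketched as follows; the round trip with unmodified patches recovers the image exactly (function names are mine):

```python
import numpy as np

def extract_patches(f, w):
    """Step 1: all overlapping w x w patches y_k(.) = f(z_k + .), one per column."""
    H, W = f.shape
    cols = [f[i:i + w, j:j + w].ravel()
            for i in range(H - w + 1) for j in range(W - w + 1)]
    return np.stack(cols, axis=1)            # shape (w*w, K)

def average_patches(patches, shape, w):
    """Step 3: rebuild f by averaging the (denoised) patches over their overlaps."""
    H, W = shape
    acc, cnt = np.zeros(shape), np.zeros(shape)
    k = 0
    for i in range(H - w + 1):
        for j in range(W - w + 1):
            acc[i:i + w, j:j + w] += patches[:, k].reshape(w, w)
            cnt[i:i + w, j:j + w] += 1.0
            k += 1
    return acc / cnt

f = np.arange(16.0).reshape(4, 4)
P = extract_patches(f, 2)                    # 9 patches of size 4
g = average_patches(P, f.shape, 2)           # round trip: g == f
```

In the actual denoiser, step 2 replaces each column of `P` with its sparse approximation $D x_k$ before the averaging, and the averaging over overlaps is what suppresses blocking artifacts.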
47. Learning with Missing Data
   Inverse problem: $y = \Phi f_0 + w$.
   $\min_{f, (x_k)_k, D \in C} \frac{1}{2}\|y - \Phi f\|^2 + \sum_k \frac{\lambda}{2}\|p_k(f) - Dx_k\|^2 + \mu \|x_k\|_1$
   Patch extractor: $p_k(f) = f(z_k + \cdot)$.
   Step 1: $\forall k$, minimization over $x_k$: convex sparse coding.
   Step 2: minimization over $D$: quadratic, constrained.
   Step 3: minimization over $f$: quadratic.
48. Inpainting Example [Mairal et al. 2008]
   Image $f_0$, observations $y = \Phi f_0 + w$, regularized reconstruction $f$.
   [Figure: inpainting with an adaptive dictionary; 75% of the data removed (initial PSNR 6.13 dB); restored PSNR 33.97 dB for N = 2 with 16x16 patches, 31.75 dB for N = 1 with 8x8 patches.]
49. Adaptive Inpainting and Separation [Peyré, Fadili, Starck 2010]
   Comparison of wavelets, local DCT, and a learned dictionary.
50. Overview
   - Sparsity and Redundancy
   - Dictionary Learning
   - Extensions
   - Task-driven Learning
   - Texture Synthesis
51. Higher Dimensional Learning
   Sparse representation for color image restoration: dictionaries of color atoms (e.g. 7x7x3) learned on natural images; a dictionary learned on the target image itself is more colored than a generic one. [Mairal et al.]
   [Figure: color dictionaries and denoising/inpainting comparisons, after Mairal et al.]
52. Higher Dimensional Learning
   Online learning for matrix factorization and sparse coding; color image inpainting. [Mairal et al.]
53. Movie Inpainting