Slides of the keynote presentation at the conference ADA7, Cargèse, France, 14-18 May 2012.


- 1. Learning Sparse Representations. Gabriel Peyré, www.numerical-tours.com
- 5. Image Priors. Mathematical image priors: compression, denoising, super-resolution, ... Smooth images: Sobolev prior $||\nabla f||^2$, low-pass Fourier coefficients. Piecewise smooth images: total variation prior $||\nabla f||_1$, sparse wavelet coefficients. Learning the prior from exemplars?
- 6. Overview. Sparsity and Redundancy • Dictionary Learning • Extensions • Task-driven Learning • Texture Synthesis
- 10. Image Representation. Dictionary $D = \{d_m\}_{m=0}^{Q-1}$ of atoms $d_m \in \mathbb{R}^N$. Image decomposition: $f = \sum_{m=0}^{Q-1} x_m d_m = Dx$. Image approximation: $f \approx Dx$. Orthogonal dictionary ($N = Q$): $x_m = \langle f, d_m \rangle$. Redundant dictionary ($N \le Q$): examples include translation-invariant (TI) wavelets, curvelets, ...; $x$ is not unique.
- 13. Sparsity. Decomposition: $f = \sum_{m=0}^{Q-1} x_m d_m = Dx$. Sparsity: most $x_m$ are small; example: an image $f$ and its wavelet coefficients $x$. Ideal sparsity: most $x_m$ are zero, $J_0(x) = |\{m : x_m \ne 0\}|$. Approximate sparsity (compressibility): $||f - Dx||$ is small with $J_0(x) \le M$.
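The two sparsity notions above can be sketched numerically. This is a minimal illustration, not from the slides: `J0` counts nonzeros, and `best_M_term` keeps the $M$ largest-magnitude coefficients (the best $M$-term approximation in the coefficient domain); both names are illustrative.

```python
import numpy as np

# Ideal sparsity J0(x) = |{m : x_m != 0}| and M-term approximation,
# illustrated on a toy coefficient vector.
def J0(x, tol=0.0):
    """Count the coefficients with magnitude above tol."""
    return int(np.sum(np.abs(x) > tol))

def best_M_term(x, M):
    """Keep the M largest-magnitude coefficients, zero out the rest."""
    xM = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[::-1][:M]
    xM[idx] = x[idx]
    return xM

rng = np.random.default_rng(0)
x = rng.standard_normal(256) * (rng.random(256) < 0.05)  # ~5% nonzero
print(J0(x))            # number of nonzero coefficients
print(J0(best_M_term(x, 8)))  # at most 8 after truncation
```

When $M \ge J_0(x)$ the truncation is lossless, which is the "ideal sparsity" case; compressibility is the regime where the error decays quickly as $M$ grows.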
- 17. Sparse Coding. Redundant dictionary $D = \{d_m\}_{m=0}^{Q-1}$, $Q \ge N$: non-unique representation $f = Dx$. Sparsest decomposition: $\min_{f = Dx} J_0(x)$. Sparsest approximation: $\min_x \frac{1}{2}||f - Dx||^2 + \lambda J_0(x)$, equivalent (for suitable $(\lambda, \varepsilon, M)$) to $\min_{||f - Dx|| \le \varepsilon} J_0(x)$ and $\min_{J_0(x) \le M} ||f - Dx||$. Ortho-basis $D$: $x_m = \langle f, d_m \rangle$ if $|\langle f, d_m \rangle| \ge \sqrt{2\lambda}$, $0$ otherwise, i.e. pick the $M$ largest coefficients in $\{\langle f, d_m \rangle\}_m$. General redundant dictionary: NP-hard.
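In the orthogonal case the $\ell^0$ problem is solved exactly by hard thresholding of the coefficients, as stated on the slide. A small sketch (the random basis and $\lambda$ are illustrative): by Parseval, the approximation error equals the energy of the discarded coefficients.

```python
import numpy as np

# Sparse coding in an orthonormal dictionary: hard-threshold the
# coefficients <f, d_m> at sqrt(2*lambda).
rng = np.random.default_rng(1)
N = 64
D, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthonormal basis
f = rng.standard_normal(N)
lam = 0.1

c = D.T @ f                                          # x_m = <f, d_m>
x = np.where(np.abs(c) > np.sqrt(2 * lam), c, 0.0)   # hard thresholding

# Parseval: ||f - Dx||^2 equals the energy of the discarded coefficients.
err = np.linalg.norm(f - D @ x) ** 2
discarded = np.sum(c[np.abs(c) <= np.sqrt(2 * lam)] ** 2)
print(np.isclose(err, discarded))  # True
```

For a redundant $D$ no such closed form exists, which is why the exact problem is NP-hard and motivates the convex relaxation on the next slide.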
- 20. Convex Relaxation: L1 Prior. $J_0(x) = |\{m : x_m \ne 0\}|$. Image with 2 pixels: $J_0(x) = 0$: null image; $J_0(x) = 1$: sparse image; $J_0(x) = 2$: non-sparse image. $\ell^q$ priors ($q = 0, 1/2, 1, 3/2, 2$): $J_q(x) = \sum_m |x_m|^q$, convex for $q \ge 1$. Sparse $\ell^1$ prior: $J_1(x) = ||x||_1 = \sum_m |x_m|$.
- 22. Inverse Problems. Observations $y = \Phi f_0 + w$. Denoising/approximation: $\Phi = \mathrm{Id}$. Examples: inpainting, super-resolution, compressed sensing.
- 25. Regularized Inversion. Denoising/compression: $y = f_0 + w \in \mathbb{R}^N$; sparse approximation $f^\star = Dx^\star$ where $x^\star \in \operatorname{argmin}_x \frac{1}{2}||y - Dx||^2 + \lambda||x||_1$ (fidelity + regularization). Inverse problems: $y = \Phi f_0 + w \in \mathbb{R}^P$; replace $D$ by $\Phi D$: $x^\star \in \operatorname{argmin}_x \frac{1}{2}||y - \Phi Dx||^2 + \lambda||x||_1$. Numerical solvers: proximal splitting schemes, www.numerical-tours.com.
- 26. Inpainting Results
- 27. Overview. Sparsity and Redundancy • Dictionary Learning • Extensions • Task-driven Learning • Texture Synthesis
- 32. Dictionary Learning: MAP Energy. Set of (noisy) exemplars $\{y_k\}_k$. Sparse approximation: $\min_{x_k} \frac{1}{2}||y_k - Dx_k||^2 + \lambda||x_k||_1$. Dictionary learning: $\min_{D \in \mathcal{C}} \sum_k \min_{x_k} \frac{1}{2}||y_k - Dx_k||^2 + \lambda||x_k||_1$. Constraint: $\mathcal{C} = \{D = (d_m)_m : \forall m, ||d_m|| \le 1\}$; otherwise $D \to +\infty$, $X \to 0$. Matrix formulation: $\min_{X \in \mathbb{R}^{Q \times K},\ D \in \mathcal{C} \subset \mathbb{R}^{N \times Q}} f(X, D) = \frac{1}{2}||Y - DX||^2 + \lambda||X||_1$. Convex with respect to $X$; convex with respect to $D$; non-convex with respect to $(X, D)$: local minima.
- 36. Dictionary Learning: Algorithm. Step 1: for each $k$, minimization on $x_k$: $\min_{x_k} \frac{1}{2}||y_k - Dx_k||^2 + \lambda||x_k||_1$ (convex sparse coding). Step 2: minimization on $D$: $\min_{D \in \mathcal{C}} ||Y - DX||^2$ (convex constrained minimization), by projected gradient descent: $D^{(\ell+1)} = \mathrm{Proj}_{\mathcal{C}}\big(D^{(\ell)} - \tau (D^{(\ell)}X - Y)X^\top\big)$. Convergence: toward a stationary point of $f(X, D)$. (Figures: $D$ at initialization and at convergence.)
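The two alternating steps above can be sketched compactly. This is a toy implementation under stated assumptions: the sparse-coding step uses ISTA, the dictionary step uses the slide's projected gradient, and the step sizes, iteration counts, and sizes are illustrative.

```python
import numpy as np

# Alternating minimization for  min_{D in C, X} 0.5*||Y - D X||^2 + lam*||X||_1,
# with C = {D : ||d_m|| <= 1} (columns of D are the atoms).
def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def proj_C(D):
    """Project each atom (column) onto the unit ball ||d_m|| <= 1."""
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms

def learn_dictionary(Y, Q, lam=0.1, n_outer=20, n_inner=50, seed=0):
    rng = np.random.default_rng(seed)
    N, K = Y.shape
    D = proj_C(rng.standard_normal((N, Q)))
    X = np.zeros((Q, K))
    for _ in range(n_outer):
        # Step 1: sparse coding of all exemplars (ISTA on X, D fixed)
        L = np.linalg.norm(D, 2) ** 2 + 1e-10
        for _ in range(n_inner):
            X = soft_threshold(X - D.T @ (D @ X - Y) / L, lam / L)
        # Step 2: projected gradient descent on D (X fixed)
        Lx = np.linalg.norm(X, 2) ** 2 + 1e-10
        for _ in range(n_inner):
            D = proj_C(D - (D @ X - Y) @ X.T / Lx)
    return D, X

rng = np.random.default_rng(3)
Y = rng.standard_normal((16, 200))       # toy exemplars as columns
D, X = learn_dictionary(Y, Q=32)
```

Both steps monotonically decrease the energy, consistent with convergence to a stationary point only; different initializations can reach different local minima.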
- 38. Patch-based Learning. Learning $D$ from exemplar patches $y_k$ yields the dictionary $D$ [Olshausen, Field 1997]; state-of-the-art denoising [Elad et al. 2006]; sparse texture synthesis and inpainting [Peyré 2008].
- 40. Comparison with PCA. PCA dimensionality reduction: $\min_{D^{(k)}} ||Y - D^{(k)}X||$ with $D^{(k)} = (d_m)_{m=0}^{k-1}$. Linear (PCA): Fourier-like atoms. Sparse (learning): Gabor-like atoms. [Figure: 12×12 DCT atoms and the first 40 KLT/PCA atoms trained on 12×12 patches from Lena, from Rubinstein et al., "Dictionaries for Sparse Representation"; a Gabor atom profile vs. a learned atom profile.]
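The PCA side of this comparison has a closed form: the rank-$k$ problem is solved by the $k$ leading singular vectors of the exemplar matrix. A short sketch (sizes and data are illustrative; this is standard truncated SVD, not the learned dictionary):

```python
import numpy as np

# PCA "dictionary": the k leading left singular vectors of Y minimize
# ||Y - D^(k) X|| over rank-k factorizations (Eckart-Young theorem).
rng = np.random.default_rng(5)
Y = rng.standard_normal((64, 500))      # exemplar patches as columns
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 10
D_pca = U[:, :k]                        # linear dictionary: PCA atoms
X = D_pca.T @ Y                         # coefficients (orthogonal projection)
err = np.linalg.norm(Y - D_pca @ X)
print(np.isclose(err ** 2, np.sum(s[k:] ** 2)))  # True: tail energy
```

The contrast with the slide: PCA restricts every exemplar to the same $k$-dimensional subspace, whereas sparse learning lets each exemplar pick its own few atoms from a redundant set, which is what produces localized Gabor-like atoms.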
- 43. Patch-based Denoising [Aharon & Elad 2006]. Noisy image: $f = f_0 + w$. Step 1: extract patches $y_k(\cdot) = f(z_k + \cdot)$. Step 2: dictionary learning: $\min_{D, (x_k)_k} \sum_k \frac{1}{2}||y_k - Dx_k||^2 + \lambda||x_k||_1$. Step 3: patch averaging: $\tilde{y}_k = Dx_k$, $\tilde{f}(\cdot) \propto \sum_k \tilde{y}_k(\cdot - z_k)$.
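Steps 1 and 3 of this pipeline are simple bookkeeping that is worth seeing explicitly. A sketch with hypothetical helper names; the "processing" between extraction and averaging is the identity here, just to show that overlapping extraction followed by averaging reconstructs the image exactly.

```python
import numpy as np

# Step 1: extract overlapping patches y_k(.) = f(z_k + .).
def extract_patches(f, w, step=1):
    H, W = f.shape
    return np.array([f[i:i + w, j:j + w]
                     for i in range(0, H - w + 1, step)
                     for j in range(0, W - w + 1, step)])

# Step 3: rebuild the image by averaging patches back at their locations.
def average_patches(patches, shape, w, step=1):
    H, W = shape
    acc = np.zeros(shape); cnt = np.zeros(shape)
    k = 0
    for i in range(0, H - w + 1, step):
        for j in range(0, W - w + 1, step):
            acc[i:i + w, j:j + w] += patches[k]
            cnt[i:i + w, j:j + w] += 1
            k += 1
    return acc / cnt

rng = np.random.default_rng(4)
f = rng.standard_normal((32, 32))
P = extract_patches(f, w=8)
f_rec = average_patches(P, f.shape, w=8)
print(np.allclose(f, f_rec))  # True: identity processing reconstructs f
```

In the actual denoiser, each extracted patch is replaced by its sparse approximation $\tilde{y}_k = Dx_k$ before the averaging step.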
- 47. Learning with Missing Data. Inverse problem: $y = \Phi f_0 + w$. $\min_{f, (x_k)_k, D \in \mathcal{C}} \frac{1}{2}||y - \Phi f||^2 + \frac{\lambda}{2}\sum_k ||p_k(f) - Dx_k||^2 + \mu \sum_k ||x_k||_1$, with patch extractor $p_k(f) = f(z_k + \cdot)$. Step 1: for each $k$, minimization on $x_k$ (convex sparse coding). Step 2: minimization on $D$ (constrained quadratic). Step 3: minimization on $f$ (quadratic).
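Step 3 deserves a sketch: with the codes and dictionary fixed, the update of $f$ is quadratic, and in the special case where $\Phi$ is a masking operator (inpainting) the normal equations are diagonal, so each pixel is a weighted average of its observed value and the patch reconstructions covering it. A 1-D toy with illustrative names and weights (the closed form below assumes this masking special case):

```python
import numpy as np

# Quadratic f-update for  0.5*||y - Phi f||^2 + (lam/2) * sum_k ||p_k(f) - r_k||^2
# with Phi a {0,1} mask and unit-step patches p_k(f) = f[k:k+w], r_k = D x_k.
def update_f(y, mask, recs, w, lam):
    n = y.shape[0]
    num = mask * y
    den = mask.astype(float).copy()
    for k in range(n - w + 1):
        num[k:k + w] += lam * recs[k]   # contribution of patch reconstruction r_k
        den[k:k + w] += lam             # each covering patch adds weight lam
    return num / den                    # per-pixel weighted average

rng = np.random.default_rng(6)
n, w, lam = 32, 4, 0.5
f_true = np.cumsum(rng.standard_normal(n)) / 4
mask = rng.random(n) < 0.5                  # half the pixels observed
y = mask * f_true
recs = np.array([f_true[k:k + w] for k in range(n - w + 1)])  # ideal r_k
f = update_f(y, mask, recs, w, lam)
print(np.allclose(f, f_true))  # True when every r_k agrees with f_true
```

Unobserved pixels are filled entirely from the patch reconstructions, which is how the scheme propagates the learned model into the missing regions.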
- 48. Inpainting Example. Original image $f_0$, damaged observations $y = \Phi f_0 + w$, regularized reconstruction $f^\star$. [Figure: Mairal et al. 2008, inpainting with an adaptive dictionary; 75% of the data removed (initial PSNR 6.13 dB); restored PSNR 33.97 dB for N = 2 (16×16 patches) and 31.75 dB for N = 1 (8×8 patches); J = 100 learning iterations, 50% of the patches used for learning, sparsity factors L = 10 (learning) and L = 25 (final reconstruction).]
- 49. Adaptive Inpainting and Separation. Wavelets, local DCT, learned dictionary [Peyré, Fadili, Starck 2010].
- 50. Overview. Sparsity and Redundancy • Dictionary Learning • Extensions • Task-driven Learning • Texture Synthesis
- 51. Higher Dimensional Learning. Learning $D$ on color patches (atoms of size 7×7×3 or 5×5×3) for color image restoration. [Figures: Mairal et al., "Sparse Representation for Color Image Restoration": learned dictionaries of 256 color atoms; denoising comparisons against per-channel K-SVD with 8×8 atoms.]
- 52. Higher Dimensional Learning. The same approach applied to color inpainting. [Figures: Mairal et al., "Online Learning for Matrix Factorization and Sparse Coding".]
- 53. Movie Inpainting
- 54. Facial Image Compression O. Bryt, M. Elad / J. Vis. Commun. Image R. 19 (2008) 270–282 271 [Elad et al. 2009] show recognizable faces. We use a database containing around 6000 such facial images, some of which are used for training and tuning the algorithm, and the others for testing it, similar to the approachImage registration. taken in [17]. In our work we propose a novel compression algorithm, related to the one presented in [17], improving over it. Our algorithm relies strongly on recent advancements made in using sparse and redundant representation of signals [18–26], and learning their sparsifying dictionaries [27–29]. We use the K-SVD algorithm for learning the dictionaries for representing small image patches in a locally adaptive way, and use these to sparse- code the patches’ content. This is a relatively simple and straight-forward algorithm with hardly any entropy coding stage. Yet, it is shown to be superior to several competing algorithms: (i) the JPEG2000, (ii) the VQ-based algorithm presented in [17], and (iii) A Principal Component Analysis (PCA) approach.2 Fig. 1. (Left) Piece-wise afﬁne warping of the image by triangulation. (Right) A In the next section we provide some background material for uniform slicing to disjoint square patches for coding purposes. this work: we start by presenting the details of the compression algorithm developed in [17], as their scheme is the one we embark K-Means) per each patch separately, using patches taken from the from in the development of ours. We also describe the topic of same location from 5000 training images. This way, each VQ is sparse and redundant representations and the K-SVD, that are adapted to the expected local content, and thus the high perfor- the foundations for our algorithm. In Section 3 we turn to present mance presented by this algorithm. 
The number of code-words the proposed algorithm in details, showing its various steps, and in the VQ is a function of the bit-allocation for the patches. As discussing its computational/memory complexities. Section 4 we argue in the next section, VQ coding is limited by the available presents results of our method, demonstrating the claimed number of examples and the desired rate, forcing relatively small superiority. We conclude in Section 5 with a list of future activities patch sizes. This, in turn, leads to a loss of some redundancy be- that can further improve over the proposed scheme. tween adjacent patches, and thus loss of potential compression. Another ingredient in this algorithm that partly compensates 2. Background material for the above-described shortcoming is a multi-scale coding scheme. The image is scaled down and VQ-coded using patches 2.1. VQ-based image compression of size 8 Â 8. Then it is interpolated back to the original resolution, and the residual is coded using VQ on 8 Â 8 pixel patches once Among the thousands of papers that study still image again. This method can be applied on a Laplacian pyramid of the compression algorithms, there are relatively few that consider original (warped) image with several scales [33]. the treatment of facial images [2–17]. Among those, the most As already mentioned above, the results shown in [17] surpass recent and the best performing algorithm is the one reported in those obtained by JPEG2000, both visually and in Peak-Signal-to- [17]. That paper also provides a thorough literature survey that Noise Ratio (PSNR) quantitative comparisons. In our work we pro- compares the various methods and discusses similarities and pose to replace the coding stage from VQ to sparse and redundant differences between them. Therefore, rather than repeating such representations—this leads us to the next subsection, were we de- a survey here, we refer the interested reader to [17]. 
In this scribe the principles behind this coding strategy. sub-section we concentrate on the description of the algorithm in [17] as our method resembles it to some extent. 2.2. Sparse and redundant representations
- 55. Facial Image Compression O.O. Bryt, M. EladJ. J. Vis. Commun. Image R. 19 (2008) 270–282 Bryt, M. Elad / / Vis. Commun. Image R. 19 (2008) 270–282 271 271 [Elad et al. 2009] show recognizable faces. We use a a database containing around 6000 show recognizable faces. We use database containing around 6000 such facial images, some of which are used for training and tuning such facial images, some of which are used for training and tuning the algorithm, and the others for testing it, similar to the approach the algorithm, and the others for testing it, similar to the approachImage registration. taken in [17]. taken in [17]. In our work we propose a a novel compression algorithm, related In our work we propose novel compression algorithm, related to the one presented in [17], improving over it. to the one presented in [17], improving over it. Our algorithm relies strongly on recent advancements made in Our algorithm relies strongly on recent advancements made inNon-overlapping patches (fk )k . using sparse and redundant representation of signals [18–26], and using sparse and redundant representation of signals [18–26], and fk learning their sparsifying dictionaries [27–29]. We use the K-SVD learning their sparsifying dictionaries [27–29]. We use the K-SVD algorithm for learning the dictionaries for representing small algorithm for learning the dictionaries for representing small image patches in a a locally adaptive way, and use these to sparse- image patches in locally adaptive way, and use these to sparse- code the patches’ content. This isis a a relatively simple and code the patches’ content. This relatively simple and straight-forward algorithm with hardly any entropy coding stage. straight-forward algorithm with hardly any entropy coding stage. 
Yet, itit is shown to be superior to several competing algorithms: Yet, is shown to be superior to several competing algorithms: (i) the JPEG2000, (ii) the VQ-based algorithm presented in [17], (i) the JPEG2000, (ii) the VQ-based algorithm presented in [17], 2 and (iii) AA Principal Component Analysis (PCA) approach.2 and (iii) Principal Component Analysis (PCA) approach. Fig. 1.1. (Left) Piece-wise afﬁne warping of the image by triangulation. (Right) A Fig. (Left) Piece-wise afﬁne warping of the image by triangulation. (Right) A In the next section we provide some background material for In the next section we provide some background material for uniform slicing toto disjoint square patches for coding purposes. uniform slicing disjoint square patches for coding purposes. this work: we start by presenting the details of the compression this work: we start by presenting the details of the compression algorithm developed in [17], as their scheme isis the one we embark algorithm developed in [17], as their scheme the one we embark K-Means) per each patch separately, using patches taken from the K-Means) per each patch separately, using patches taken from the from in the development of ours. We also describe the topic of from in the development of ours. We also describe the topic of same location from 5000 training images. This way, each VQ isis same location from 5000 training images. This way, each VQ sparse and redundant representations and the K-SVD, that are sparse and redundant representations and the K-SVD, that are adapted to the expected local content, and thus the high perfor- adapted to the expected local content, and thus the high perfor- the foundations for our algorithm. In Section 3 3 we turn to present the foundations for our algorithm. In Section we turn to present mance presented by this algorithm. The number of code-words mance presented by this algorithm. 
The number of code-words the proposed algorithm in details, showing its various steps, and the proposed algorithm in details, showing its various steps, and in the VQ isisa afunction of the bit-allocation for the patches. As in the VQ function of the bit-allocation for the patches. As discussing its computational/memory complexities. Section 4 4 discussing its computational/memory complexities. Section we argue in the next section, VQ coding isis limited by the available we argue in the next section, VQ coding limited by the available presents results of our method, demonstrating the claimed presents results of our method, demonstrating the claimed number of examples and the desired rate, forcing relatively small number of examples and the desired rate, forcing relatively small superiority. We conclude in Section 5 5 with a list of future activities superiority. We conclude in Section with a list of future activities patch sizes. This, in turn, leads to a a loss of some redundancy be- patch sizes. This, in turn, leads to loss of some redundancy be- that can further improve over the proposed scheme. that can further improve over the proposed scheme. tween adjacent patches, and thus loss of potential compression. tween adjacent patches, and thus loss of potential compression. Another ingredient in this algorithm that partly compensates Another ingredient in this algorithm that partly compensates 2. Background material 2. Background material for the above-described shortcoming isis a a multi-scale coding for the above-described shortcoming multi-scale coding scheme. The image isisscaled down and VQ-coded using patches scheme. The image scaled down and VQ-coded using patches 2.1. VQ-based image compression 2.1. VQ-based image compression of size 8 8 Â 8. Then it is interpolated back to the original resolution, of size Â 8. 
Then it is interpolated back to the original resolution, and the residual isiscoded using VQ on 8 8 Â 8pixel patches once and the residual coded using VQ on Â 8 pixel patches once Among the thousands of papers that study still image Among the thousands of papers that study still image again. This method can be applied on a a Laplacian pyramid of the again. This method can be applied on Laplacian pyramid of the compression algorithms, there are relatively few that consider compression algorithms, there are relatively few that consider original (warped) image with several scales [33]. original (warped) image with several scales [33]. the treatment of facial images [2–17]. Among those, the most the treatment of facial images [2–17]. Among those, the most As already mentioned above, the results shown in [17] surpass As already mentioned above, the results shown in [17] surpass recent and the best performing algorithm isis the one reported in recent and the best performing algorithm the one reported in those obtained by JPEG2000, both visually and in Peak-Signal-to- those obtained by JPEG2000, both visually and in Peak-Signal-to- [17]. That paper also provides a athorough literature survey that [17]. That paper also provides thorough literature survey that Noise Ratio (PSNR) quantitative comparisons. In our work we pro- Noise Ratio (PSNR) quantitative comparisons. In our work we pro- compares the various methods and discusses similarities and compares the various methods and discusses similarities and pose to replace the coding stage from VQ to sparse and redundant pose to replace the coding stage from VQ to sparse and redundant differences between them. Therefore, rather than repeating such differences between them. Therefore, rather than repeating such representations—this leads us to the next subsection, were we de- representations—this leads us to the next subsection, were we de- a asurvey here, we refer the interested reader to [17]. 
In this survey here, we refer the interested reader to [17]. In this scribe the principles behind this coding strategy. scribe the principles behind this coding strategy. sub-section we concentrate on the description of the algorithm sub-section we concentrate on the description of the algorithm in [17] as our method resembles itit to some extent. in [17] as our method resembles to some extent. 2.2. Sparse and redundant representations 2.2. Sparse and redundant representations
- 56. Facial Image Compression [Elad et al. 2009] Image registration. Non-overlapping patches (fk)k. Dictionary learning (Dk)k. [The slide reproduces pages from O. Bryt, M. Elad / J. Vis. Commun. Image R. 19 (2008) 270–282, including Fig. 1 ("(Left) Piece-wise affine warping of the image by triangulation. (Right) A uniform slicing to disjoint square patches for coding purposes.") and Fig. 6 ("The dictionary obtained by K-SVD for Patch No. 80 (the left eye) using the OMP method with L = 4"); each trained dictionary contains 512 atoms of 15 × 15 pixels.]
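The sparse coding stage on this slide (each patch fk coded over a learned dictionary Dk) is performed in the paper with Orthogonal Matching Pursuit (OMP) under a budget of L atoms per patch (Fig. 6 uses L = 4). Below is a minimal OMP sketch; the dictionary here is random rather than K-SVD-trained, and the sizes (64-dimensional signals, 512 atoms, echoing the paper's 512-atom dictionaries) are chosen purely for illustration.

```python
import numpy as np

def omp(D, x, L):
    """Orthogonal Matching Pursuit: greedily select at most L atoms of D
    (columns, assumed unit-norm) and least-squares refit x on them."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(L):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # refit x on all selected atoms (the "orthogonal" step)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

rng = np.random.default_rng(0)
n, m, L = 64, 512, 4                     # signal dim, atoms, sparsity budget
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x = D[:, [3, 70, 200]] @ np.array([2.0, -1.0, 0.5])   # a 3-sparse signal
a = omp(D, x, L)
reconstruction_error = np.linalg.norm(D @ a - x)
```

A full K-SVD training loop would alternate this sparse coding step over all training patches with an SVD-based update of each dictionary atom; only the coding side is sketched here.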

