# A Friendly Guide To Sparse Coding

1. A Friendly Guide To Sparse Coding

    Shao-Chuan Wang, Research Center for Information Technology Innovation, Academia Sinica. E-mail: scwang ASCII(64) ntu.edu.tw. December 3, 2009.
2. Outline
    1. Review of PCA
    2. Introducing Sparsity
    3. Solving the Optimization Problem
    4. Learning Dictionary
    5. Applications
3. PCA Review

    Let $x \in \mathbb{R}^m$ and $D = [d_1, d_2, d_3, \dots, d_p] \in \mathbb{R}^{m \times p}$, where $d_j \in \mathbb{R}^m$. Suppose $x$ can be approximated by a linear combination of the columns of $D$, i.e.,

    $$x \sim \hat{x} = D\alpha, \tag{1}$$

    where $\alpha \in \mathbb{R}^p$ is the new coordinate of $x$ in terms of the new basis $D$.
4. PCA Review

    We want $\hat{x}$ to be as close as possible to $x$, i.e., we want to minimize the reconstruction error. If we define the error metric with the L2 norm, for instance,

    $$\text{Error} = \|x - D\alpha\|_2^2. \tag{2}$$

    How do we get $D$?
5. PCA Review

    If our goal is to minimize the total error, then given a dataset $S = \{x^{(i)}, y^{(i)}\}_{i=0}^{N}$ ...

    $$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 \tag{3}$$
6. PCA Review

    Without loss of generality, assume $d_i^T d_j = \delta_{ij}$ (for any vector space, a basis can be orthonormalized by the Gram-Schmidt process). From Eq. (1), $D^T$ then satisfies $D^T x = D^T \hat{x} = \alpha$, so the problem becomes

    $$\min_{D} \sum_i \|x^{(i)} - DD^T x^{(i)}\|_2^2 \tag{4}$$
7. PCA Review

    Using the Pythagorean theorem, (4) becomes

    $$\min_{D} \sum_i \|x^{(i)} - DD^T x^{(i)}\|_2^2 = \min_{D} \Big( \sum_i \|x^{(i)}\|_2^2 - \sum_i \|DD^T x^{(i)}\|_2^2 \Big)$$

    $$\Rightarrow \hat{D} = \arg\max_{D} \sum_i \|DD^T x^{(i)}\|_2^2$$
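The Pythagorean step can be spelled out as follows. This is a short worked derivation that uses only the orthonormality assumption $D^T D = I$ stated on the previous slide; the identity matrix $I$ and the split of $x$ into two parts are notation introduced here for illustration.

```latex
\begin{aligned}
x &= DD^T x + (I - DD^T)x, \\
(DD^T x)^T (I - DD^T) x &= x^T D D^T x - x^T D (D^T D) D^T x = 0
  \quad \text{since } D^T D = I, \\
\|x\|_2^2 &= \|DD^T x\|_2^2 + \|x - DD^T x\|_2^2 .
\end{aligned}
```

Because $\|x^{(i)}\|_2^2$ does not depend on $D$, minimizing the summed reconstruction error is the same as maximizing $\sum_i \|DD^T x^{(i)}\|_2^2$.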
8. PCA Review

    This optimization problem can be rewritten as

    $$\hat{D} = \arg\max_{D} \sum_i \|DD^T x^{(i)}\|_2^2 = \arg\max_{D} \sum_{j} d_j^T \Big( \sum_i x^{(i)} (x^{(i)})^T \Big) d_j,$$

    where the second form follows from $D^T D = I$, which gives $\|DD^T x\|_2^2 = \sum_j (d_j^T x)^2$. It is solved as the eigenvalue problem of the covariance matrix $\sum_i x^{(i)} (x^{(i)})^T$ (the covariance up to scaling, for centered data).
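The eigenvalue formulation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `pca_dictionary`, the column-per-sample layout of `X`, and the synthetic data are all assumptions made for the example, and the data are assumed to be centered.

```python
import numpy as np

def pca_dictionary(X, p):
    """Return the p leading eigenvectors of sum_i x_i x_i^T as the columns of D.

    X has shape (m, N) with one sample per column and is assumed centered.
    """
    scatter = X @ X.T                           # sum_i x^(i) (x^(i))^T, shape (m, m)
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]       # indices of the p largest
    return eigvecs[:, order]

# Synthetic data (hypothetical): 5 dimensions with decreasing variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
X -= X.mean(axis=1, keepdims=True)

D = pca_dictionary(X, p=2)
alpha = D.T @ X                                 # coordinates, alpha = D^T x
total_error = np.sum((X - D @ alpha) ** 2)      # objective of Eq. (4)
captured = np.sum((D @ alpha) ** 2)             # the quantity being maximized
```

Here `total_error + captured` equals the total squared norm of the data, which is the Pythagorean identity used in the derivation.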
9. Introducing Sparsity

    How about regularization?

    $$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 + \lambda\psi(\alpha), \quad \lambda \ge 0,$$

    where $\lambda\psi(\alpha)$ is called the regularization, sparsity, or prior term, and $\lambda$ is the strength of the regularization. Intuitively, $\psi(\alpha)$ is a term that "confines" the "quota" of the $\alpha_i$ and therefore makes $\alpha$ "sparse". In fact, regularized linear regression (with an $\ell_1$ penalty, for example) likewise introduces sparsity on the coefficients $\theta$.
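To make the effect of the penalty concrete, here is a small sketch under two assumptions that are not on the slide: $\psi$ is taken to be the $\ell_1$ norm, and $D$ is a square orthonormal basis. In that special case the regularized problem for a single $x$ has a closed-form solution, soft-thresholding of $D^T x$ at $\lambda/2$. The names `soft_threshold` and `objective` are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink every entry of z toward zero by t, setting small entries to exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # assumed: square orthonormal basis
x = rng.normal(size=6)
lam = 1.0                                     # regularization strength (lambda)

def objective(a):
    """||x - D a||_2^2 + lam * ||a||_1 for a single sample x."""
    return np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

alpha_plain = D.T @ x                               # lambda = 0: ordinary coordinates
alpha_sparse = soft_threshold(alpha_plain, lam / 2.0)  # minimizer for the l1 penalty
```

Every coordinate of `alpha_plain` whose magnitude is at most `lam / 2` becomes exactly zero in `alpha_sparse`, which is the sense in which the penalty makes $\alpha$ sparse.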
10. Introducing Sparsity

    Hence, we can conclude that sparse coding is a more general form of principal component analysis (PCA + Sparsity = Sparse PCA (Zou et al., 2004)). Now $d_i^T d_j$ may be $\neq 0$. Also, if $m = p$, there is no dimension "reduction" anymore, and only sparsity affects the basis. We can even make $p > m$, using an over-complete basis, and let sparsity dominate $D$ and $\alpha$.
11. Solve the Optimization Problem

    How to solve the optimization problem? ⇒ Too hard! Hence, we first assume $D$ is known (i.e., a designed $D$). Two greedy algorithms are the most popular:
    - Matching Pursuit
    - Orthogonal Matching Pursuit
12. Matching Pursuit

    $$\min_{\alpha \in \mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L \tag{5}$$

    1. $\alpha \leftarrow 0$.
    2. $r \leftarrow x$ (residual).
    3. While $\|\alpha\|_0 < L$:
        - Pick the element that correlates the most with the residual: $\hat{\imath} \leftarrow \arg\max_{i=1,\dots,p} |d_i^T r|$.
        - Subtract the contribution and update $\alpha$: $\alpha[\hat{\imath}] \leftarrow \alpha[\hat{\imath}] + d_{\hat{\imath}}^T r$, then $r \leftarrow r - (d_{\hat{\imath}}^T r)\, d_{\hat{\imath}}$.
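The Matching Pursuit loop above translates almost line by line into NumPy. The sketch below assumes unit-norm dictionary columns; the `max_iter` and `tol` safeguards are additions that are not on the slide (plain Matching Pursuit may reselect an atom, so the loop is capped), and the function name and test data are illustrative.

```python
import numpy as np

def matching_pursuit(x, D, L, max_iter=1000, tol=1e-12):
    """Greedy sparse coding of x over D (unit-norm columns assumed).

    Returns (alpha, r) with x == D @ alpha + r at every step.
    """
    alpha = np.zeros(D.shape[1])
    r = x.astype(float).copy()                   # r <- x
    for _ in range(max_iter):                    # safeguard, not on the slide
        if np.count_nonzero(alpha) >= L or np.linalg.norm(r) <= tol:
            break
        corr = D.T @ r                           # d_i^T r for every atom
        i_hat = int(np.argmax(np.abs(corr)))     # most correlated atom
        alpha[i_hat] += corr[i_hat]              # update the coefficient
        r -= corr[i_hat] * D[:, i_hat]           # subtract its contribution
    return alpha, r

# Hypothetical example: a signal built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.normal(size=(10, 20))
D /= np.linalg.norm(D, axis=0)                   # normalize the columns
x = 2.0 * D[:, 3] - 1.5 * D[:, 7]
alpha, r = matching_pursuit(x, D, L=2)
```

Because the atoms have unit norm, each step reduces $\|r\|_2^2$ by $(d_{\hat{\imath}}^T r)^2$, so the residual never grows.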
13. Orthogonal Matching Pursuit

    $$\min_{\alpha \in \mathbb{R}^p} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le L \tag{6}$$

    1. $\Gamma = \emptyset$.
    2. While $\|\alpha\|_0 < L$:
        - Pick the element that most reduces the objective: $\hat{\imath} \leftarrow \arg\min_{i \in \Gamma^C} \big\{ \min_{\alpha'} \|x - D_{\Gamma \cup \{i\}}\, \alpha'\|_2^2 \big\}$.
        - Update the active set: $\Gamma \leftarrow \Gamma \cup \{\hat{\imath}\}$.
        - Update $\alpha$ and the residual: $\alpha_\Gamma \leftarrow (D_\Gamma^T D_\Gamma)^{-1} D_\Gamma^T x$, then $r \leftarrow x - D_\Gamma \alpha_\Gamma$.
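The steps above can be sketched as follows. The selection rule is implemented as stated on the slide (try every inactive atom and keep the one whose least-squares fit leaves the smallest error), which is simple but costs one least-squares solve per candidate. The loop counts the size of the active set rather than $\|\alpha\|_0$, and it uses `np.linalg.lstsq` instead of forming $(D_\Gamma^T D_\Gamma)^{-1}$ explicitly; both are implementation choices assumed here, as are the function name and the test data.

```python
import numpy as np

def orthogonal_matching_pursuit(x, D, L):
    """Greedy sparse coding with a least-squares refit on the active set.

    Requires L <= number of atoms. Returns (alpha, r, active).
    """
    p = D.shape[1]
    active = []                                  # Gamma, the active set
    alpha = np.zeros(p)
    r = x.astype(float).copy()
    while len(active) < L:
        best_i, best_err = None, np.inf
        for i in range(p):                       # candidates in the complement of Gamma
            if i in active:
                continue
            cols = active + [i]
            coef, *_ = np.linalg.lstsq(D[:, cols], x, rcond=None)
            err = np.sum((x - D[:, cols] @ coef) ** 2)
            if err < best_err:
                best_i, best_err = i, err
        active.append(best_i)                    # Gamma <- Gamma U {i_hat}
        coef, *_ = np.linalg.lstsq(D[:, active], x, rcond=None)
        alpha[:] = 0.0
        alpha[active] = coef                     # alpha_Gamma
        r = x - D[:, active] @ coef              # r <- x - D_Gamma alpha_Gamma
    return alpha, r, active

# Hypothetical example data.
rng = np.random.default_rng(0)
D = rng.normal(size=(10, 20))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] - 1.5 * D[:, 7]
alpha, r, active = orthogonal_matching_pursuit(x, D, L=2)
```

Unlike plain Matching Pursuit, the refit makes the residual orthogonal to every selected atom, so no atom is ever picked twice.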
14. Learning Dictionary

    How do we learn $D$ from the data?

    $$\min_{D,\alpha} \sum_i \|x^{(i)} - D\alpha^{(i)}\|_2^2 + \lambda \|\alpha\|_{0,1,2}, \quad \lambda \ge 0 \tag{7}$$

    - Brute force
    - K-means-like
    - FOCUSS (K. Engan et al., 2003)
    - K-SVD (M. Aharon et al., 2005)
    - Online Dictionary Learning (J. Mairal et al., 2009)
15. K-SVD (M. Aharon et al., 2005)

    1. Initialize $D \in \mathbb{R}^{m \times k}$ with a random normalized dictionary.
    2. Repeat until convergence:
        - Sparse Coding Stage: use a pursuit algorithm to compute the sparse code $\alpha^{(i)}$ of $x^{(i)}$.
        - Codebook Update Stage: for $j = 1, 2, \dots, k$ do:
            - Define the cluster of examples that use $d_j$: $\omega \leftarrow \{ i \mid 1 \le i \le M,\ \alpha^{(i)}[j] \neq 0 \}$.
            - For each $i \in \omega$, compute $r^{(i)} \leftarrow x^{(i)} - D\alpha^{(i)}$.
            - Solve $\hat{d}, \hat{\beta} \leftarrow \arg\min_{d,\ \beta \in \mathbb{R}^{|\omega|}} \sum_{i \in \omega} \| r^{(i)} + \alpha^{(i)}[j]\, d_j - d\, \beta_i \|_2^2$.
            - Set $d_j \leftarrow \hat{d}$, and replace the entries $\alpha^{(i)}[j] \neq 0$ with $\hat{\beta}$.
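The two stages above can be sketched compactly in NumPy. Several choices here are assumptions rather than part of the slide: the pursuit step is a standard correlation-based OMP, the rank-one minimization in the codebook update is solved with a truncated SVD (its best rank-one approximation gives a unit-norm $\hat{d}$ and the matching $\hat{\beta}$), a fixed iteration count stands in for "until convergence", and the names `omp`, `ksvd`, and the synthetic data are illustrative.

```python
import numpy as np

def omp(x, D, L):
    """Correlation-based OMP used as the pursuit step (an assumed choice)."""
    active, r = [], x.copy()
    alpha, coef = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(L):
        corr = np.abs(D.T @ r)
        if active:
            corr[active] = -1.0                  # never pick an atom twice
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, active], x, rcond=None)
        r = x - D[:, active] @ coef
    alpha[active] = coef
    return alpha

def ksvd(X, k, L, n_iter=10, seed=0):
    """Learn D (m x k) and sparse codes A (k x M) for the columns of X (m x M)."""
    rng = np.random.default_rng(seed)
    m, M = X.shape
    D = rng.normal(size=(m, k))
    D /= np.linalg.norm(D, axis=0)               # random normalized dictionary
    A = np.zeros((k, M))
    for _ in range(n_iter):                      # fixed count instead of a convergence test
        # Sparse coding stage.
        A = np.column_stack([omp(X[:, i], D, L) for i in range(M)])
        # Codebook update stage.
        for j in range(k):
            omega = np.flatnonzero(A[j])         # examples that use d_j
            if omega.size == 0:
                continue                         # unused atom: leave as is
            # Columns are r^(i) + alpha^(i)[j] d_j for i in omega.
            E = X[:, omega] - D @ A[:, omega] + np.outer(D[:, j], A[j, omega])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                    # d_j <- d_hat (unit norm)
            A[j, omega] = s[0] * Vt[0]           # alpha^(i)[j] <- beta_hat
    return D, A

# Hypothetical data: 200 signals, each a combination of 2 atoms of a hidden dictionary.
rng = np.random.default_rng(1)
D_true = rng.normal(size=(8, 12))
D_true /= np.linalg.norm(D_true, axis=0)
codes = np.zeros((12, 200))
for i in range(200):
    codes[rng.choice(12, size=2, replace=False), i] = rng.normal(size=2)
X = D_true @ codes
D, A = ksvd(X, k=12, L=2, n_iter=5)
```

Each atom update changes only the coefficients that were already nonzero, so the sparsity pattern found in the coding stage is preserved.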
16. Applications
    - Image De-noise (Roth and Black, 2009)
    - Edge Detection (J. Mairal et al., 2008)
    - Image In-painting (Roth and Black, 2009)
    - Super-resolution (Yang et al., 2008)
    - Signal Compression (in place of VQ using K-means)
