1. Training Linear Discriminant Analysis in Linear Time
   Deng Cai, Xiaofei He, Jiawei Han
   Reporter: Wei-Ching He, 2007/8/23
2. Outline
   - Introduction
   - Linear Discriminant Analysis
   - Spectral Regression Discriminant Analysis
   - Experiment
   - Conclusion
3. Introduction
   - Dimensionality reduction has been a key problem in many fields of information processing due to the "curse of dimensionality".
   - One of the most popular dimensionality reduction algorithms is Linear Discriminant Analysis (LDA).
4. Introduction
   - LDA preserves class separability.
   - LDA involves the eigen-decomposition of dense matrices, which can be expensive in both time and memory.
   - It is infeasible to apply LDA to large-scale, high-dimensional data.
   - Spectral Regression Discriminant Analysis (SRDA) is developed from LDA but has a significant computational advantage.
5. Introduction
   - SRDA combines spectral graph analysis and regression.
   - It can easily be scaled to very large, high-dimensional data sets.
6. Linear Discriminant Analysis (LDA)
   - Given a set of m samples x_1, x_2, ..., x_m belonging to c classes, the objective function of LDA is
        a* = argmax_a (a^T S_b a) / (a^T S_w a)                         (1)
   - Between-class scatter matrix:
        S_b = Σ_{k=1}^{c} m_k (μ^(k) - μ)(μ^(k) - μ)^T
   - Within-class scatter matrix:
        S_w = Σ_{k=1}^{c} Σ_{i=1}^{m_k} (x_i^(k) - μ^(k))(x_i^(k) - μ^(k))^T
   - Here μ is the total mean vector, m_k is the number of samples in the k-th class, μ^(k) is the mean vector of the k-th class, and x_i^(k) is the i-th sample in the k-th class.
7. Linear Discriminant Analysis
   - Define S_t = Σ_{i=1}^{m} (x_i - μ)(x_i - μ)^T as the total scatter matrix; we have S_t = S_b + S_w.
   - So Eqn (1) is equivalent to
        a* = argmax_a (a^T S_b a) / (a^T S_t a)
   - The optimal a's are the eigenvectors corresponding to the non-zero eigenvalues of the generalized eigen-problem
        S_b a = λ S_t a                                                 (5)
   - Since rank(S_b) ≤ c-1, there are at most c-1 eigenvectors corresponding to non-zero eigenvalues. (A NumPy sketch of this classical formulation follows below.)
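To make the objective concrete, here is a minimal NumPy sketch of the scatter matrices and the generalized eigen-problem above. This is our illustration rather than code from the paper, and it assumes the small, regular case where S_t is non-singular; the large-scale singular case is exactly what the rest of the paper addresses.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, labels, num_dirs):
    """Classical LDA on an (m, n) data matrix X, one sample per row.

    Returns num_dirs projective directions solving S_b a = lambda S_t a.
    Assumes S_t is non-singular (m > n, full-rank data).
    """
    mu = X.mean(axis=0)                      # total mean vector
    Xc = X - mu                              # centered data
    S_t = Xc.T @ Xc                          # total scatter matrix
    S_b = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(labels):
        X_k = X[labels == k]                 # samples of the k-th class
        d = (X_k.mean(axis=0) - mu)[:, None]
        S_b += len(X_k) * (d @ d.T)          # m_k (mu_k - mu)(mu_k - mu)^T
    # Generalized symmetric eigen-problem; eigh returns ascending eigenvalues,
    # so reverse the columns to put the largest eigenvalues first.
    _, A = eigh(S_b, S_t)
    return A[:, ::-1][:, :num_dirs]
```

Since rank(S_b) ≤ c-1, calling this with num_dirs = c-1 recovers all informative directions.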
8. Computational Analysis of LDA
   - Let x̄_i = x_i - μ denote the centered data point and X̄^(k) = [x̄_1^(k), ..., x̄_{m_k}^(k)] denote the centered data matrix of the k-th class.
   - W^(k) is an m_k × m_k matrix with all elements equal to 1/m_k.
9. Computational Analysis of LDA
   - Then S_b = X̄ W X̄^T, where X̄ = [X̄^(1), ..., X̄^(c)] and W is the m × m block-diagonal matrix whose k-th block is W^(k); likewise S_t = X̄ X̄^T.
   - Substituting into Eqn (5) gives
        X̄ W X̄^T a = λ X̄ X̄^T a                                        (8)
   - Define b = Σ U^T a (with the SVD X̄ = U Σ V^T of the next slide); Eqn (8) then reduces to the eigen-problem
        V^T W V b = λ b
10. Computational Analysis of LDA
   - Suppose rank(X̄) = r and take the SVD X̄ = U Σ V^T, where U^T U = V^T V = I and Σ = diag(σ_1, σ_2, ..., σ_r) with σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0.
   - After calculating the b's, the a's can be obtained by a = U Σ^{-1} b.
11. Three steps of LDA
   - 1. SVD decomposition of X̄ to get U, V, and Σ.
   - 2. Compute the b's, the eigenvectors of V^T W V.
   - 3. Compute a = U Σ^{-1} b.
   - (A NumPy sketch of these three steps follows below.)
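These three steps translate almost line-for-line into NumPy. Below is a sketch under the slides' conventions, with Xbar the centered n × m matrix whose columns are samples and W the m × m block-diagonal weight matrix; the function name and rank tolerance are our own choices.

```python
import numpy as np

def lda_via_svd(Xbar, W, c):
    """Three-step LDA. Xbar: centered n x m data matrix (columns are
    samples); W: m x m block-diagonal weight matrix; c: number of classes."""
    # Step 1: thin SVD of the centered data, Xbar = U Sigma V^T.
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    r = int(np.sum(s > 1e-10))            # numerical rank
    U, s, V = U[:, :r], s[:r], Vt[:r].T
    # Step 2: the b's are the top eigenvectors of the r x r matrix V^T W V.
    _, B = np.linalg.eigh(V.T @ W @ V)    # eigenvalues in ascending order
    B = B[:, ::-1][:, :c - 1]             # keep the c-1 largest
    # Step 3: map back to the input space, a = U Sigma^{-1} b.
    return U @ (B / s[:, None])
```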
12. Linear Discriminant Analysis
   - The left and right singular vectors of X̄ (the column vectors of U and V) are eigenvectors of X̄ X̄^T and X̄^T X̄, respectively.
   - Given U or V, we can recover the other via X̄ V = U Σ or U^T X̄ = Σ V^T.
   - In most cases r is close to min(m, n), so r >> c. Computing the eigenvectors of H^T H and then recovering the eigenvectors of H H^T is faster than computing the eigenvectors of H H^T directly, as sketched below.
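A small sketch of this trick (again our illustration, not the paper's code): for a tall t × c matrix H with c << t, diagonalize the small c × c Gram matrix H^T H and map its eigenvectors back.

```python
import numpy as np

def top_eigvecs_via_gram(H):
    """Eigenvectors of H H^T (t x t) recovered from H^T H (c x c).

    If H^T H p = lam * p, then H H^T (H p) = lam * (H p), and since
    ||H p|| = sqrt(lam), H p / sqrt(lam) is a unit eigenvector of H H^T.
    """
    lam, P = np.linalg.eigh(H.T @ H)   # small c x c eigen-problem
    keep = lam > 1e-10                 # only non-zero eigenvalues transfer
    lam, P = lam[keep], P[:, keep]
    return lam, (H @ P) / np.sqrt(lam)
```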
13. Time complexity of LDA
   - Flam: a compound operation consisting of one addition and one multiplication.
   - When m > n:
     - Calculating X̄ X̄^T: mn²/2 flam
     - Eigenvectors of X̄ X̄^T: 9n³/2 flam
     - Recovering V from U: mn² flam, assuming r is close to min(m, n)
     - Computing the c eigenvectors of H H^T via H^T H: nc²/2 + 9c³/2 + nc² flam
     - Computing a = U Σ^{-1} b: n²c flam
   - When m < n, a similar analysis applies.
   - Total time complexity: 3mnt/2 + 9t³/2 + 3tc²/2 + 9c³/2 + t²c flam, with t = min(m, n); the helper below evaluates this count.
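To get a feel for these counts, the total can be evaluated directly; a tiny helper of our own under the slide's formula:

```python
def lda_flam(m, n, c):
    """Total flam count of SVD-based LDA per the slide, with t = min(m, n)."""
    t = min(m, n)
    return 3*m*n*t/2 + 9*t**3/2 + 3*t*c**2/2 + 9*c**3/2 + t**2*c

# E.g., m = 10000 samples, n = 5000 features, c = 10 classes:
# lda_flam(10000, 5000, 10) is about 9.4e11 flam, with the 9t^3/2 and
# 3mnt/2 terms dominating; both are cubic in the data size.
```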
14. Spectral Regression Discriminant Analysis (SRDA)
   - Theorem 1. Let y be an eigenvector of W with eigenvalue λ, i.e., W y = λ y. If X̄^T a = y, then a is an eigenvector of the eigen-problem in Eqn (8) with the same eigenvalue λ.
   - Proof: X̄ W X̄^T a = X̄ W y = X̄ λ y = λ X̄ X̄^T a.
15. SRDA
   - By Theorem 1, LDA can be obtained in two steps:
     - 1. Solve the eigen-problem in Eqn (12) to get y.
     - 2. Find a which satisfies X̄^T a = y. In reality such an a may not exist; a natural way out is to find the a that best fits the equation in the least-squares sense:
          a = argmin_a Σ_{i=1}^{m} (a^T x̄_i - y_i)²                    (13)
       where y_i is the i-th element of y. (A minimal sketch of this step follows below.)
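A minimal NumPy sketch of Eqn (13) (function name ours), with Xbar the centered n × m matrix whose columns are samples:

```python
import numpy as np

def fit_projection(Xbar, y):
    """Least-squares fit of Eqn (13): argmin_a sum_i (a^T xbar_i - y_i)^2.

    lstsq returns the minimum-norm solution when Xbar^T a = y is
    underdetermined (n > m), the case addressed by ridge regression next.
    """
    a, *_ = np.linalg.lstsq(Xbar.T, y, rcond=None)
    return a
```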
16. Ridge regression
   - If n > m, there are infinitely many solutions to Eqn (13). The most popular remedy is to impose a penalty on the norm of a:
        a = argmin_a ( Σ_{i=1}^{m} (a^T x̄_i - y_i)² + α ||a||² )       (14)
     where α ≥ 0 is a parameter that controls the amount of shrinkage.
17. Spectral analysis
   - W is block-diagonal; thus its eigenvalues and eigenvectors are the union of the eigenvalues and eigenvectors of its blocks.
   - Each block W^(k) has a single non-zero eigenvalue, 1, whose eigenvector is e^(k) = [1, 1, ..., 1]^T of length m_k.
   - Thus there are exactly c eigenvectors of W with eigenvalue 1:
        y_k = [0, ..., 0, 1, ..., 1, 0, ..., 0]^T,   k = 1, ..., c      (15)
     with ones exactly in the positions of the k-th class's samples.
18. Spectral analysis
   - To guarantee that a vector a satisfying the linear system X̄^T a = y exists, y should be in the space spanned by the row vectors of X̄. Since X̄ e = 0, the vector e = [1, ..., 1]^T is orthogonal to this space.
   - e is in the space spanned by the {y_k}.
   - We pick e as the first vector and use the Gram-Schmidt process to orthogonalize the remaining eigenvectors against it.
   - Removing e leaves exactly c-1 eigenvectors of W:
        {ȳ_k}, k = 1, ..., c-1, with ȳ_k^T e = 0 and ȳ_i^T ȳ_j = 0 for i ≠ j    (16)
   - (A sketch of this response-generation step follows below.)
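A sketch of this response-generation step; as an implementation choice of ours, a QR factorization stands in for classical Gram-Schmidt, which is mathematically equivalent but numerically more stable.

```python
import numpy as np

def srda_responses(labels):
    """Generate the c-1 orthogonal response vectors of Eqn (16).

    Stack e = [1, ..., 1]^T in front of the c class-indicator vectors y_k,
    orthogonalize, then drop e's column and the (dependent) last column.
    """
    classes = np.unique(labels)
    m, c = len(labels), len(classes)
    Y = np.ones((m, c + 1))                        # first column is e
    for j, k in enumerate(classes):
        Y[:, j + 1] = (labels == k).astype(float)  # indicator vector y_k
    Q, _ = np.linalg.qr(Y)                         # stable Gram-Schmidt
    return Q[:, 1:c]                               # c-1 responses, all orthogonal to e
```

Because e equals the sum of the c indicators, the stacked matrix has rank c; the last QR column is degenerate and is discarded together with e's column, leaving exactly c-1 responses.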
19. SRDA
   - In the following discussion, y is one of the eigenvectors in Eqn (16).
   - Eqn (14) can be rewritten in matrix form as
        a = argmin_a ( ||X̄^T a - y||² + α ||a||² )
   - Requiring the derivative with respect to a to vanish gives
        (X̄ X̄^T + α I) a = X̄ y,   i.e.,   a = (X̄ X̄^T + α I)^{-1} X̄ y
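All c-1 responses share the same coefficient matrix X̄ X̄^T + αI, so a single Cholesky factorization serves every right-hand side; that reuse is where the normal-equations route gets its speed. A hedged sketch (function name ours):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def srda_solve_normal(Xbar, Ybar, alpha):
    """Solve (Xbar Xbar^T + alpha I) a_k = Xbar ybar_k for all responses.

    Xbar: centered n x m data matrix; Ybar: m x (c-1) response matrix.
    One Cholesky factorization is shared by all c-1 right-hand sides.
    """
    n = Xbar.shape[0]
    G = Xbar @ Xbar.T + alpha * np.eye(n)   # symmetric positive definite for alpha > 0
    factor = cho_factor(G)
    return cho_solve(factor, Xbar @ Ybar)   # columns are the a_k's
```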
20. Theoretical Analysis
   - Theorem 2. If y is in the space spanned by the row vectors of X̄, the corresponding projective function a calculated in SRDA will be an eigenvector of the eigen-problem in Eqn (8) as α decreases to zero. Therefore, a will be one of the projective functions of LDA.
   - Corollary 3. If the sample vectors are linearly independent, then all c-1 projective functions in SRDA will be identical to those of LDA as α decreases to zero.
21. Theoretical Analysis
   - The i-th and j-th entries of any vector y in the space spanned by the {y_k} in Eqn (15) are the same as long as x_i and x_j belong to the same class. Thus the i-th and j-th rows of Y are the same, where Y = [ȳ_1, ..., ȳ_{c-1}].
   - Corollary 3 shows that when the samples are linearly independent, the c-1 projective functions of LDA are exactly the solutions of the c-1 linear systems X̄^T a_k = ȳ_k.
22. Theoretical Analysis
   - Let A = [a_1, ..., a_{c-1}] be the LDA transformation matrix, which embeds the data points into the LDA subspace as
        A^T X = A^T (X̄ + μ e^T) = Y^T + A^T μ e^T
   - The projective functions usually overfit the training set and thus may not perform well on test samples; hence regularization is necessary.
23. Computational Complexity Analysis
   - Two steps:
     - Responses generation (by the Gram-Schmidt method): mc² - c³/3 flam and mc + c² memory.
     - Regularized least squares (two methods): solving the normal equations, or an iterative solution with LSQR; a sketch of the LSQR option follows below.
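For large sparse data the n × n matrix X̄ X̄^T is never formed; LSQR solves the damped least-squares problem using only matrix-vector products. A sketch with SciPy's solver, whose damp parameter corresponds to sqrt(α):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def srda_solve_lsqr(Xbar, ybar, alpha):
    """Iteratively solve min_a ||Xbar^T a - ybar||^2 + alpha * ||a||^2.

    lsqr minimizes ||A x - b||^2 + damp^2 ||x||^2, so damp = sqrt(alpha).
    Xbar may be a scipy.sparse matrix; only products with Xbar^T are used.
    """
    return lsqr(Xbar.T, ybar, damp=np.sqrt(alpha))[0]
```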
24. Computational Complexity Analysis
25. Experiment
   - Four datasets are used in the experimental study, including face, handwritten digit, spoken letter, and text databases.
26. Experiment (compared algorithms)
   - 1. Linear Discriminant Analysis (LDA), solving the singularity problem by using SVD.
   - 2. Regularized LDA (RLDA), solving the singularity problem by adding a constant to the diagonal elements of S_w, as S_w + αI for some α > 0, where I is the identity matrix.
   - 3. Spectral Regression Discriminant Analysis (SRDA), the approach proposed in this paper.
   - 4. IDR/QR, an LDA variation in which QR decomposition is applied rather than SVD; thus IDR/QR is very efficient.
27. Experiment
28. Experiment
29. Parameter selection for SRDA
30. Conclusion
   - SRDA provides an efficient and effective approach for discriminant analysis.
   - SRDA is the first approach that can handle very large-scale, high-dimensional data for discriminant analysis.
