3. face recognition using laplacian faces


Published on

Published in: Technology, News & Politics
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

3. face recognition using laplacian faces

  1. 1. 328 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005 Face Recognition Using Laplacianfaces Xiaofei He, Shuicheng Yan, Yuxiao Hu, Partha Niyogi, and Hong-Jiang Zhang, Fellow, IEEE Abstract—We propose an appearance-based face recognition method called the Laplacianface approach. By using Locality Preserving Projections (LPP), the face images are mapped into a face subspace for analysis. Different from Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) which effectively see only the Euclidean structure of face space, LPP finds an embedding that preserves local information, and obtains a face subspace that best detects the essential face manifold structure. The Laplacianfaces are the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the face manifold. In this way, the unwanted variations resulting from changes in lighting, facial expression, and pose may be eliminated or reduced. Theoretical analysis shows that PCA, LDA, and LPP can be obtained from different graph models. We compare the proposed Laplacianface approach with Eigenface and Fisherface methods on three different face data sets. Experimental results suggest that the proposed Laplacianface approach provides a better representation and achieves lower error rates in face recognition. Index Terms—Face recognition, principal component analysis, linear discriminant analysis, locality preserving projections, face manifold, subspace learning. æ1 INTRODUCTIONM ANY face recognition techniques have been developed over the past few decades. One of the most successfuland well-studied techniques to face recognition is the are far from each other while requiring data points of the same class to be close to each other. Unlike PCA which encodes information in an orthogonal linear space, LDAappearance-based method [28], [16]. When using appear- encodes discriminating information in a linearly separableance-based methods, we usually represent an image of size space using bases that are not necessarily orthogonal. It isn  m pixels by a vector in an n  m-dimensional space. In generally believed that algorithms based on LDA are superiorpractice, however, these n  m-dimensional spaces are too to those based on PCA. However, some recent work [14]large to allow robust and fast face recognition. A common shows that, when the training data set is small, PCA canway to attempt to resolve this problem is to use dimension- outperform LDA, and also that PCA is less sensitive toality reduction techniques [1], [2], [8], [11], [12], [14], [22], different training data sets.[26], [28], [34], [37]. Two of the most popular techniques for Recently, a number of research efforts have shown thatthis purpose are Principal Component Analysis (PCA) [28] the face images possibly reside on a nonlinear submanifoldand Linear Discriminant Analysis (LDA) [2]. [7], [10], [18], [19], [21], [23], [27]. However, both PCA and PCA is an eigenvector method designed to model linear LDA effectively see only the Euclidean structure. They failvariation in high-dimensional data. PCA performs dimen- to discover the underlying structure, if the face images liesionality reduction by projecting the original n-dimensionaldata onto the kð<< nÞ-dimensional linear subspace spanned on a nonlinear submanifold hidden in the image space.by the leading eigenvectors of the data’s covariance matrix. Some nonlinear techniques have been proposed to discoverIts goal is to find a set of mutually orthogonal basis the nonlinear structure of the manifold, e.g., Isomap [27],functions that capture the directions of maximum variance LLE [18], [20], and Laplacian Eigenmap [3]. These nonlinearin the data and for which the coefficients are pairwise methods do yield impressive results on some benchmarkdecorrelated. For linearly embedded manifolds, PCA is artificial data sets. However, they yield maps that areguaranteed to discover the dimensionality of the manifold defined only on the training data points and how to evaluateand produces a compact representation. Turk and Pentland the maps on novel test data points remains unclear.[28] use Principal Component Analysis to describe face Therefore, these nonlinear manifold learning techniquesimages in terms of a set of basis functions, or “eigenfaces.” [3], [5], [18], [20], [27], [35] might not be suitable for some LDA is a supervised learning algorithm. LDA searches for computer vision tasks, such as face recognition.the project axes on which the data points of different classes In the meantime, there has been some interest in the problem of developing low-dimensional representations through kernel based techniques for face recognition [13],. X. He and P. Niyogi are with the Department of Computer Science, The [33]. These methods can discover the nonlinear structure of University of Chicago, 1100 E. 58th Street, Chicago, IL 60637. the face images. However, they are computationally expen- E-mail: {xiaofei, niyogi}@cs.uchicago.edu.. S. Yan is with the Department of Applied Mathematics, School of sive. Moreover, none of them explicitly considers the Mathematical Sciences, Beijing University, 100871, China. structure of the manifold on which the face images possibly E-mail: scyan@math.pku.edu.cn. reside.. Y. Hu and H.-J. Zhang are with Microsoft Research Asia, 49 Zhichun In this paper, we propose a new approach to face analysis Road, Beijing 100080, China. (representation and recognition), which explicitly considers E-mail: hu3@ifp.uiuc.edu, hjzhang@microsoft.com. the manifold structure. To be specific, the manifold structureManuscript received 30 Apr. 2004; revised 23 Aug. 2004; accepted 3 Sept. is modeled by a nearest-neighbor graph which preserves the2004; published online 14 Jan. 2005.Recommended for acceptance by J. Buhmann. local structure of the image space. A face subspace is obtainedFor information on obtaining reprints of this article, please send e-mail to: by Locality Preserving Projections (LPP) [9]. Each face image intpami@computer.org, and reference IEEECS Log Number TPAMI-0204-0404. the image space is mapped to a low-dimensional face 0162-8828/05/$20.00 ß 2005 IEEE Published by the IEEE Computer Society
  2. 2. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 329subspace, which is characterized by a set of feature images, particular, attractive because they are simple to compute andcalled Laplacianfaces. The face subspace preserves local analytically tractable. In effect, linear methods project thestructure and seems to have more discriminating power than high-dimensional data onto a lower dimensional subspace.the PCA approach for classification purpose. We also provide Considering the problem of representing all of the vectorstheoretical analysis to show that PCA, LDA, and LPP can be in a set of n d-dimensional samples x1 ; x2 ; . . . ; xn , with zeroobtained from different graph models. Central to this is a mean, by a single vector y ¼ fy1 ; y2 ; . . . ; yn g such that yigraph structure that is inferred on the data points. LPP finds a represents xi . Specifically, we find a linear mapping from theprojection that respects this graph structure. In our theore-tical analysis, we show how PCA, LDA, and LPP arise from d-dimensional space to a line. Without loss of generality, wethe same principle applied to different choices of this graph denote the transformation vector by w. That is, wT xi ¼ yi .structure. Actually, the magnitude of w is of no real significance It is worthwhile to highlight several aspects of the because it merely scales yi . In face recognition, each vector xiproposed approach here: denotes a face image. Different objective functions will yield different algo- 1. While the Eigenfaces method aims to preserve the rithms with different properties. PCA aims to extract a global structure of the image space, and the Fisherfaces subspace in which the variance is maximized. Its objective method aims to preserve the discriminating informa- function is as follows: tion; our Laplacianfaces method aims to preserve the local structure of the image space. In many real-world X n classification problems, the local manifold structure is max ðyi À yÞ2 ; ð1Þ w more important than the global Euclidean structure, i¼1 especially when nearest-neighbor like classifiers are used for classification. LPP seems to have discriminat- 1X n ing power although it is unsupervised. y¼ yi : ð2Þ n i¼1 2. An efficient subspace learning algorithm for face recognition should be able to discover the nonlinear The output set of principal vectors w1 ; w2 ; . . . ; wk is an manifold structure of the face space. Our proposed orthonormal set of vectors representing the eigenvectors of Laplacianfaces method explicitly considers the the sample covariance matrix associated with the k < d manifold structure which is modeled by an adja- largest eigenvalues. cency graph. Moreover, the Laplacianfaces are ob- While PCA seeks directions that are efficient for repre- tained by finding the optimal linear approximations sentation, Linear Discriminant Analysis seeks directions that to the eigenfunctions of the Laplace Beltrami are efficient for discrimination. Suppose we have a set of n d-dimensional samples x1 ; x2 ; . . . ; xn , belonging to l classes operator on the face manifold. They reflect the of faces. The objective function is as follows: intrinsic face manifold structures. 3. LPP shares some similar properties to LLE [18], such w T SB w as a locality preserving character. However, their max ; ð3Þ w wT SW w objective functions are totally different. LPP is obtained by finding the optimal linear approxima- tions to the eigenfunctions of the Laplace Beltrami X l SB ¼ ni ðmðiÞ À mÞðmðiÞ À mÞT ; ð4Þ operator on the manifold. LPP is linear, while LLE is i¼1 nonlinear. Moreover, LPP is defined everywhere, while LLE is defined only on the training data points ! and it is unclear how to evaluate the maps for new X X l ni ðiÞ ðiÞ ðiÞ T SW ¼ ðxj À mðiÞ Þðxj Àm Þ ; ð5Þ test points. In contrast, LPP may be simply applied to i¼1 j¼1 any new data point to locate it in the reduced representation space. where m is the total sample mean vector, ni is the number The rest of the paper is organized as follows: Section 2 of samples in the ith class, mðiÞ is the average vector of the ðiÞdescribes PCA and LDA. The Locality Preserving Projections ith class, and xj is the jth sample in the ith class. We call(LPP) algorithm is described in Section 3. In Section 4, we SW the within-class scatter matrix and SB the between-classprovide a statistical view of LPP. We then give a theoretical scatter matrix.analysis of LPP and its connections to PCA and LDA inSection 5. Section 6 presents the manifold ways of face 3 LEARNING A LOCALITY PRESERVING SUBSPACEanalysis using Laplacianfaces. A variety of experimentalresults are presented in Section 7. Finally, we provide some PCA and LDA aim to preserve the global structure.concluding remarks and suggestions for future work in However, in many real-world applications, the localSection 8. structure is more important. In this section, we describe Locality Preserving Projection (LPP) [9], a new algorithm for learning a locality preserving subspace. The complete2 PCA AND LDA derivation and theoretical justifications of LPP can beOne approach to coping with the problem of excessive traced back to [9]. LPP seeks to preserve the intrinsicdimensionality of the image space is to reduce the dimen- geometry of the data and local structure. The objectivesionality by combining features. Linear combinations are function of LPP is as follows:
  3. 3. 330 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005 X min ðyi À yj Þ2 Sij ; ð6Þ manifolds. While the Laplace Beltrami operator for a ij manifold is generated by the Riemannian metric, for awhere yi is the one-dimensional representation of xi and the graph it comes from the adjacency relation. Belkin andmatrix S is a similarity matrix. A possible way of defining S Niyogi [3] showed that the optimal map preservingis as follows: locality can be found by solving the following optimiza- & 2 tion problem on the manifold: 2 Sij ¼ expðÀ xi À xj =tÞ; jjxi À xj jj ð7Þ Z 0 otherwise min krf k2 ð13Þ kf kL2 ðMÞ ¼1 Mor 8 2 which is equivalent to expðÀkxi Àxj k =tÞ; if xi is among k nearest neighbors of xj Z Sij ¼ or xj is among k nearest neighbors of xi ð8Þ : min LðfÞf; ð14Þ 0 otherwise; kf kL2 ðMÞ ¼1 Mwhere is sufficiently small, and 0. Here, defines the where L is the Laplace Beltrami operator on the manifold,radius of the local neighborhood. In other words, defines i.e., LðfÞ ¼ divrðfÞ. Thus, the optimal f has to be anthe “locality.” The objective function with our choice of eigenfunction of L. If we assume f to be linear, we havesymmetric weights Sij ðSij ¼ Sji Þ incurs a heavy penalty if fðxÞ ¼ wT x. By spectral graph theory, the integral can beneighboring points xi and xj are mapped far apart, i.e., if discretely approximated by wT XLX T w and the L2 norm ofðyi À yj Þ2 is large. Therefore, minimizing it is an attempt to f can be discretely approximated by wT XDXT w, whichensure that, if xi and xj are “close,” then yi and yj are close will ultimately lead to the following eigenvalue problem:as well. Following some simple algebraic steps, we see that 1X XLXT w ¼ XDXT w: ð15Þ ðyi À yj Þ2 Sij 2 ij The derivation reflects the intrinsic geometric structure of 1X the manifold. For details, see [3], [6], [9]. ¼ ðwT xi À wT xj Þ2 Sij 2 X ij X ¼ wT xi Sij xT w À i wT xi Sij xT w j X ij ij ð9Þ 4 STATISTICAL VIEW OF LPP T T T T ¼ i w xi Dii xi w À w XSX w LPP can also be obtained from statistical viewpoint. Suppose ¼ wT XDX T w À wT XSXT w the data points follow some underlying distribution. Let x and y be two random variables. We define that a linear mapping ¼ wT XðD À SÞX T w x ! wT x best preserves the local structure of the underlying ¼ wT XLXT w; distribution in the L2 sense if it minimizes the L2 distances between wT x and wT y provided that jjx À yjj . Namely,where X ¼ ½x1 ; x2 ; . . . ; xn Š, and D is a diagonal matrix; itsentries are column (or row since S is symmetric) sums of P min EðjwT x À wT yj2 jjx À yjj Þ; ð16ÞS; Dii ¼ j Sji . L ¼ D À S is the Laplacian matrix [6]. wMatrix D provides a natural measure on the data points. where is sufficiently small and 0. Here, defines theThe bigger the value Dii (corresponding to yi ) is, the more “locality.” Define z ¼ x À y, then we have the following“important” is yi . Therefore, we impose a constraint as objective function:follows: yT Dy ¼ 1 min EðjwT zj2 jjzjj Þ: ð17Þ ð10Þ ) wT XDXT w ¼ 1: It follows that,Finally, the minimization problem reduces to finding: EðjwT zj2 jjzjj Þ arg min wT XLXT w w ð11Þ ¼ EðwT zzT wjjzjj Þ ð18Þ wT XDX T w ¼ 1: ¼ wT EðzzT jjzjj Þw:The transformation vector w that minimizes the objectivefunction is given by the minimum eigenvalue solution to Given a set of sample points x1 ; x2 ; . . . ; xn , we first define anthe generalized eigenvalue problem: indicator function Sij as follows: XLX T w ¼ XDXT w: ð12Þ 1; jjxi À xj jj2 Sij ¼ ð19ÞNote that the two matrices XLX and XDX are both T T 0 otherwise:symmetric and positive semidefinite since the Laplacian Let d be the number of nonzero Sij , and D be a diagonalmatrix L and the diagonal matrix D are both symmetric and matrix whose entries are column (or row since S is Ppositive semidefinite. symmetric) sums of S, Dii ¼ j Sji . By the Strong Law of The Laplacian matrix for finite graph is analogous to T Large Numbers, Eðzz jjjzjj Þ can be estimated from thethe Laplace Beltrami operator on compact Riemannian sample points as follows:
  4. 4. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 331 EðzzT jjzjj Þ The above analysis shows that the weight matrix S plays a key role in the LPP algorithm. When we aim at preserving the 1 X T% zz global structure, we take (or k) to be infinity and choose the d jjzjj eigenvectors (of the matrix XLXT ) associated with the largest 1 X eigenvalues. Hence, the data points are projected along the ¼ ðxi À xj Þðxi À xj ÞT directions of maximal variance. When we aim at preserving d jjx Àx jj i j the local structure, we take to be sufficiently small and 1X choose the eigenvectors (of the matrix XLXT ) associated with ¼ ðxi À xj Þðxi À xj ÞT Sij d i;j the smallest eigenvalues. Hence, the data points are projected ! along the directions preserving locality. It is important to note 1 X X X X that, when (or k) is sufficiently small, the Laplacian matrix is ¼ xi xT Sij þ i xj xT Sij À j xi xT Sij À j xj xT Sij i d i;j i;j i;j i;j no longer the data covariance matrix and, hence, the ! directions preserving locality are not the directions of 2 X X minimal variance. In fact, the directions preserving locality ¼ xi xT Dii À i xi xT Sij j d i i;j are those minimizing local variance. 2À Á ¼ XDXT À XSX T 5.2 Connections to LDA d LDA seeks directions that are efficient for discrimination. 2 ¼ XLX T ; The projection is found by solving the generalized d eigenvalue problem ð20Þ SB w ¼ SW w; ð22Þwhere L ¼ D À S is the Laplacian matrix. The ith column ofmatrix X is xi . By imposing the same constraint, we finally where SB and SW are defined in Section 2. Suppose there areget the same minimization problem described in Section 3. l classes. The ith class contains ni sample points. Let mðiÞ denote the average vector of the ith class. Let xðiÞ denote the ðiÞ random vector associated to the ith class and xj denote the5 THEORETICAL ANALYSIS OF LPP, PCA, AND jth sample point in the ith class. We can rewrite the matrix SW LDA as follows:In this section, we present a theoretical analysis of LPP and ! X X ðiÞ l ni Tits connections to PCA and LDA. SW ¼ xj À m ðiÞ ðiÞ xj À m ðiÞ i¼1 j¼15.1 Connections to PCAWe first discuss the connections between LPP and PCA. X X l ni ðiÞ ðiÞ ðiÞ ¼ xj ðxj ÞT À mðiÞ ðxj ÞT It is worthwhile to point out that XLXT is the data i¼1 j¼1 1 1covariance matrix, if the Laplacian matrix L is nI À n2 eeT , ! where n is the number of data points, I is the identity À ðiÞ xj ðmðiÞ ÞT þ m ðm ÞðiÞ ðiÞ Tmatrix, and e is a column vector taking one at each entry.In fact, the Laplacian matrix here has the effect of ! X X l ni ðiÞ ðiÞremoving the sample mean from the sample vectors. In ¼ xj ðxj ÞT ðiÞ À ni m ðm Þ ðiÞ Tthis case, the weight matrix S takes 1=n2 at each entry, i.e., P i¼1 j¼1Sij ¼ 1=n2 ; 8i; j. Dii ¼ j Sji ¼ 1=n. Hence, the Laplacian X l 1 ðiÞ ðiÞ T ¼1 1matrix is L ¼ D À S P nI À n2 eeT . Let m denote the sample ¼ À T Xi Xi x1 þ Á Á Á þ xðiÞ x1 þ Á Á Á þ xðiÞ ni ni i¼1 nimean, i.e., m ¼ 1=n i xi . We have: Xl 1 À Á T 1 Xi Xi À Xi ei eT Xi T 1 ¼ i XLXT ¼ X I À eeT X T i¼1 ni n n 1 1 X l ¼ XX À 2 ðXeÞðXeÞT T ¼ T Xi Li Xi ; n n i¼1 1X 1 ¼ xi xT À 2 ðnmÞðnmÞT i ð23Þ n i n T 1X where Xi Li Xi is the data covariance matrix of the ¼ ðxi À mÞðxi À mÞT ðiÞ ðiÞ ith class and Xi ¼ ½x1 ; x2 ; Á Á Á ; xðiÞ Š is a d  ni matrix. ni n i T Li ¼ I À 1=ni ei ei is a ni  ni matrix where I is the 1X 1X 1X identity matrix and ei ¼ ð1; 1; . . . ; 1ÞT is an ni -dimensional þ xi mT þ mxT À i mmT À mmT n i n i n i vector. To further simplify the above equation, we define:  à ¼ E ðx À mÞðx À mÞT þ 2mmT À 2mmT X ¼ ðx1 ; x2 ; Á Á Á ; xn Þ; ð24Þ Â Ã ¼ E ðx À mÞðx À mÞT ; ð21Þ 1=nk if xi and xj both belong to the kth class Wij ¼ 0 otherwise;where E½ðx À mÞðx À mÞT Š is just the covariance matrix of ð25Þthe data set.
  5. 5. 332 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005 L ¼ I À W: ð26Þ discriminating information and global geometrical struc-Thus, we get: ture. Moreover, LDA can be induced in the LPP framework. However, LDA is supervised while LPP can be performed SW ¼ XLXT : ð26Þ in either supervised or unsupervised manner.It is interesting to note that we could regard the matrix W as Proposition 1. The rank of L is n À c.the weight matrix of a graph with data points as its nodes. Proof. Without loss of generality, we assume that the dataSpecifically, Wij is the weight of the edge ðxi ; xj Þ. W reflectsthe class relationships of the data points. The matrix L is points are ordered according to which class they are in, sothus called graph Laplacian, which plays key role in LPP [9]. that fx1 ; Á Á Á ; xn1 g are in the first class, fxn1 þ1 ; Á Á Á ; xn1 þn2 g Similarly, we can compute the matrix SB as follows: are in the second class, etc. Thus, we can write L as follows: 2 3 X l T L1 SB ¼ ni mðiÞ À m mðiÞ À m 6 6 L2 7 7 i¼1 L¼6 .. 7; ð31Þ ! ! 4 . 5 X l X l T ¼ ðiÞ ni m ðm Þ ðiÞ T Àm ni m ðiÞ Lc i¼1 i¼1 ! ! where Li is a square matrix, X l X l À ni m ðiÞ m þ T ni mm T 2 1 1 1 3 1 À ni Àni ÁÁÁ Àni i¼1 i¼1 ! 6 .. .. . 7 X 1 ðiÞ l T 6 À1 . . . 7 . 7 ðiÞ 6 ni ¼ x1 þ Á Á Á þ xðiÞ x1 þ Á Á Á þ xðiÞ Li ¼ 6 7: ð32Þ n ni ni 6 . .. .. 1 7 i¼1 i 4 . . . . Àni 5 À 2nmmT þ nmmT Àni1 ÁÁÁ Àni1 1 1 À ni ! X X 1 ðiÞ ðiÞ T l ni ¼ x xk À 2nmmT þ nmmT By adding all but the first column vectors to the first ni j ð28Þ column vector and then subtracting the first row vector i¼1 j;k¼1 ¼ XW X T À 2nmmT þ nmmT from any other row vectors, we get the following matrix 2 3 ¼ XW X T À nmmT 0 0 ÁÁÁ 0 6 .. .7 1 T 60 1 . .7 . ¼ XW X T À X ee XT 6. . 7 ð33Þ n 4.. .. ... 0 5 1 T 0 ÁÁÁ 0 1 ¼ X W À ee XT n whose rank is ni À 1. Therefore, the rank of Li is ni À 1 1 ¼ X W À I þ I À eeT XT and, hence, the rank of L is n À c. u t n Proposition 1 tells us that the rank of XLXT is at most 1 ¼ ÀXLXT þ X I À eeT XT n À c. However, in many cases, in appearance-based face n recognition, the number of pixels in an image (or, the ¼ ÀXLXT þ C; dimensionality of the image space) is larger than n À c, i.e.,where e ¼ ð1; 1; . . . ; 1ÞT is a n-dimensional vector and C ¼ d n À c. Thus, XLXT is singular. In order to overcome the 1XðI À neeT ÞXT is the data covariance matrix. Thus, the complication of a singular XLXT , Belhumeur et al. [2]generalized eigenvector problem of LDA can be written as proposed the Fisherface approach that the face images arefollows: projected from the original image space to a subspace with dimensionality n À c and then LDA is performed in this SB w ¼ SW w subspace. ) ðC À XLX T Þw ¼ XLXT w ) Cw ¼ ð1 þ ÞXLX T w ð29Þ 6 MANIFOLD WAYS OR VISUAL ANALYSIS 1 ) XLXT w ¼ Cw: In the above sections, we have described three different 1þ linear subspace learning algorithms. The key differenceThus, the projections of LDA can be obtained by solving the between PCA, LDA, and LPP is that, PCA and LDA aim tofollowing generalized eigenvalue problem, discover the global structure of the Euclidean space, while LPP aims to discover local structure of the manifold. In this XLXT w ¼ Cw: ð30Þ section, we discuss the manifold ways of visual analysis.The optimal projections correspond to the eigenvectorsassociated with the smallest eigenvalues. If the sample 6.1 Manifold Learning via Dimensionality Reductionmean of the data set is zero, the covariance matrix is simply In many cases, face images may be visualized as points drawnXXT which is exactly the matrix XDXT in the LPP on a low-dimensional manifold hidden in a high-dimensionalalgorithm with the weight matrix defined in (25). Our ambient space. Specially, we can consider that a sheet ofanalysis shows that LDA actually aims to preserve rubber is crumpled into a (high-dimensional) ball. The
  6. 6. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 333objective of a dimensionality-reducing mapping is to unfold where t is a suitable constant. Otherwise, put Sij ¼ 0.the sheet and to make its low-dimensional structure explicit. The weight matrix S of graph G models the faceIf the sheet is not torn in the process, the mapping is topology- manifold structure by preserving local structure. Thepreserving. Moreover, if the rubber is not stretched or justification for this choice of weights can be tracedcompressed, the mapping preserves the metric structure of back to [3].the original space. In this paper, our objective is to discover 4. Eigenmap. Compute the eigenvectors and eigenva-the face manifold by a locally topology-preserving mapping lues for the generalized eigenvector problem:for face analysis (representation and recognition). XLX T w ¼ XDX T w; ð35Þ6.2 Learning Laplacianfaces for Representation where D is a diagonal matrix whose entries areLPP is a general method for manifold learning. It is column (or row, since S is symmetric) sums of S,obtained by finding the optimal linear approximations to P Dii ¼ j Sji . L ¼ D À S is the Laplacian matrix. Thethe eigenfunctions of the Laplace Betrami operator on the ith row of matrix X is xi .manifold [9]. Therefore, though it is still a linear technique, Let w0 ; w1 ; . . . ; wkÀ1 be the solutions of (35), orderedit seems to recover important aspects of the intrinsic according to their eigenvalues, 0 0 1 Á Á Á kÀ1 .nonlinear manifold structure by preserving local structure. These eigenvalues are equal to or greater than zero becauseBased on LPP, we describe our Laplacianfaces method for the matrices XLXT and XDXT are both symmetric andface representation in a locality preserving subspace. positive semidefinite. Thus, the embedding is as follows: In the face analysis and recognition problem, one isconfronted with the difficulty that the matrix XDXT is x ! y ¼ W T x; ð36Þsometimes singular. This stems from the fact that sometimesthe number of images in the training set ðnÞ is much smaller W ¼ WP CA WLP P ; ð37Þthan the number of pixels in each image ðmÞ. In such a case, therank of XDX T is at most n, while XDXT is an m  m matrix, WLP P ¼ ½w0 ; w1 ; Á Á Á ; wkÀ1 Š; ð38Þwhich implies that XDX T is singular. To overcome thecomplication of a singular XDX T , we first project the image where y is a k-dimensional vector. W is the transformationset to a PCA subspace so that the resulting matrix XDX T is matrix. This linear mapping best preserves the manifold’snonsingular. Another consideration of using PCA as pre- estimated intrinsic geometry in a linear sense. The columnprocessing is for noise reduction. This method, we call vectors of W are the so-called Laplacianfaces.Laplacianfaces, can learn an optimal subspace for facerepresentation and recognition. The algorithmic procedure 6.3 Face Manifold Analysisof Laplacianfaces is formally stated below: Now, consider a simple example of image variability. Imagine that a set of face images are generated while the 1. PCA projection. We project the image set fxi g into the PCA subspace by throwing away the smallest princi- human face rotates slowly. Intuitively, the set of face images pal components. In our experiments, we kept 98 per- correspond to a continuous curve in image space since there is cent information in the sense of reconstruction error. only one degree of freedom, viz. the angel of rotation. Thus, For the sake of simplicity, we still use x to denote the we can say that the set of face images are intrinsically one- images in the PCA subspace in the following steps. We dimensional. Many recent works [7], [10], [18], [19], [21], [23], denote by WP CA the transformation matrix of PCA. [27] have shown that the face images do reside on a low- 2. Constructing the nearest-neighbor graph. Let G dimensional submanifold embedded in a high-dimensional denote a graph with n nodes. The ith node ambient space (image space). Therefore, an effective sub- corresponds to the face image xi . We put an edge space learning algorithm should be able to detect the between nodes i and j if xi and xj are “close,” i.e., xj is among k nearest neighbors of xi , or xi is among nonlinear manifold structure. The conventional algorithms, k nearest neighbors of xj . The constructed nearest- such as PCA and LDA, model the face images in Euclidean neighbor graph is an approximation of the local space. They effectively see only the Euclidean structure; thus, manifold structure. Note that here we do not use the they fail to detect the intrinsic low-dimensionality. -neighborhood to construct the graph. This is simply With its neighborhood preserving character, the Lapla- because it is often difficult to choose the optimal in cianfaces seem to be able to capture the intrinsic face manifold the real-world applications, while k nearest-neighbor structure to a larger extent. Fig. 1 shows an example that the graph can be constructed more stably. The disad- face images with various pose and expression of a person are vantage is that the k nearest-neighbor search will mapped into two-dimensional subspace. The face image data increase the computational complexity of our algo- rithm. When the computational complexity is a set used here is the same as that used in [18]. This data set major concern, one can switch to the -neighborhood. contains 1,965 face images taken from sequential frames of a 3. Choosing the weights. If node i and j are connected, small video. The size of each image is 20  28 pixels, with put 256 gray-levels per pixel. Thus, each face image is represented 2 by a point in the 560-dimensional ambient space. However, kxi Àxj k these images are believed to come from a submanifold with Sij ¼ eÀ t ; ð34Þ few degrees of freedom. We leave out 10 samples for testing,
  7. 7. 334 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005Fig. 1. Two-dimensional linear embedding of face images by Laplacianfaces. As can be seen, the face images are divided into two parts, the faceswith open mouth and the faces with closed mouth. Moreover, it can be clearly seen that the pose and expression of human faces changecontinuously and smoothly, from top to bottom, from left to right. The bottom images correspond to points along the right path (linked by solid line),illustrating one particular mode of variability in pose.and the remaining 1,955 samples are used to learn the The only difference between them is that, LPP is linear whileLaplacianfaces. As can be seen, the face images are mapped Laplacian Eigenmap is nonlinear. Since they have the sameinto a two-dimensional space with continuous change in pose objective function, it would be important to see to what extentand expression. The representative face images are shown in LPP can approximate Laplacian Eigenmap. This can bethe different parts of the space. The face images are divided evaluated by comparing their eigenvalues. Fig. 3 shows theinto two parts. The left part includes the face images with eigenvalues computed by the two methods. As can be seen,open mouth, and the right part includes the face images with the eigenvalues of LPP is consistently greater than those of Laplacian Eigenmaps, but the difference between them isclosed mouth. This is because in trying to preserve local small. The difference gives a measure of the degree of thestructure in the embedding, the Laplacianfaces implicitly approximation.emphasizes the natural clusters in the data. Specifically, itmakes the neighboring points in the image face nearer in theface space, and faraway points in the image face farther in the 7 EXPERIMENTAL RESULTSface space. The bottom images correspond to points along the Some simple synthetic examples given in [9] show thatright path (linked by solid line), illustrating one particular LPP can have more discriminating power than PCA andmode of variability in pose. be less sensitive to outliers. In this section, several The 10 testing samples can be simply located in the experiments are carried out to show the effectiveness ofreduced representation space by the Laplacianfaces (column our proposed Laplacianfaces method for face representa-vectors of the matrix W ). Fig. 2 shows the result. As can be tion and recognition.seen, these testing samples optimally find their coordinates 7.1 Face Representation Using Laplacianfaceswhich reflect their intrinsic properties, i.e., pose and As we described previously, a face image can be representedexpression. This observation tells us that the Laplacianfaces as a point in image space. A typical image of size m  nare capable of capturing the intrinsic face manifold describes a point in m  n-dimensional image space. How-structure to some extent. ever, due to the unwanted variations resulting from changes Recall that both Laplacian Eigenmap and LPP aim to find in lighting, facial expression, and pose, the image space mighta map which preserves the local structure. Their objective not be an optimal space for visual representation.function is as follows: In Section 3, we have discussed how to learn a locality XÀ Á2 preserving face subspace which is insensitive to outlier and min fðxi Þ À fðxj Þ Sij : noise. The images of faces in the training set are used to learn f ij such a locality preserving subspace. The subspace is spanned
  8. 8. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 335Fig. 2. Distribution of the 10 testing samples in the reduced representation subspace. As can be seen, these testing samples optimally find theircoordinates which reflect their intrinsic properties, i.e., pose and expression.by a set of eigenvectors of (35), i.e., w0 ; w1 ; . . . ; wkÀ1 . We can In this study, three face databases were tested. The first onedisplay the eigenvectors as images. These images may be is the PIE (pose, illumination, and expression) database fromcalled Laplacianfaces. Using the Yale face database as the CMU [25], the second one is the Yale database [31], and thetraining set, we present the first 10 Laplacianfaces in Fig. 4, third one is the MSRA database collected at the Microsofttogether with Eigenfaces and Fisherfaces. A face image can be Research Asia. In all the experiments, preprocessing to locatemapped into the locality preserving subspace by using the the faces was applied. Original images were normalized (inLaplacianfaces. It is interesting to note that the Laplacianfaces scale and orientation) such that the two eyes were aligned atare somehow similar to Fisherfaces. the same position. Then, the facial areas were cropped into the7.2 Face Recognition Using Laplacianfaces final images for matching. The size of each cropped image inOnce the Laplacianfaces are created, face recognition [2], all the experiments is 32 Â 32 pixels, with 256 gray levels per[14], [28], [29] becomes a pattern classification task. In this pixel. Thus, each image is repre-sented by a 1,024-dimen-section, we investigate the performance of our proposed sional vector in image space. The details of our methods forLaplacianfaces method for face recognition. The system face detection and alignment can be found in [30], [32]. Noperformance is compared with the Eigenfaces method [28] further preprocessing is done. Fig. 5 shows an example of theand the Fisherfaces method [2], two of the most popular original face image and the cropped image. Different patternlinear methods in face recognition. classifiers have been applied for face recognition, including nearest-neighbor [2], Bayesian [15], Support Vector Machine [17], etc. In this paper, we apply the nearest-neighbor classifier for its simplicity. The Euclidean metric is used as our distance measure. However, there might be some more sophisticated and better distance metric, e.g., variance- normalized distance, which may be used to improve the recognition performance. The number of nearest neighbors in our algorithm was set to be 4 or 7, according to the size of the training set. In short, the recognition process has three steps. First, we calculate the Laplacianfaces from the training set of face images; then the new face image to be identified is projected into the face subspace spanned by the Laplacianfaces; finally, the new face image is identified by a nearest-Fig. 3. The eigenvalues of LPP and Laplacian Eigenmap. neighbor classifier.
  9. 9. 336 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005Fig. 4. The first 10 (a) Eigenfaces, (b) Fisherfaces, and (c) Laplacianfaces calculated from the face images in the YALE database.7.2.1 Yale Database subspace for each method. There is no significant improve-The Yale face database [31] was constructed at the Yale Center ment if more dimensions are used. Fig. 6 shows the plots offor Computational Vision and Control. It contains 165 gray- error rate versus dimensionality reduction.scale images of 15 individuals. The images demonstratevariations in lighting condition (left-light, center-light, right- 7.2.2 PIE Databaselight), facial expression (normal, happy, sad, sleepy, sur- The CMU PIE face database contains 68 subjects withprised, and wink), and with/without glasses. 41,368 face images as a whole. The face images were captured A random subset with six images per individual (hence, by 13 synchronized cameras and 21 flashes, under varying90 images in total) was taken with labels to form the pose, illumination, and expression. We used 170 face imagestraining set. The rest of the database was considered to be for each individual in our experiment, 85 for training and thethe testing set. The training samples were used to learn the other 85 for testing. Fig. 7 shows some of the faces with pose,Laplacianfaces. The testing samples were then projected illumination and expression variations in the PIE database.into the low-dimensional representation subspace. Recogni- Table 2 shows the recognition results. As can be seen,tion was performed using a nearest-neighbor classifier. Fisherfaces performs comparably to our algorithm on this Note that, for LDA, there are at most c À 1 nonzero database, while Eigenfaces performs poorly. The error rate forgeneralized eigenvalues and, so, an upper bound on the Laplacianfaces, Fisherfaces, and Eigenfaces are 4.6 percent,dimension of the reduced space is c À 1, where c is the 5.7 percent, and 20.6 percent, respectively. Fig. 8 shows a plotnumber of individuals [2]. In general, the performance of the of error rate versus dimensionality reduction. As can be seen,Eigenfaces method and the Laplacianfaces method varies the error rate of our Laplacianfaces method decreases fast aswith the number of dimensions. We show the best results the dimensionality of the face subspace increases, andobtained by Fisherfaces, Eigenfaces, and Laplacianfaces. achieves the best result with 110 dimensions. There is no The recognition results are shown in Table 1. It is found significant improvement if more dimensions are used.that the Laplacianfaces method significantly outperforms Eigenfaces achieves the best result with 150 dimensions. Forboth Eigenfaces and Fisherfaces methods. The recognition Fisherfaces, the dimension of the face subspace is bounded byapproaches the best results with 14, 28, 33 dimensions for c À 1, and it achieves the best result with c À 1 dimensions.Fisherfaces, Laplacianfaces, and Eigenfaces, respectively. The dashed horizontal line in the figure shows the best resultThe error rates of Fisherfaces, Laplacianfaces, and eigenfaces obtained by Fisherfaces.are 20 percent, 11.3 percent, and 25.3 percent, respectively.The corresponding face subspaces are called optimal face TABLE 1 Performance Comparison on the Yale DatabaseFig. 5. The original face image and the cropped image.
  10. 10. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 337 TABLE 2 Performance Comparison on the PIE DatabaseFig. 6. Recognition accuracy versus dimensionality reduction onYale database.7.2.3 MSRA DatabaseThis database was collected at Microsoft Research Asia. Itcontains 12 individuals, captured in two different sessionswith different backgrounds and illuminations. Sixty-four toeighty face images were collected for each individual in eachsession. All the faces are frontal. Fig. 9 shows the samplecropped face images from this database. In this test, onesession was used for training and the other was used for Fig. 8. Recognition accuracy versus dimensionality reduction ontesting. Table 3 shows the recognition results. Laplacianfaces PIE database.method has lower error rate (8.2 percent) than those ofeigenfaces (35.4 percent) and fisherfaces (26.5 percent). Fig. 10 experimental results of the improvement of the performanceshows a plot of error rate versus dimensionality reduction. of our algorithm with the number of training samples. As can7.2.4 Improvement with the Number of Training be seen, our algorithm takes advantage of more training Samples samples. The performance improves significantly as theIn the above sections, we have shown that our Laplacian- number of training sample increases. The error rate offaces approach outperforms the Eigenfaces and Fisherfaces recognition reduces from 26.04 percent with 10 percentapproaches on three different face databases. In this section, training samples to 8.17 percent with 100 percent trainingwe evaluate the performances of these three approaches samples. This is due to the fact that, our algorithm aims atwith different numbers of training samples. We expect that discovering the underlying face manifold embedded in thea good algorithm should be able to take advantage of more ambient space and more training samples help to recover thetraining samples. manifold structure. Eigenface also takes advantage of more We used the MSRA database for this experiment. Let training samples to some extent. The error rate of EigenfaceMSRA-S1 denote the image set taken in the first session and reduces from 46.48 percent with 10 percent training samplesMSRA-S2 denote the image set taken in the second session.MSRA-S2 was exclusively used for testing. Ten percent, to 35.42 percent with 100 percent training samples. But, its40 percent, 70 percent, and 100 percent face images were performance starts to converge at 40 percent (trainingrandomly selected from MSRA-S1 for training. For each of the samples) and little improvement is obtained if more trainingfirst three cases, we repeated 20 times and computed the error samples are used. For Fisherfaces, there is no convincingrate in average. In Fig. 11 and Table 4, we show the evidence that it can take advantage of more training samples.Fig. 7. The sample cropped face images of one individual from PIE database. The original face images are taken under varying pose, illumination,and expression.
  11. 11. 338 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005Fig. 9. The sample cropped face images of eight individuals from MSRA database. The face images in the first row are taken in the first session,which are used for training. The face images in the second row are taken in the second session, which are used for testing. The two images in thesame column are corresponding to the same individual.7.3 Discussion structure, Laplacianfaces can identify the person withThree experiments on three databases have been system- different expressions, poses, and lighting conditions.atically performed. These experiments reveal a number of 4. In all our experiments, the number of training samplesinteresting points: per subject is greater than one. Sometimes there might be only one training sample available for each subject. 1. All these three approaches performed better in the In such a case, LDA cannot work since the within-class optimal face subspace than in the original image space. scatter matrix SW becomes a zero matrix. For LPP, 2. In all the three experiments, Laplacianfaces consis- however, we can construct a complete graph and use tently performs better than Eigenfaces and Fish- inner products as its weights. Thus, LPP can give erfaces. Especially, it significantly outperforms similar result to PCA. Fisherfaces and Eigenfaces on Yale database and MSRA database. These experiments also show that our algorithm is especially suitable for frontal face 8 CONCLUSION AND FUTURE WORK images. Moreover, our algorithm takes advantage of The manifold ways of face analysis (representation and more training samples, which is important to the recognition) are introduced in this paper in order to detect real-world face recognition systems. the underlying nonlinear manifold structure in the manner 3. Comparing to the Eigenfaces method, the Laplacian- of linear subspace learning. To the best of our knowledge, faces method encodes more discriminating informa- this is the first devoted work on face representation and tion in the low-dimensional face subspace by recognition which explicitly considers the manifold struc- preserving local structure which is more important ture. The manifold structure is approximated by the than the global structure for classification, especially adjacency graph computed from the data points. Using the when nearest-neighbor-like classifiers are used. In notion of the Laplacian of the graph, we then compute a fact, if there is a reason to believe that Euclidean transformation matrix which maps the face images into a face distances ðjjxi À xj jjÞ are meaningful only if they are subspace. We call this the Laplacianfaces approach. The small (local), then our algorithm finds a projection that Laplacianfaces are obtained by finding the optimal linear respects such a belief. Another reason is that, as we approximations to the eigenfunctions of the Laplace Beltrami show in Fig. 1, the face images probably reside on a operator on the face manifold. This linear transformation nonlinear manifold. Therefore, an efficient and effec- tive subspace representation of face images should be optimally preserves local manifold structure. Theoretical capable of charactering the nonlinear manifold struc- analysis of the LPP algorithm and its connections to PCA and ture, while the Laplacianfaces are exactly derived by LDA are provided. Experimental results on PIE, Yale, and finding the optimal linear approximations to the MSRA databases show the effectiveness of our method. eigenfunctions of the Laplace Betrami operator on the face manifold. By discovering the face manifold TABLE 3 Performance Comparison on the MSRA Database Fig. 10. Recognition accuracy versus dimensionality reduction on MSRA database.
  12. 12. HE ET AL.: FACE RECOGNITION USING LAPLACIANFACES 339Fig. 11. Performance comparison on the MSRA database with different number of training samples. The training samples are from the set MSRA-S1.The images in MSRA-S2 are used for testing exclusively. As can be seen, our Laplacianfaces method takes advantage of more training samples.The error rate reduces from 26.04 percent with 10 percent training samples to 8.17 percent with 100 percent training samples. There is no evidencefor fisherfaces that it can take advantage of more training samples. One of the central problems in face manifold learning is REFERENCESto estimate the intrinsic dimensionality of the nonlinear face [1] A.U. Batur and M.H. Hayes, “Linear Subspace for Illuminationmanifold, or, degrees of freedom. We know that the Robust Face Recognition,” Proc. IEEE Int’l Conf. Computer Visiondimensionality of the manifold is equal to the dimension- and Pattern Recognition, Dec. 2001. [2] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, “Eigenfacesality of the local tangent space. Some previous works [35], vs. Fisherfaces: Recognition Using Class Specific Linear Projec-[36] show that the local tangent space can be approximated tion,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19,using points in a neighbor set. Therefore, one possibility is no. 7, pp. 711-720, July 1997. [3] M. Belkin and P. Niyogi, “Laplacian Eigenmaps and Spectralto estimate the dimensionality of the tangent space. Techniques for Embedding and Clustering,” Proc. Conf. Advances Another possible extension of our work is to consider the in Neural Information Processing System 15, 2001.use of the unlabeled samples. It is important to note that the [4] M. Belkin and P. Niyogi, “Using Manifold Structure for Partiallywork presented here is a general method for face analysis Labeled Classification,” Proc. Conf. Advances in Neural Information Processing System 15, 2002.(face representation and recognition) by discovering the [5] M. Brand, “Charting a Manifold,” Proc. Conf. Advances in Neuralunderlying face manifold structure. Learning the face Information Processing Systems, 2002.manifold (or learning Laplacianfaces) is essentially an [6] F.R.K. Chung, “Spectral Graph Theory,” Proc. Regional Conf. Seriesunsupervised learning process. And, in many practical in Math., no. 92, 1997.cases, one finds a wealth of easily available unlabeled [7] Y. Chang, C. Hu, and M. Turk, “Manifold of Facial Expression,” Proc. IEEE Int’l Workshop Analysis and Modeling of Faces andsamples. These samples might help to discover the face Gestures, Oct. 2003.manifold. For example, in [4], it is shown how unlabeled [8] R. Gross, J. Shi, and J. Cohn, “Where to Go with Facesamples are used for discovering the manifold structure Recognition,” Proc. Third Workshop Empirical Evaluation Methodsand, hence, improving the classification accuracy. Since the in Computer Vision, Dec. 2001. [9] X. He and P. Niyogi, “Locality Preserving Projections,” Proc. Conf.face images are believed to reside on a submanifold Advances in Neural Information Processing Systems, 2003.embedded in a high-dimensional ambient space, we believe [10] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman, “Video-Based Facethat the unlabeled samples are of great value. We are Recognition Using Probabilistic Appearance Manifolds,” Proc.currently exploring these problems in theory and practice. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 313- 320, 2003. [11] A. Levin and A. Shashua, “Principal Component Analysis over Continuous Subspaces and Intersection of Half-Spaces,” Proc. TABLE 4 European Conf. Computer Vision, May 2002. Recognition Error Rate with Different [12] S.Z. Li, X.W. Hou, H.J. Zhang, and Q.S. Cheng, “Learning Number of Training Samples Spatially Localized, Parts-Based Representation,” Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, Dec. 2001. [13] Q. Liu, R. Huang, H. Lu, and S. Ma, “Face Recognition Using Kernel Based Fisher Discriminant Analysis,” Proc. Fifth Int’l Conf. Automatic Face and Gesture Recognition, May 2002. [14] A.M. Martinez and A.C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, Feb. 2001. [15] B. Moghaddam and A. Pentland, “Probabilistic Visual Learning for Object Representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 696-710, 1997. [16] H. Murase and S.K. Nayar, “Visual Learning and Recognition of 3-D Objects from Appearance,” Int’l J. Computer Vision, vol. 14, pp. 5-24, 1995.
  13. 13. 340 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 27, NO. 3, MARCH 2005[17] P.J. Phillips, “Support Vector Machines Applied to Face Recogni- Xiaofei He received the BS degree in computer tion,” Proc. Conf. Advances in Neural Information Processing Systems science from Zhejiang University, China, in 2000. 11, pp. 803-809, 1998. He is currently working toward the PhD degree in[18] S.T. Roweis and L.K. Saul, “Nonlinear Dimensionality Reduction the Department of Computer Science, The by Locally Linear Embedding,” Science, vol. 290, Dec. 2000. University of Chicago. His research interests[19] S. Roweis, L. Saul, and G. Hinton, “Global Coordination of Local are machine learning, information retrieval, and Linear Models,” Proc. Conf. Advances in Neural Information computer vision. Processing System 14, 2001.[20] L.K. Saul and S.T. Roweis, “Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds,” J. Machine Learning Research, vol. 4, pp. 119-155, 2003.[21] H.S. Seung and D.D. Lee, “The Manifold Ways of Perception,” Science, vol. 290, Dec. 2000. Shuicheng Yan received the BS and PhD[22] T. Shakunaga and K. Shigenari, “Decomposed Eigenface for Face degrees from the Applied Mathematics Depart- Recognition under Various Lighting Conditions,” IEEE Int’l Conf. ment, School of Mathematical Sciences from Computer Vision and Pattern Recognition, Dec. 2001. Peking University, China, in 1999 and 2004,[23] A. Shashua, A. Levin, and S. Avidan, “Manifold Pursuit: A New respectively. His research interests include Approach to Appearance Based Recognition,” Proc. Int’l Conf. computer vision and machine learning. Pattern Recognition, Aug. 2002.[24] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905, 2000.[25] T. Sim, S. Baker, and M. Bsat, “The CMU Pose, Illumination, and Expression (PIE) Database,” Proc. IEEE Int’l Conf. Automatic Face Yuxiao Hu received the Master’s degree in and Gesture Recognition, May 2002. computer science in 2001, and the Bachelor’s[26] L. Sirovich and M. Kirby, “Low-Dimensional Procedure for the Characterization of Human Faces,” J. Optical Soc. Am. A, vol. 4, degree in computer science in 1999, both from the pp. 519-524, 1987. Tsinghua University. From 2001 to 2004, He[27] J.B. Tenenbaum, V. de Silva, and J.C. Langford, “A Global worked as an assistant researcher in Media Geometric Framework for Nonlinear Dimensionality Reduction,” Computing Group, Microsoft Research, Asia. Science, vol. 290, Dec. 2000. Now, he is pursuing the PhD degree in University of Illinois at Urbana-Champaign. His current[28] M. Turk and A.P. Pentland, “Face Recognition Using Eigenfaces,” IEEE Conf. Computer Vision and Pattern Recognition, 1991. research interests include multimedia processing,[29] L. Wiskott, J.M. Fellous, N. Kruger, and C.v.d. Malsburg, “Face pattern recognition, and human head tracking. Recognition by Elastic Bunch Graph Matching,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 775-779, 1997.[30] R. Xiao, L. Zhu, and H.-J. Zhang, “Boosting Chain Learning for Partha Niyogi recieved the BTech degree from Object Detection,” Proc. IEEE Int’l Conf. Computer Vision, 2003. IIT Delhi and the SM degree and PhD degree in[31] Yale Univ. Face Database, http://cvc.yale.edu/projects/yalefaces/ electrical engineering and computer science yalefaces.html, 2002. from MIT. He is a professor in the Departments[32] S. Yan, M. Li, H.-J. Zhang, and Q. Cheng, “Ranking Prior of Computer Science and Statistics at The Likelihood Distributions for Bayesian Shape Localization Frame- University of Chicago. Before joining the Uni- work,” Proc. IEEE Int’l Conf. Computer Vision, 2003. versity of Chicago, he worked at Bell Labs as a[33] M.-H. Yang, “Kernel Eigenfaces vs. Kernel Fisherfaces: Face member of the technical staff for several years. Recognition Using Kernel Methods,” Proc. Fifth Int’l Conf. His research interests are in pattern recognition Automatic Face and Gesture Recognition, May 2002. and machine learning problems that arise in the[34] J. Yang, Y. Yu, and W. Kunz, “An Efficient LDA Algorithm for computational study of speech and language. This spans a range of Face Recognition,” Proc. Sixth Int’l Conf. Control, Automation, theoretical and applied problems in statistical learning, language Robotics and Vision, 2000. acquisition and evolution, speech recognition, and computer vision.[35] H. Zha and Z. Zhang, “Isometric Embedding and Continuum ISOMAP,” Proc. 20th Int’l Conf. Machine Learning, pp. 864-871, 2003. Hong-Jiang Zhang received the PhD degree[36] Z. Zhang and H. Zha, “Principal Manifolds and Nonlinear from the Technical University of Denmark and Dimension Reduction via Local Tangent Space Alignment,” the BS degree from Zhengzhou University, Technical Report CSE-02-019, CSE, Penn State Univ., 2002. China, both in electrical engineering, in 1982[37] W. Zhao, R. Chellappa, and P.J. Phillips, “Subspace Linear and 1991, respectively. From 1992 to 1995, he Discriminant Analysis for Face Recognition,” Technical Report was with the Institute of Systems Science, CAR-TR-914, Center for Automation Research, Univ. of Maryland, National University of Singapore, where he led 1999. several projects in video and image content analysis and retrieval and computer vision. From 1995 to 1999, he was a research manager at Hewlett-Packard Labs, responsible for research and technology transfers in the areas of multimedia management, computer vision and intelligent image processing. In 1999, he joined Microsoft Research and is currently the Managing Director of Advanced Technology Center. Dr. Zhang has authored three books, more than 300 referred papers, eight special issues of international journals on image and video processing, content-based media retrieval, and computer vision, as well as more than 50 patents or pending applications. He currently serves on the editorial boards of five IEEE/ACM journals and a dozen committees of international conferences. He is a fellow of the IEEE. . For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.