# Category Vector Space


1. **A New Approach to the Multiclass Classification Problem: Category Vector Space**
2. **Agenda**
    - Problem
    - Motivation
    - Discussion
    - Preliminary Results
3. **The Classification Problem** *(Problem)*
    - Multiclass classification through binary classification:
      - One-vs-All
      - One-vs-One
    - Multiclass classification can often be formulated as a generalization of binary classification
    - In practice, multiclass classification is done by combining binary classifiers
4. **Multiclass Applications: Large Category Spaces** *(Problem)*
    - Object recognition (~100 categories), automated protein classification (300-600), phoneme recognition (~50) [Waibel, Hanzawa, Hinton, Shikano, Lang 1989], digit recognition (10)
    - Multiclass algorithms become computationally expensive as the category space grows
    - (Image source: http://www.glue.umd.edu/~zhelin/recog.html)
5. **Other Multiclass Applications** *(Problem)*
    - Handwriting recognition (e.g., USPS)
    - Text classification
    - Face detection
    - Facial expression recognition
6. **Classification Setup** *(Problem)*
    - Data: a labeled training set {(x_i, y_i)}, i = 1, ..., n
    - Training and test data are drawn i.i.d. from a fixed but unknown probability distribution D
    - Question: design a classification rule y = f(x) such that, given a new x, f predicts y with minimal probability of error
7. **Support Vector Machines (SVMs)** *(Problem)*
    - Training examples are mapped to a (usually high-dimensional) feature space by a feature map F(x) = (F_1(x), ..., F_d(x))
    - A linear decision boundary is learned in this feature space
    - Trade-off between maximizing the geometric margin of the training data and minimizing margin violations
8. **Definition of SVM Classifiers** *(Problem)*
    - A linear classifier is defined in feature space by the weight vector w and bias b
    - The SVM solution gives w as a linear combination of support vectors, a subset of the training vectors
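The margin trade-off on the previous slides can be illustrated with a minimal sketch. This is not the deck's implementation: it trains a standard soft-margin linear SVM by subgradient descent on the regularized hinge loss, on an illustrative toy data set.

```python
import numpy as np

# Minimal linear SVM sketch: minimize (lam/2)||w||^2 + mean hinge loss
#   max(0, 1 - y_i (w.x_i + b)),  with labels y_i in {-1, +1}.
# Data, learning rate, and epochs are illustrative assumptions.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:        # margin violation: hinge subgradient
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                 # no violation: only shrink w
                w = (1 - lr * lam) * w
    return w, b

np.random.seed(0)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)            # the learned linear decision rule
```

On this linearly separable toy set the learned hyperplane classifies all training points correctly, matching the large-margin intuition of the slides.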
9. **Definition of a Margin** *(Problem)*
    - History (Vapnik, 1965), for linearly separable data: place the hyperplane "far" from the data, i.e., with a large margin
    - (Figure: a small-margin separating hyperplane, labeled BAD)
10. **Maximize the Margin** *(Problem)*
    - History (Vapnik, 1965), for linearly separable data: place the hyperplane "far" from the data, i.e., with a large margin
    - A large-margin classifier leads to good generalization (performance on test sets)
    - (Figure: a large-margin separating hyperplane, labeled GOOD)
11. **Combining Binary Classifiers** *(Problem)*
    - One-vs-All (OVA)
      - For each class, build a classifier for that class vs. the rest
      - Constructs k SVM models
      - Often yields very imbalanced classifiers: asymmetry in the amount of training data
      - Earliest implementation for SVM multiclass
    - One-vs-One (OVO)
      - Constructs k(k-1)/2 classifiers
      - Rooted binary SVMs with k leaves; traverse the tree to reach a leaf node
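The OVO scheme above can be sketched compactly. The deck combines binary SVMs; here a nearest-centroid rule stands in for each pairwise classifier to keep the example short, and the data are illustrative.

```python
import numpy as np
from itertools import combinations

# One-vs-one (OVO) voting sketch: k classes yield k(k-1)/2 pairwise
# classifiers; the predicted class is the majority-vote winner.

def ovo_predict(X_train, y_train, x, classes):
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):         # one classifier per class pair
        ca = X_train[y_train == a].mean(axis=0)   # stand-in binary classifier:
        cb = X_train[y_train == b].mean(axis=0)   # pick the nearer class centroid
        winner = a if np.linalg.norm(x - ca) < np.linalg.norm(x - cb) else b
        votes[winner] += 1
    return max(votes, key=votes.get)              # majority vote over all pairs

classes = [0, 1, 2]
X_train = np.array([[0.0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])
y_train = np.array([0, 0, 1, 1, 2, 2])
n_pairs = len(list(combinations(classes, 2)))     # k(k-1)/2 = 3 for k = 3
pred = ovo_predict(X_train, y_train, np.array([9.5, 0.5]), classes)
```

A query near the third class's examples wins two of the three pairwise votes and is therefore assigned class 2.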
12. **Example 1** *(Motivation)*
    - Race categories: {White, Black, Asian}
    - Task: map the image training set to the race labels
      - Training (learning)
      - Test (generalization)
    - Scenario: an ambiguous test image is presented
      - A mixed-race person
      - A person drawn from a race not represented by the system (e.g., Hispanic, Native American)
    - There is no way of assigning a mixed label: the system cannot represent the mixed-race person using a combination of categories
    - There is no way of representing an unknown race
    - Possible solution: indicate that the incoming image is outside the margin of each learned category
13. **Example 2** *(Motivation)*
    - Musical samples generated by a single instrument
      - Electric guitar: a set of note categories {C, C#, D, D#, ...}
    - Task: map the training-set musical notes to the labels
      - Reasonable learning and generalization properties
    - Scenario: the system is given musical sequences
      - Intervals (two notes simultaneously struck, such as {C, F#})
      - Chords (containing three or more notes)
    - Ambiguity arises at the training-set level
      - We are forced to assign new labels to intervals and chords even though they contain the same features (single notes) as the note categories
    - In the music sequence case, suppose we learned a conditional probability distribution p(L = l | x)
      - x is a music sequence and L = {C, C#, D, ..., B} is the set of note labels
      - When x is an interval, say a tritone, there is no way of assigning high probability to the tritone
    - Possible solution: accommodate the tritone by assigning it a new label
      - This leads to a large label space, which must be truncated for exponential-size reasons
14. **Problems with Combining Binary Classifiers** *(Motivation)*
    - Categories are conceived as nominal labels, with no underlying geometry
    - The conditional distribution cannot give us a measure (value) for interpolated categories
    - Non-represented interpolated categories are left out
    - It is not easy to distinguish basic categories from compound categories
15. **Category Vector Spaces** *(Motivation)*
    - Solution: invoke the notion of a category vector space
    - Categories are given a geometric structure: assume the set of categories (labels) forms a vector space
    - A music sequence would correspond to a label in a twelve-dimensional vector space {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
    - Each basic note (C, C#, D, etc.) has its own coordinate axis
    - Learning problem:
      - Map the training-set music sequences to vectors in the 12-dimensional space such that the training and test errors are small
      - If a support vector machine approach is used, maximize the margin of the mapped vectors in the category space
      - The race classification example is analogous: depending on how many races we wish to explicitly represent, map the training set to the race category vector space and maximize the margin
    - Generalization problem:
      - Map a test musical sequence or image into the category space and ask whether it lies within the margin of a note (or chord) or race category
    - Note: extensions to other multi-category learning applications are straightforward, assuming we can map category labels to coordinates
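The superposition idea for the music example can be sketched directly: each basic note is a basis vector, and a compound category (an interval or chord) is the sum of its note vectors, so no new label is needed. The encoding below is an illustrative reading of the slide, not the authors' code.

```python
import numpy as np

# Category-vector-space sketch: 12 basic note categories, each a basis
# vector; a compound category is a superposition of basis vectors.

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def category_vector(labels):
    v = np.zeros(len(NOTES))
    for l in labels:
        v[NOTES.index(l)] = 1.0      # superpose one basis vector per note
    return v

tritone = category_vector(['C', 'F#'])   # the ambiguous interval from slide 13
```

The tritone {C, F#} is simply the sum of the C and F# axes, which is exactly the representation the nominal-label formulation could not provide.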
16. **Related Idea: Multiclass Fisher** *(Discussion)*
    - Given the feature vectors, D categories, and a projected set of features z = W^T x, the multiclass Fisher linear discriminant (MC-FLD) maximizes the ratio of between-class to within-class scatter, J(W) = tr((W^T S_w W)^{-1} (W^T S_b W))
    - Solution: the columns of W are the top D eigenvectors (corresponding to the largest eigenvalues) of S_w^{-1} S_b
    - The eigenvectors are orthonormal, so the columns of W constitute a category vector space
    - W^T x can be interpreted as a category-space projection
    - The optimal solution is a set of orthogonal weight vectors
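The MC-FLD eigenvector solution can be sketched numerically. This is a generic textbook computation with illustrative toy data (and a small ridge term, an assumption added to keep the within-class scatter invertible), not the deck's experiment.

```python
import numpy as np

# Multiclass Fisher sketch: W's columns are the top-D eigenvectors of
# S_w^{-1} S_b (within-class and between-class scatter matrices).

def mc_fld(X, y, D):
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)    # between-class scatter
    Sw += 1e-6 * np.eye(X.shape[1])                   # ridge for invertibility
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    return vecs[:, order[:D]].real                    # top-D eigenvector columns

# Three classes separated along the first axis; noise along the second.
X = np.array([[0.0, 0], [0, 1], [5, 0], [5, 1], [10, 0], [10, 1]])
y = np.array([0, 0, 1, 1, 2, 2])
W = mc_fld(X, y, D=1)
```

As expected, the top Fisher direction aligns with the axis that separates the class means, ignoring the within-class noise direction.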
17. **Disadvantage of Multiclass Fisher** *(Discussion)*
    - We avoided this approach since margins are not maximized in category space
    - We have not seen a classifier take a three-class problem with labels {0, 1, 2}, map the input features into a vector space with one basis vector per class, and attempt to maximize the margin in the category vector space
    - Nor have we seen previous work where a pattern from a compound category, say a combination of labels 1 and 2, is also used in training, with the compound category converted to a vector
18. **Description of Category Vector Spaces** *(Discussion)*
    - Input feature vectors are mapped to the category vector space using a kernel-based approach
    - In the category vector space, maximizing the margin is equivalent to forming hypercones
    - Mapped feature vectors that lie inside a hypercone have a distinct class label
    - Mapped vectors that lie between hypercones are ambiguous
    - Hypercones are not allowed to intersect
    - (Figure depicts the basic categories)
19. **Advantages of the Category Vector Space** *(Discussion)*
    - Each pattern now exists as a linear superposition of category vectors in the category space, which ensures that ambiguity is handled at a fundamental level
    - Compound categories can be directly represented in the category space
    - We can maximize the compound-category margin as well as the margins for the basic categories
20. **Technical Challenges** *(Discussion)*
    - Regression: each input training-set feature vector must be mapped to a corresponding point in the D-dimensional category space, where M is the number of feature dimensions and D the cardinality of the set of basic categories
    - Classification: each mapped feature vector must maximize its margin relative to its own category vector against the other category vectors; here the category vector is known
21. **Regression in Category Space** *(Discussion)*
    - (The regression objective and its constraints were shown as equations on the slide.)
    - ε controls the width of the interval for which there is no penalty
    - The slack-variable vectors are non-negative component-wise
    - The weight vector and bias help map the feature vector to its category-space counterpart
    - The choice of kernel K (GRBF or otherwise) is hidden in the ∗ operator, which implements inner products by projecting vectors into a suitable space
    - The regularization parameter weighs the norm of the weights against the data-fitting error: the larger its value, the greater the emphasis on the data-fitting error
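The ε-insensitive penalty described in the first bullet can be written out as a one-line sketch. The residual values below are illustrative; only the loss shape comes from the slide.

```python
import numpy as np

# Epsilon-insensitive penalty: residuals with |r| <= eps incur no cost,
# matching "eps controls the width of the interval with no penalty".

def eps_insensitive(residual, eps):
    return np.maximum(0.0, np.abs(residual) - eps)

r = np.array([-0.3, 0.05, 0.5])
penalties = eps_insensitive(r, eps=0.1)   # -> [0.2, 0.0, 0.4]
```

The middle residual falls inside the ε-tube and is not penalized, while the other two are charged only for the amount by which they exceed ε.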
22. **Classification in Category Space** *(Discussion)*
    - (The classification objective and its constraints were shown as equations on the slide.)
    - Each mapped vector is associated with a category vector
    - Category vectors can be basis vectors (axes corresponding to basic categories) or ordinary vectors (corresponding to compound categories)
    - In this definition of membership, no distinction is made between basic and compound categories
    - We seek to maximize the margin in the category space
    - Minimizing the norm of the mapped vectors is equivalent to maximizing the margin, provided the inequalities can be satisfied
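The hypercone membership test from slide 18 can be sketched as an angle check: a mapped vector belongs to a category if its angle to that category vector is below a cone half-angle. The half-angle value and the test vectors here are illustrative assumptions, not from the deck.

```python
import numpy as np

# Hypercone membership sketch in a 3-category space: inside the cone of a
# category vector -> distinct label; between cones -> ambiguous.

def in_hypercone(z, c, half_angle_deg=20.0):
    cos_angle = z @ c / (np.linalg.norm(z) * np.linalg.norm(c))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angle <= half_angle_deg

c_white = np.array([1.0, 0.0, 0.0])        # basis category vector
z_clear = np.array([0.95, 0.05, 0.05])     # close to the White axis
z_mixed = np.array([0.6, 0.6, 0.1])        # between two axes: ambiguous
```

The first mapped vector lies inside the cone around its category axis; the second lies between cones and is left ambiguous rather than forced into a label, which is the behavior the slides advocate.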
23. **Integrated Classification and Regression Objective Function** *(Discussion)*
    - The objective function is designed so that we can obtain an integrated dual classification-and-regression objective
24. **Multi-Category GRBF** *(Preliminary Results)*
    - A Gaussian radial basis function (GRBF) classifier with multiple outputs, one for each basic category
    - Given a training set of registered and cropped face images with labels {White, Black, Asian}, the GRBF classifier maps the input feature vectors into the category space
    - Since we know the label of each training-set pattern, we approximate the mapped category space
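A minimal version of such a GRBF map can be sketched as kernel ridge regression onto the category basis vectors y1 = [1,0,0], y2 = [0,1,0], y3 = [0,0,1]. The toy features, kernel width, and ridge term are illustrative assumptions; the deck's actual experiment uses face images.

```python
import numpy as np

# GRBF mapping sketch: fit weights so each training pattern lands near its
# category basis vector, then use the same kernel map for new patterns.

def grbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian RBF kernel

X = np.array([[0.0, 0], [0, 1], [4, 0], [4, 1], [8, 0], [8, 1]])
Y = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 1, 0], [0, 0, 1], [0, 0, 1.0]])    # category basis targets
K = grbf_kernel(X, X)
W = np.linalg.solve(K + 1e-6 * np.eye(len(X)), Y)    # ridge-regularized fit
mapped = K @ W            # training patterns mapped into the category space
```

With the ridge term small, the mapped training patterns sit almost exactly on their category basis vectors, mirroring the "approximate the mapped category space" step on the slide.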
25. **Experimental Setup** *(Preliminary Results)*
    - 45 training images from the "Labeled Faces in the Wild" image database
    - The database contains over 13,000 images captured using the Viola-Jones face detector
      - Each face is labeled with the name of the person
      - Of the 5,749 people featured in the database, 1,680 individuals have multiple images, with each image being unique
    - Of the 45 training images, 15 were from each of the three races considered
    - The 45 images were registered to one "standard" image (after first converting them to grayscale) using a landmark-based thin-plate spline (TPS)
      - Landmarks used: three for each eye, two for the nose, and two for the ears (very approximate, since the ears are often not visible)
    - After registration, the images were cropped and resized to 130×90, with the intensity scale adjusted to [0, 1]
    - Category basis vectors: White = y_1 = [1, 0, 0]^T, Black = y_2 = [0, 1, 0]^T, Asian = y_3 = [0, 0, 1]^T
    - The free parameters were carefully but qualitatively chosen to get a good training-set separation in category space
26. **Race Classification Training Images** *(Preliminary Results)*
    - Training-set images. Top row: Asian; middle row: Black; bottom row: White
27. **Category Space for Training Images** *(Preliminary Results)*
    - Training-set images mapped into the category vector space
28. **Race Classification Testing Images** *(Preliminary Results)*
    - Test-set images. Top row: Asian; middle row: Black; bottom row: White
    - 51 test-set images (17 Asian, 16 Black, 18 White)
    - The weights discovered by the GRBF classifier were used to map the input test-set images into the category space
29. **Category Space for Testing Images** *(Preliminary Results)*
    - The graph shows the separation in the category space
30. **Pairwise Projection of Category Space Testing Images** *(Preliminary Results)*
    - Pairwise classifications: roughly separate each pair by drawing lines through the origin, removing the orthogonal subspace that is not being compared
    - The pairwise separations in the category space give an improved visualization
    - One could in fact draw separating boundaries in the three pairwise comparisons and obtain an overall decision boundary in 3D
31. **Ambiguity Testing** *(Preliminary Results)*
    - Nine ambiguous (from our perspective) faces
    - The goal was to exhibit the tolerance of ambiguity that is a hallmark of category spaces
    - The conclusion drawn from this result is a subjective one
    - (Figure: ambiguous faces mapped into the category space; note how they cluster together)
32. **Experiment with the MPEG-7 Database** *(Preliminary Results)*
    - Shape classes: butterfly, bat, bird
33. **Experiment with the MPEG-7 Database** *(Preliminary Results)*
    - Shape classes: fly, chicken, batbird
34. **3-Class Training** *(Preliminary Results)*
35. **3-Class Testing** *(Preliminary Results)*
36. **4-Class Training** *(Preliminary Results)*
37. **4-Class Testing** *(Preliminary Results)*
38. **Summary**
    - The fundamental contribution is the learning of category spaces from patterns
    - Ambiguity is handled at a fundamental level
    - Compound categories can be directly represented in the category space
    - The specific approach integrates regression and classification (iCAR):
      - A regression objective function (map the patterns)
      - A maximum-margin objective function (perform multicategory classification in category space)
39. **Questions & Discussion: Thank You**