1. A New Approach to the Multiclass Classification Problem: Category Vector Space
2. Agenda
   - Problem
   - Motivation
   - Discussion
   - Preliminary Results
3. Classification Problem (Problem)
   - Multi-class classification through binary classification:
     - One-vs-All
     - One-vs-One
   - Multi-class classification can often be constructed as a generalization of binary classification.
   - In practice, multi-class classification is done by combining binary classifiers (a minimal one-vs-all sketch follows below).
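As a concrete illustration of the one-vs-all construction above, here is a minimal sketch using scikit-learn; the function names `ova_fit` and `ova_predict` are illustrative, not from the talk.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ova_fit(X, y, classes):
    # Train one binary SVM per class: class c vs. the rest.
    return {c: LinearSVC().fit(X, (y == c).astype(int)) for c in classes}

def ova_predict(models, X):
    # Pick the class whose classifier reports the largest signed distance.
    scores = np.column_stack([m.decision_function(X) for m in models.values()])
    labels = list(models.keys())
    return [labels[i] for i in scores.argmax(axis=1)]
```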
4. Multiclass Applications: Large Category Space (Problem)
   - Object recognition (on the order of 100 categories): http://www.glue.umd.edu/~zhelin/recog.html
   - Automated protein classification (300-600 categories)
   - Phoneme recognition (about 50 categories) [Waibel, Hanzawa, Hinton, Shikano, Lang 1989]
   - Digit recognition (10 categories)
   - Multi-class algorithms are computationally expensive.
5. Other Multiclass Applications (Problem)
   - Handwriting recognition (e.g., USPS)
   - Text classification
   - Face detection
   - Facial expression recognition
6. Classification Setup (Problem)
   - Data: a labeled training set {(x_i, y_i)}, i = 1, ..., n.
   - Training and test data are drawn i.i.d. from a fixed but unknown probability distribution D.
   - Question: design a classification rule y = f(x) such that, given a new x, it predicts y with minimal probability of error.
7. Support Vector Machines (SVMs) (Problem)
   - Training examples are mapped to a (usually high-dimensional) feature space by a feature map F(x) = (F_1(x), ..., F_d(x)).
   - Learn a linear decision boundary in that space.
   - Trade-off between maximizing the geometric margin of the training data and minimizing margin violations.
   (Figure: positive and negative examples separated by a linear boundary.)
8. Definition of SVM Classifiers (Problem)
   - A linear classifier is defined in feature space by f(x) = <w, F(x)> + b.
   - The SVM solution gives the weight vector w as a linear combination of support vectors, a subset of the training vectors.
   (Figure: the separating hyperplane with normal w and offset b.)
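A small sketch verifying the support-vector expansion numerically, assuming scikit-learn's `SVC` (whose `dual_coef_` attribute already folds the labels y_i into the coefficients alpha_i):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two separable blobs; fit a linear-kernel SVM.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Reconstruct f(x) = sum_i alpha_i y_i <x_i, x> + b over support vectors only.
x = X[0]
f_manual = clf.dual_coef_ @ (clf.support_vectors_ @ x) + clf.intercept_
print(np.allclose(f_manual, clf.decision_function([x])))  # True
```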
9. Definition of a Margin (Problem)
   - History (Vapnik, 1965), if linearly separable: place the hyperplane "far" from the data, i.e., with a large margin.
   (Figure: a hyperplane placed close to the data, marked BAD.)
10. Maximize the Margin (Problem)
   - History (Vapnik, 1965), if linearly separable: place the hyperplane "far" from the data, i.e., with a large margin.
   - A large margin classifier leads to good generalization (performance on test sets).
   (Figure: a large-margin hyperplane, marked GOOD.)
11. Combining Binary Classifiers (Problem)
   - One-vs-All (OVA):
     - For each class, build a classifier for that class vs. the rest.
     - Constructs k SVM models.
     - Often yields very imbalanced classifiers: there is an asymmetry in the amount of training data.
     - The earliest implementation of multiclass SVMs.
   - One-vs-One (OVO):
     - Constructs k(k-1)/2 classifiers.
     - Can be arranged as a rooted binary tree of SVMs with k leaves; traverse the tree to reach a leaf node.
     - (A minimal pairwise-voting sketch follows below.)
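A minimal one-vs-one sketch under the same scikit-learn assumptions. The majority vote shown here is the common variant; the slide's tree traversal (as in DAG-SVM) is an alternative way to combine the same k(k-1)/2 classifiers.

```python
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def ovo_fit(X, y, classes):
    # One binary SVM per unordered pair of classes: k(k-1)/2 models.
    models = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y, [a, b])
        models[(a, b)] = SVC(kernel="linear").fit(X[mask], y[mask])
    return models

def ovo_predict(models, x):
    # Each pairwise classifier casts one vote; take the majority.
    votes = [m.predict([x])[0] for m in models.values()]
    return max(set(votes), key=votes.count)
```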
12. Example 1 (Motivation)
   - Race categories: {White, Black, Asian}.
   - Task: map the image training set to the race labels.
     - Training (learning)
     - Test (generalization)
   - Scenario: an ambiguous test image is presented:
     - a mixed-race person, or
     - a person drawn from a race not represented by the system (e.g., Hispanic, Native American).
   - There is no way of assigning a mixed label: the system cannot represent the mixed-race person using a combination of categories.
   - There is no way of representing an unknown race.
   - Possible solution: indicate that the incoming image is outside the margin of each learned category.
13. Example 2 (Motivation)
   - Musical samples generated by a single instrument:
     - Electric guitar: a set of note categories {C, C#, D, D#, ...}.
   - Task: map the training set musical notes to the labels, with reasonable learning and generalization properties.
   - Scenario: given musical sequences containing
     - intervals (two notes struck simultaneously, such as {C, F#}), and
     - chords (three or more notes).
   - Ambiguity arises at the training set level: we are forced to assign new labels to intervals and chords even though they contain the same features (single notes) as the note categories.
   - In the music sequence case, suppose we learned a conditional probability distribution p(L = l | x), where x is a music sequence and L = {C, C#, D, ..., B} is the set of note labels.
     - When x is an interval, say a tritone, there is no way of assigning high probability to the tritone.
   - Possible solution: accommodate the tritone by assigning it a new label.
     - This leads to a large label space, which must be truncated because of its exponential size.
14. Problems with Combining Binary Classifiers (Motivation)
   - Categories are conceived as nominal labels; there is no underlying geometry for the categories.
   - The conditional distribution cannot give us a measure (value) for interpolated categories.
   - Non-represented interpolated categories are left out.
   - It is not easy to distinguish basic categories from compound categories.
15. Category Vector Spaces: Solution (Motivation)
   - Invoke the notion of a category vector space: categories are defined with a geometric structure.
     - Assume that the set of categories (labels) forms a vector space.
   - A music sequence would correspond to a label in a twelve-dimensional vector space {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}; each basic note (C, C#, D, etc.) has its own coordinate axis.
   - Learning problem:
     - Map the training set music sequences to vectors in the 12-dimensional space such that the training and test set errors are small.
     - Map the training musical sequences to the 12-dimensional vector space and then (if a support vector machine approach is used) maximize the margin of the mapped vectors in the category space.
     - The race classification example is analogous: depending on how many races we wish to explicitly represent, map the training set to the race category vector space and maximize the margin.
   - Generalization problem:
     - Map a test set musical sequence or image into the category space, then ask whether it lies within the margin of a note (or chord) or race category.
   - Note: extensions to other multi-category learning applications are straightforward, assuming we can map category labels to coordinates. (A small sketch of the note-to-vector mapping follows below.)
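A tiny sketch of the note-label mapping described above. The unit-normalization of compound labels is an editor's assumption, not something stated on the slide.

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def label_vector(notes):
    """Map a note, interval, or chord to a unit vector in the
    12-dimensional category space (a superposition of note axes)."""
    v = np.zeros(len(NOTES))
    for n in notes:
        v[NOTES.index(n)] = 1.0
    return v / np.linalg.norm(v)

c_note  = label_vector(["C"])            # basic category: a coordinate axis
tritone = label_vector(["C", "F#"])      # compound category: a superposition
c_major = label_vector(["C", "E", "G"])  # three-note chord
```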
16. Related Idea: Multiclass Fisher (Discussion)
   - Given feature vectors drawn from D categories, the multiclass Fisher linear discriminant (MC-FLD) seeks a projection W that maximizes the ratio of between-class to within-class scatter, e.g. J(W) = tr[(W^T S_W W)^{-1} (W^T S_B W)], where S_B and S_W are the between-class and within-class scatter matrices.
   - Solution: the columns of W are the top D eigenvectors (corresponding to the largest eigenvalues) of S_W^{-1} S_B.
   - The eigenvectors are orthonormal, so the columns of W constitute a category vector space.
   - Interpret W^T x as a category space projection.
   - The optimal solution is a set of orthogonal weight vectors.
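A sketch of the MC-FLD computation, assuming the standard scatter-matrix definitions; the ridge term `reg` is an added assumption to keep S_W invertible, and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_category_space(X, y, D, reg=1e-6):
    """Return W whose columns span an MC-FLD category space:
    the top D generalized eigenvectors of (S_B, S_W)."""
    n_feat = X.shape[1]
    mu = X.mean(axis=0)
    S_W = reg * np.eye(n_feat)  # small ridge keeps S_W positive definite
    S_B = np.zeros((n_feat, n_feat))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)        # within-class scatter
        d = (mc - mu)[:, None]
        S_B += len(Xc) * (d @ d.T)            # between-class scatter
    vals, vecs = eigh(S_B, S_W)               # ascending eigenvalues
    return vecs[:, -D:]                       # top D directions
```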
17. Disadvantage of Multiclass Fisher (Discussion)
   - We avoided this approach since margins are not maximized in category space.
   - We have not seen a classifier take a three-class problem with labels {0, 1, 2}, map the input features into a vector space with basis vectors e_1 = (1,0,0), e_2 = (0,1,0), and e_3 = (0,0,1), and attempt to maximize the margin in that category vector space.
   - We have also not seen any previous work where a pattern from a compound category, say a combination of labels 1 and 2, is used in training with a conversion of the compound category to a vector.
18. Description of Category Vector Spaces (Discussion)
   - Input feature vectors are mapped to the category vector space using a kernel-based approach.
   - In the category vector space, maximizing the margin is equivalent to forming hypercones around the category axes.
   - Mapped feature vectors that lie inside a hypercone have a distinct class label.
   - Mapped vectors that lie in between hypercones are ambiguous.
   - Hypercones are not allowed to intersect.
   (Figure: hypercones depicting basic categories.)
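An illustrative membership test for the hypercone picture, assuming unit-length category axes and a fixed half-angle; the function and threshold are hypothetical, not the talk's actual margin computation.

```python
import numpy as np

def hypercone_label(z, category_axes, half_angle_deg=30.0):
    """Assign a label if the mapped vector z lies inside exactly one
    hypercone around a category axis; otherwise report ambiguity."""
    z = z / np.linalg.norm(z)
    inside = [c for c, axis in category_axes.items()
              if np.degrees(np.arccos(np.clip(z @ axis, -1.0, 1.0)))
                 <= half_angle_deg]
    return inside[0] if len(inside) == 1 else None  # None = ambiguous

# With orthogonal axes, half-angles below 45 degrees keep cones disjoint.
axes = {"White": np.eye(3)[0], "Black": np.eye(3)[1], "Asian": np.eye(3)[2]}
print(hypercone_label(np.array([0.9, 0.3, 0.1]), axes))  # "White"
```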
19. Advantages of Category Vector Space (Discussion)
   - Each pattern now exists as a linear superposition of category vectors in the category space, which ensures that ambiguity is handled at a fundamental level.
   - Compound categories can be directly represented in the category space.
   - We can maximize the compound category margin as well as the margins for the basic categories.
20. Technical Challenges (Discussion)
   - Regression: each input training set feature vector in R^M must be mapped to a corresponding point in R^D, where M is the number of feature dimensions and D is the cardinality of the set of basic categories.
   - Classification: each mapped feature vector must maximize its margin relative to its own category vector against the other category vectors. Here the category vector is known in advance.
21. Regression in Category Space (Discussion)
   - The parameter epsilon controls the width of the interval for which there is no penalty.
   - The slack variable vectors are non-negative component-wise.
   - The weight vector W and bias b map each feature vector to its category space counterpart.
   - The choice of kernel K (GRBF or otherwise) is hidden in the * operator, which implements inner products by projecting vectors in R^M into a suitable space.
   - The regularization parameter weighs the norm of W against the data fitting error: the larger its value, the greater the emphasis on the data fitting error.
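The slide's formulas are not recoverable from this extraction. The following is a standard vector-valued epsilon-insensitive regression objective of the kind the bullets describe; the symbol names (epsilon, xi, W, b, C) are the editor's choices, not necessarily the authors'.

```latex
\min_{W,\,b,\,\xi,\,\xi^{*}} \;\; \frac{1}{2}\,\|W\|^{2}
      \;+\; C \sum_{i=1}^{n} \mathbf{1}^{T}\!\left(\xi_{i} + \xi_{i}^{*}\right)
\quad \text{subject to} \quad
\begin{aligned}
  y_{i} - \left(W \ast x_{i} + b\right) &\le \epsilon\,\mathbf{1} + \xi_{i},\\
  \left(W \ast x_{i} + b\right) - y_{i} &\le \epsilon\,\mathbf{1} + \xi_{i}^{*},\\
  \xi_{i},\ \xi_{i}^{*} &\ge 0 .
\end{aligned}
```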
22. Classification in Category Space (Discussion)
   - Associate each mapped vector with a category vector.
   - Category vectors can be basis vectors (axes corresponding to basic categories) in the category space, or ordinary vectors (corresponding to compound categories).
   - In this definition of membership, no distinction is made between basic and compound categories.
   - We seek to maximize the margin in the category space: minimizing the norm of the mapped vectors is equivalent to maximizing the margin, provided the inequality constraints can be satisfied.
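The constraints themselves are also lost in this extraction. One hedged way to write margin maximization in category space that is consistent with the bullets (z_i the mapped vector for pattern i, y_c the category vectors, c(i) the index of pattern i's own category; all symbols editor-chosen):

```latex
\min_{\{z_{i}\}} \;\; \sum_{i=1}^{n} \|z_{i}\|^{2}
\quad \text{subject to} \quad
z_{i}^{T}\, y_{c(i)} \;-\; z_{i}^{T}\, y_{c} \;\ge\; 1,
\qquad \forall\, i,\; \forall\, c \neq c(i).
```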
23. Integrated Classification and Regression Objective Function (Discussion)
   - The objective function is designed so that we can obtain an integrated dual classification and regression objective.
24. Multi-Category GRBF (Preliminary Results)
   - A Gaussian radial basis function (GRBF) classifier with multiple outputs, one for each basic category.
   - Given a training set of registered and cropped face images with labels {White, Black, Asian}, the GRBF classifier maps the input feature vectors into the category space.
   - Since we know the label of each training set pattern, we can approximate the mapped category space. (A plausible least-squares GRBF sketch follows below.)
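One plausible way to realize a multi-output GRBF map, fit by regularized least squares; the slides do not specify the fitting procedure, so this is a sketch under that assumption (Y holds rows like [1,0,0] for White, and so on).

```python
import numpy as np

def grbf_fit(X, Y, centers, sigma=1.0, reg=1e-6):
    """Least-squares fit of a Gaussian RBF network with one output
    per basic category."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * sigma ** 2))  # n x m design matrix
    # Ridge-regularized normal equations for the output weights.
    W = np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ Y)
    return W

def grbf_map(X, centers, W, sigma=1.0):
    """Map feature vectors into the category space."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ W
```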
25. Experimental Setup (Preliminary Results)
   - 45 training images from the "Labeled Faces in the Wild" image database.
   - The database contains over 13,000 face images detected by the Viola-Jones face detector.
     - Each face is labeled with the name of the person pictured.
     - Of the 5,749 people featured in the database, 1,680 individuals have multiple images, with each image being unique.
   - Of the 45 training images, 15 were from each of the three races considered.
   - The 45 images were registered to one "standard" image (after first converting them to grayscale) using a landmark-based thin-plate spline (TPS). The landmarks used were:
     - three for each eye,
     - two for the nose, and
     - two for the two ears (very approximate, since the ears are often not visible).
   - After registration, the images were cropped and resized to 130x90, with the intensity scale adjusted to [0, 1].
   - Category basis vectors: White = y_1 = [1,0,0]^T, Black = y_2 = [0,1,0]^T, Asian = y_3 = [0,0,1]^T.
   - The free parameter values (not recoverable from this extraction) were chosen carefully but qualitatively to obtain a good training set separation in category space.
26. Race Classification Training Images (Preliminary Results)
   (Figure: training set images. Top row: Asian; middle row: Black; bottom row: White.)
27. Category Space for Training Images (Preliminary Results)
   (Figure: training set images mapped into the category vector space.)
28. Race Classification Testing Images (Preliminary Results)
   - 51 test set images (17 Asian, 16 Black, 18 White).
   - We used the weights discovered by the GRBF classifier to map the input test set images into the category space.
   (Figure: test set images. Top row: Asian; middle row: Black; bottom row: White.)
29. Category Space for Testing Images (Preliminary Results)
   - The separation in the category space is visible in the plot.
   (Figure: test set images mapped into the category vector space.)
30. Pairwise Projection of Category Space Testing Images (Preliminary Results)
   - Pairwise classifications: roughly separate each pair of categories by drawing lines through the origin, removing the orthogonal subspace that is not being compared.
   - The pairwise separations in the category space give an improved visualization.
   - One could in fact draw separating boundaries in the three pairwise comparisons and obtain an overall decision boundary in 3D. (A minimal projection sketch follows below.)
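A trivial sketch of the pairwise projection: dropping the coordinate of the category not under comparison, as the slide describes. The function name is illustrative.

```python
import numpy as np

def pairwise_view(Z, i, j):
    """Project category-space points onto the plane spanned by category
    axes i and j by dropping the remaining coordinate(s)."""
    return Z[:, [i, j]]

# e.g., the White-vs-Black view of 3-D category-space points Z:
# white_black = pairwise_view(Z, 0, 1)
```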
31. Ambiguity Testing (Preliminary Results)
   - Nine faces that are ambiguous (from our perspective).
   - The goal was to exhibit the tolerance of ambiguity that is a hallmark of category spaces.
   - The conclusion drawn from this result is a subjective one.
   (Figure: ambiguous faces mapped into the category space; note how they cluster together.)
32. Experiment with the MPEG-7 Database (Preliminary Results)
   (Figure: butterfly, bat, and bird shapes.)
33. Experiment with the MPEG-7 Database (Preliminary Results)
   (Figure: fly, chicken, and "batbird" shapes.)
34. 3-Class Training (Preliminary Results)
   (Figure: 3-class training results.)
35. 3-Class Testing (Preliminary Results)
   (Figure: 3-class testing results.)
36. 4-Class Training (Preliminary Results)
   (Figure: 4-class training results.)
37. 4-Class Testing (Preliminary Results)
   (Figure: 4-class testing results.)
38. Summary
   - The fundamental contribution is the learning of category spaces from patterns.
   - Ambiguity is handled at a fundamental level.
   - Compound categories can be directly represented in the category space.
   - The specific approach integrates classification and regression (iCAR): it combines a regression objective function (map the patterns) with a maximum margin objective function (perform multicategory classification in category space).
39. Questions & Discussion. Thank you.
