ICME 2013



  1. IEEE International Conference on Multimedia & Expo 2013 — Augmenting Descriptors for Fine-grained Visual Categorization Using Polynomial Embedding. Hideki Nakayama, Graduate School of Information Science and Technology, The University of Tokyo
  2. Outline  Background and motivation  Our solution: polynomial embedding  Experiments: fine-grained categorization; comparison with the state of the art  Conclusion
  3. Fine-grained visual categorization (FGVC)  Distinguish hundreds of fine-grained objects within a certain domain (e.g., species of animals and plants)  Complementary to traditional object recognition problems  Generic object recognition (Caltech-256 [Griffin et al., 2007]: Airplane vs. Monitor vs. Dog) vs. FGVC (Caltech-Bird-200 [Welinder et al., 2010]: Yellow Warbler vs. Prairie Warbler vs. Pine Warbler)
  4. Motivation  We need highly discriminative features to distinguish visually very similar categories, especially at the local level.
  5. Two basic ideas  1. Co-occurrence (correlation) of neighboring local descriptors: Shapelet [Sabzmeydani et al., 2007], Covariance feature [Tuzel et al., 2006], GGV [Harada et al., 2012]. ☺ Expected to capture mid-level local information. ☹ Results in high-dimensional local features.  2. State-of-the-art bag-of-words representations based on higher-order statistics of local features: Fisher vector [Perronnin et al., 2010], VLAD [Jegou et al., 2010]. ☺ Remarkably high performance; enables linear classification. ☹ Dimensionality (2ND for the Fisher vector; N: number of visual words, D: size of local features) increases linearly with the size of local features.  → The two ideas conflict.
  6. Our approach  Compress polynomials of neighboring local feature vectors with supervised dimensionality reduction  Yields a discriminative latent descriptor  Encode by means of bag-of-words (Fisher vector)  Logistic regression classifier  Pipeline: descriptor (e.g., SIFT) with dense sampling → polynomial vectors (1,000–10,000 dim) → CCA (trained with category labels) → latent descriptor (64 dim) → Fisher vector → logistic regression classifier
  7. Exploit co-occurrence information  At each target position (x, y), combine the descriptor $\mathbf{v}_{(x,y)}$ (e.g., SIFT) with its left and right neighbors $\mathbf{v}_{(x-\delta,y)}$, $\mathbf{v}_{(x+\delta,y)}$ into a polynomial vector:

     $\mathbf{p}^{2}_{(x,y)} = \begin{bmatrix} \mathbf{v}_{(x,y)} \\ \mathrm{upperVec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x-\delta,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x+\delta,y)}^{T}\right) \end{bmatrix}$

     where $\mathrm{Vec}(A)$ is the flattened vector of a matrix $A$, and $\mathrm{upperVec}(A)$ flattens its upper-triangular part.
  8. Exploit co-occurrence information  More spatial information can be integrated with more neighbors (but the vector becomes higher-dimensional):

     0-neighbor (2,144 dim): $\mathbf{p}^{0}_{(x,y)} = \begin{bmatrix} \mathbf{v}_{(x,y)} \\ \mathrm{upperVec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y)}^{T}\right) \end{bmatrix}$

     2-neighbors (10,336 dim): $\mathbf{p}^{2}_{(x,y)}$ additionally stacks $\mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x\pm\delta,y)}^{T}\right)$

     4-neighbors (18,528 dim): $\mathbf{p}^{4}_{(x,y)}$ additionally stacks $\mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y\pm\delta)}^{T}\right)$
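The polynomial-vector construction above can be sketched in NumPy as follows (a minimal illustration; the names `upper_vec` and `poly_vector` are ours, not from the paper). With 64-dim descriptors, the output sizes match the slide: 2,144 / 10,336 / 18,528.

```python
import numpy as np

def upper_vec(M):
    """Flatten the upper-triangular part (incl. diagonal) of a square matrix."""
    i, j = np.triu_indices(M.shape[0])
    return M[i, j]

def poly_vector(v, neighbors):
    """Polynomial vector for a descriptor v (length D) and its spatial
    neighbors: [v; upperVec(v v^T); Vec(v n^T) for each neighbor n]."""
    parts = [v, upper_vec(np.outer(v, v))]
    for n in neighbors:
        parts.append(np.outer(v, n).ravel())
    return np.concatenate(parts)

D = 64
rng = np.random.default_rng(0)
v = rng.standard_normal(D)
left, right, up, down = (rng.standard_normal(D) for _ in range(4))

p0 = poly_vector(v, [])                       # 0-neighbor
p2 = poly_vector(v, [left, right])            # 2-neighbors
p4 = poly_vector(v, [left, right, up, down])  # 4-neighbors
print(p0.size, p2.size, p4.size)  # 2144 10336 18528
```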
  9. (Pipeline recap, highlighting the next stage)  Descriptor (e.g., SIFT) with dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier
  10. Supervised dimensionality reduction  Patch feature and label pairs  Category label: binary occurrence vector (e.g., every patch of an Allium triquetrum image is paired with the same one-hot label vector)  Strong supervision assumption: most patches should be related to the content (category)  (Somewhat) justified for FGVC considering the applications  Users will target the object, and can sometimes give segmentation  (We do not perform manual segmentation in this work, though)
  11. Supervised dimensionality reduction  Canonical Correlation Analysis (CCA) [Hotelling, 1936]  Given patch (polynomial) features $\mathbf{p}$ and label features $\mathbf{l}$, CCA finds linear transformations $\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}})$ and $\mathbf{t} = B^{T}(\mathbf{l} - \bar{\mathbf{l}})$ that maximize the correlation between $\mathbf{s}$ and $\mathbf{t}$:

     $C_{pl} C_{ll}^{-1} C_{lp} A = C_{pp} A \Lambda^{2}, \quad A^{T} C_{pp} A = I$
     $C_{lp} C_{pp}^{-1} C_{pl} B = C_{ll} B \Lambda^{2}, \quad B^{T} C_{ll} B = I$

     where the $C$'s are (cross-)covariance matrices and $\Lambda$ holds the canonical correlations.  The latent descriptor $\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}})$ compresses the 1,000–10,000-dim polynomial vector to a discriminative 64-dim descriptor.
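A minimal NumPy sketch of the patch-side CCA solve above (our own illustrative code: the regularization `reg`, the whitening trick, and all names are assumptions, not the paper's implementation):

```python
import numpy as np

def cca_projection(P, L, dim=64, reg=1e-4):
    """Patch-side CCA: solve C_pl C_ll^{-1} C_lp A = C_pp A Lambda^2
    for the top `dim` directions A; latent descriptor is s = A^T (p - mean).
    P: (n, Dp) patch polynomial features, L: (n, Dl) label vectors."""
    n = P.shape[0]
    Pm, Lm = P.mean(0), L.mean(0)
    Pc, Lc = P - Pm, L - Lm
    Cpp = Pc.T @ Pc / n + reg * np.eye(P.shape[1])
    Cll = Lc.T @ Lc / n + reg * np.eye(L.shape[1])
    Cpl = Pc.T @ Lc / n
    M = Cpl @ np.linalg.solve(Cll, Cpl.T)
    # whiten with Cpp^{-1/2} to turn the generalized eigenproblem
    # into a standard symmetric one
    w, U = np.linalg.eigh(Cpp)
    W = U / np.sqrt(w)                # columns of U scaled by w^{-1/2}
    vals, V = np.linalg.eigh(W.T @ M @ W)
    A = W @ V[:, ::-1][:, :dim]       # back-transform; A^T Cpp A = I holds
    return A, Pm

# toy usage with hypothetical sizes (real Dp is 2,144-18,528)
rng = np.random.default_rng(0)
P = rng.standard_normal((500, 100))
L = rng.standard_normal((500, 10))
A, Pm = cca_projection(P, L, dim=8)
s = (P - Pm) @ A                      # latent descriptors, shape (500, 8)
```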
  12. (Pipeline recap, highlighting the next stage)  Descriptor (e.g., SIFT) with dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier
  13. Fisher Vector [Perronnin et al., 2010]  State-of-the-art bag-of-words encoding method using higher-order statistics of descriptors (mean and variance)  http://www.image-net.org/challenges/LSVRC/2010/ILSVRC2010_XRCE.pdf
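As a pure-NumPy sketch, the improved Fisher vector of [Perronnin et al., 2010] accumulates per-Gaussian mean and variance gradients and applies power plus L2 normalization, giving the 2ND dimensionality noted earlier (this is our illustrative reimplementation, with a toy GMM in place of one fit on real descriptors):

```python
import numpy as np

def fisher_vector(X, pi, mu, var):
    """Improved Fisher vector (mean + variance gradients) of local
    descriptors X (n, D) under a diagonal-covariance GMM with weights
    pi (N,), means mu (N, D), variances var (N, D); output is 2*N*D-dim."""
    n, D = X.shape
    N = pi.size
    # soft assignments q (n, N) under the diagonal Gaussians
    logp = np.stack([
        -0.5 * (((X - mu[k]) ** 2 / var[k]).sum(1)
                + np.log(2 * np.pi * var[k]).sum())
        for k in range(N)], axis=1) + np.log(pi)
    q = np.exp(logp - logp.max(1, keepdims=True))
    q /= q.sum(1, keepdims=True)
    parts = []
    for k in range(N):
        z = (X - mu[k]) / np.sqrt(var[k])
        parts.append((q[:, k:k+1] * z).sum(0) / (n * np.sqrt(pi[k])))
        parts.append((q[:, k:k+1] * (z ** 2 - 1)).sum(0)
                     / (n * np.sqrt(2 * pi[k])))
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)    # L2 normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))              # 64-dim latent descriptors
N, D = 4, 64                                    # toy GMM (paper uses N=64)
pi = np.full(N, 1 / N)
mu = rng.standard_normal((N, D))
var = np.ones((N, D))
fv = fisher_vector(X, pi, mu, var)
print(fv.size)  # 2*N*D = 512
```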
  14. Experiments
  15. Experimental setup  FGVC datasets  Oxford-Flowers-102  Caltech-Bird-200  Descriptors  SIFT, C-SIFT, Opponent-SIFT, Self-Similarity  Compressed into 64 dim using several methods  Fisher vector  64 Gaussians (visual words)  Global + 3 horizontal spatial regions  Classifier  Logistic regression  Evaluation  Mean classification accuracy
  16. Results: comparison with PCA and CCA  Our method substantially improves performance for all descriptors  Just applying CCA to concatenated neighbors does not improve performance  Polynomial embedding makes sense (non-linear convolution)  [Figure: classification performance (%) on Flowers and Birds with different embedding methods (all 64 dim): PCA (baseline), CCA without polynomials (4-neighbors), PolEmb (4-neighbors, ours)]
  17. Results: number of neighbors  Including more neighbors improves performance  [Figure: classification performance (%) of our method on Flowers and Birds with 0, 2, and 4 neighbors]
  18. Comparison with other work
  19. Our final system  Combine four descriptors in a late-fusion approach (SIFT, C-SIFT, Opp.-SIFT, SSIM)  Sum of the log-likelihoods output by each channel's classifier, weighted by its individual confidence  Each descriptor channel runs the same pipeline: dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier
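The late-fusion step above can be sketched as a weighted sum of per-channel log-likelihoods (our illustration; the weight values and the `late_fusion` name are assumptions, e.g. per-channel validation confidence):

```python
import numpy as np

def late_fusion(log_probs, weights):
    """Combine per-descriptor classifier outputs by a weighted sum of
    log-likelihoods. log_probs: list of (n, C) log class-probability
    arrays, one per descriptor channel; weights: per-channel confidences."""
    fused = sum(w * lp for w, lp in zip(weights, log_probs))
    return fused.argmax(1)  # predicted class index per image

# hypothetical outputs from K=4 channels (SIFT, C-SIFT, Opp.-SIFT, SSIM)
# for 5 test images over the 102 flower classes
rng = np.random.default_rng(0)
log_probs = [np.log(rng.dirichlet(np.ones(102), size=5)) for _ in range(4)]
weights = [0.3, 0.25, 0.25, 0.2]  # e.g., each channel's own confidence
pred = late_fusion(log_probs, weights)
print(pred.shape)  # (5,)
```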
  20. Comparison on FGVC datasets  Our method outperforms previous work on the bird and flower datasets  [Table: mean classification accuracy (%); baseline (PCA), PolEmb, PCA+PolEmb]  For the bird dataset, [32] uses the bounding box only for training images, so the result is not directly comparable to ours.
  21. ImageCLEF 2013 Plant Identification  Identify 250 plant species from different organs (Leaf, Flower, Fruit, Stem, Entire)  Example species: Kaki Persimmon, Silver birch, Boxelder maple  Got 1st place in the Natural Background task, and in 4/5 subtasks  (Details coming in Sept. 2013)
  22. Conclusion  A simple but effective method for FGVC  Embeds co-occurrence patterns of neighboring descriptors  Obtains a discriminative, low-dimensional latent descriptor to use together with the Fisher vector  Polynomial embedding greatly improves performance, indicating the importance of non-linearity  Patch-level strong-supervision approximation  Not always perfect, but reasonable for FGVC problems  Future work  Theoretical analysis (probabilistic interpretation)  Multiple-instance dimensionality reduction
  23. Thank you!  Any questions?
  24. Object and scene categorization  Caltech-101 (object dataset)  MIT-Indoor-67 (scene dataset)
  25. Results: object and scene categorization  Our method seems not as effective as in FGVC problems  Combining the PCA feature + our feature consistently improves performance  [Figure: mean classification accuracy (%) on Caltech-101 and MIT-Indoor-67 for SIFT, C-SIFT, Opp.-SIFT]
