Transcript

  • 1. IEEE International Conference on Multimedia & Expo 2013. Augmenting Descriptors for Fine-grained Visual Categorization Using Polynomial Embedding. Hideki Nakayama, Graduate School of Information Science and Technology, The University of Tokyo
  • 2. Outline  Background and motivation  Our solution: polynomial embedding  Experiments: fine-grained categorization; comparison with state-of-the-art  Conclusion
  • 3. Fine-grained visual categorization (FGVC)  Distinguish hundreds of fine-grained objects within a certain domain (e.g., species of animals and plants)  Complementary to traditional object recognition problems [Figure: generic object recognition on Caltech-256 [Griffin et al., 2007] (Airplane vs. Monitor vs. Dog) vs. FGVC on Caltech-Bird-200 [Welinder et al., 2010] (Yellow Warbler vs. Prairie Warbler vs. Pine Warbler)]
  • 4. Motivation  We need highly discriminative features to distinguish visually very similar categories, especially at the local level.
  • 5. Two basic ideas  1. Co-occurrence (correlation) of neighboring local descriptors: Shapelet [Sabzmeydani et al., 2007], covariance feature [Tuzel et al., 2006], GGV [Harada et al., 2012]. ☺ Expected to capture mid-level local information. ☹ Results in high-dimensional local features.  2. State-of-the-art bag-of-words representations based on higher-order statistics of local features: Fisher vector [Perronnin et al., 2010], VLAD [Jegou et al., 2010]. ☺ Remarkably high performance; enables linear classification. ☹ Dimensionality grows linearly with the size of the local features: ND for VLAD, 2ND for the Fisher vector (N: number of visual words, D: size of local features). The two ideas conflict: co-occurrence enlarges the local features, which in turn blows up the bag-of-words dimensionality.
  • 6. Our approach  Compress polynomial vectors of neighboring local features with supervised dimensionality reduction (CCA trained on category labels) into a discriminative latent descriptor  Encode by means of bag-of-words (Fisher vector)  Classify with logistic regression [Pipeline: descriptor (e.g., SIFT), dense sampling → polynomial vectors (1,000–10,000 dim) → CCA → latent descriptor (64 dim) → Fisher vector → logistic regression classifier → category label]
  • 7. Exploit co-occurrence information  The polynomial vector stacks the descriptor at the target position (e.g., SIFT) with outer products of the descriptor and its neighbors:

$$\mathbf{p}^{2}_{(x,y)} = \begin{bmatrix} \mathbf{v}_{(x,y)} \\ \mathrm{upperVec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x-\delta,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x+\delta,y)}^{T}\right) \end{bmatrix}$$

where $\mathbf{v}_{(x,y)}$ is the descriptor at the target position, $\mathbf{v}_{(x-\delta,y)}$ and $\mathbf{v}_{(x+\delta,y)}$ are its left and right neighbors at offset $\delta$, $\mathrm{Vec}(\cdot)$ flattens a matrix into a vector, and $\mathrm{upperVec}(\cdot)$ flattens its upper-triangular part (sufficient because $\mathbf{v}\mathbf{v}^{T}$ is symmetric).
  • 8. Exploit co-occurrence information  More spatial information can be integrated with more neighbors, at the cost of higher dimensionality:

0-neighbor (2,144 dim):
$$\mathbf{p}^{0}_{(x,y)} = \begin{bmatrix} \mathbf{v}_{(x,y)} \\ \mathrm{upperVec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y)}^{T}\right) \end{bmatrix}$$

2-neighbors (10,336 dim): $\mathbf{p}^{2}_{(x,y)}$ as on the previous slide, adding the cross terms with the left and right neighbors.

4-neighbors (18,528 dim):
$$\mathbf{p}^{4}_{(x,y)} = \begin{bmatrix} \mathbf{v}_{(x,y)} \\ \mathrm{upperVec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x-\delta,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x+\delta,y)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y-\delta)}^{T}\right) \\ \mathrm{Vec}\!\left(\mathbf{v}_{(x,y)}\mathbf{v}_{(x,y+\delta)}^{T}\right) \end{bmatrix}$$
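A minimal numpy sketch of this construction (function names are illustrative, not from the paper; descriptors are assumed to be 64-dim, which reproduces the dimensionalities on the slide):

```python
import numpy as np

def upper_vec(M):
    # Flatten the upper-triangular part (incl. diagonal) of a square matrix.
    return M[np.triu_indices(M.shape[0])]

def polynomial_vector(v, neighbors):
    # Stack the descriptor, its self outer product (upper triangle only,
    # since v v^T is symmetric), and full cross outer products with each
    # neighboring descriptor.
    parts = [v, upper_vec(np.outer(v, v))]
    parts += [np.outer(v, n).ravel() for n in neighbors]
    return np.concatenate(parts)

D = 64
v, left, right, up, down = (np.random.randn(D) for _ in range(5))
print(polynomial_vector(v, []).shape)                       # (2144,)  0-neighbor
print(polynomial_vector(v, [left, right]).shape)            # (10336,) 2-neighbors
print(polynomial_vector(v, [left, right, up, down]).shape)  # (18528,) 4-neighbors
```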
  • 9. [Pipeline figure repeated, highlighting the supervised CCA stage: descriptor (e.g., SIFT), dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier]
  • 10. Supervised dimensionality reduction  Pairs of patch features and labels; each category label is a binary occurrence vector (e.g., the same one-hot vector for Allium triquetrum attached to every patch of the image)  Strong supervision assumption: most patches should be related to the content (category)  (Somewhat) justified for FGVC considering the applications: users will target the object and can sometimes give a segmentation  (We do not perform manual segmentation in this work, though)
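For concreteness, a small sketch of how the patch/label training pairs could be assembled (a hypothetical helper; every patch of an image simply inherits that image's one-hot category vector):

```python
import numpy as np

def build_patch_label_pairs(patch_feats_per_image, category_ids, n_categories):
    # patch_feats_per_image: list of (n_patches_i, Dp) polynomial vectors
    # category_ids: one integer category label per image
    P = np.vstack(patch_feats_per_image)
    L = np.zeros((P.shape[0], n_categories))
    row = 0
    for feats, c in zip(patch_feats_per_image, category_ids):
        L[row:row + len(feats), c] = 1.0   # same one-hot label for all patches
        row += len(feats)
    return P, L
```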
  • 11. Supervised dimensionality reduction  Canonical Correlation Analysis (CCA) [Hotelling, 1936]  Given patch (polynomial) features $\mathbf{p}$ and label features $\mathbf{l}$, CCA finds linear transformations

$$\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}}), \qquad \mathbf{t} = B^{T}(\mathbf{l} - \bar{\mathbf{l}})$$

that maximize the correlation between $\mathbf{s}$ and $\mathbf{t}$. The projections solve the generalized eigenproblems

$$C_{pl}C_{ll}^{-1}C_{lp}A = C_{pp}A\Lambda^{2}, \quad A^{T}C_{pp}A = I$$
$$C_{lp}C_{pp}^{-1}C_{pl}B = C_{ll}B\Lambda^{2}, \quad B^{T}C_{ll}B = I$$

where the $C$'s are covariance matrices and $\Lambda$ holds the canonical correlations. The latent descriptor $\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}})$ compresses the 1,000–10,000-dim polynomial vector into a discriminative 64-dim representation.
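A numpy/scipy sketch of the fitting step, assuming a small ridge term for numerical stability (the regularization value and the function name are assumptions, not from the paper):

```python
import numpy as np
from scipy.linalg import eigh

def fit_cca_projection(P, L, dim=64, reg=1e-4):
    # P: (n, Dp) polynomial patch features; L: (n, Dl) binary label vectors.
    # Returns the feature mean and A such that s = A^T (p - mean) is the
    # latent descriptor. Solves Cpl Cll^{-1} Clp A = Cpp A Lambda^2.
    mp, ml = P.mean(0), L.mean(0)
    Pc, Lc = P - mp, L - ml
    n = P.shape[0]
    Cpp = Pc.T @ Pc / n + reg * np.eye(P.shape[1])
    Cll = Lc.T @ Lc / n + reg * np.eye(L.shape[1])
    Cpl = Pc.T @ Lc / n
    M = Cpl @ np.linalg.solve(Cll, Cpl.T)
    _, vecs = eigh(M, Cpp)          # generalized symmetric eigenproblem,
    A = vecs[:, ::-1][:, :dim]      # eigenvalues ascending -> keep the top dim
    return mp, A

# Usage: project a new polynomial vector p to the 64-dim latent descriptor.
# mp, A = fit_cca_projection(P_train, L_train)
# s = A.T @ (p - mp)
```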
  • 12. [Pipeline figure repeated, highlighting the encoding stage: latent descriptor → Fisher vector → logistic regression classifier]
  • 13. Fisher Vector [Perronnin et al., 2010]  State-of-the-art bag-of-words encoding method using higher-order statistics of the descriptors (mean and variance) http://www.image-net.org/challenges/LSVRC/2010/ILSVRC2010_XRCE.pdf
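As a rough illustration of the encoding, the sketch below computes the mean and variance gradient statistics with respect to a diagonal-covariance GMM; it is a simplified version that omits the power and L2 normalization commonly used with Fisher vectors, and the function name is hypothetical:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    # Gradient statistics of descriptors X (n, d) w.r.t. the means and
    # (diagonal) variances of a K-component GMM; output is 2*K*d-dim.
    q = gmm.predict_proba(X)                 # (n, K) soft assignments
    pi, mu = gmm.weights_, gmm.means_
    sd = np.sqrt(gmm.covariances_)           # (K, d) for covariance_type="diag"
    n, parts = X.shape[0], []
    for k in range(len(pi)):
        z = (X - mu[k]) / sd[k]              # standardized residuals
        parts.append((q[:, k, None] * z).sum(0) / (n * np.sqrt(pi[k])))
        parts.append((q[:, k, None] * (z**2 - 1)).sum(0) / (n * np.sqrt(2 * pi[k])))
    return np.concatenate(parts)

gmm = GaussianMixture(n_components=64, covariance_type="diag")
gmm.fit(np.random.randn(5000, 64))                 # stand-in for training descriptors
fv = fisher_vector(np.random.randn(300, 64), gmm)  # descriptors of one image
print(fv.shape)                                    # (8192,) = 2 * 64 * 64
```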
  • 14. Experiments
  • 15. Experimental setup  FGVC datasets: Oxford-Flowers-102, Caltech-Bird-200  Descriptors: SIFT, C-SIFT, Opponent-SIFT, Self-Similarity (SSIM), compressed into 64 dim using several methods  Fisher vector: 64 Gaussians (visual words); global + 3 horizontal spatial regions  Classifier: logistic regression  Evaluation: mean classification accuracy
  • 16. Results: comparison with PCA and CCA  Our method substantially improves performance for all descriptors  Just applying CCA to concatenated neighbors does not improve performance; the polynomial embedding is what makes the difference (non-linear interaction terms) [Chart: classification performance (%) on Flower and Bird with different embedding methods, all 64 dim: PCA (baseline), CCA without polynomials (4-neighbors), and PolEmb (4-neighbors), for SIFT, C-SIFT, Opp.-SIFT, SSIM]
  • 17. Results: number of neighbors  Including more neighbors improves performance [Chart: classification performance (%) of our method with 0, 2, and 4 neighbors on Flower and Bird, for SIFT, C-SIFT, Opp.-SIFT, SSIM]
  • 18. Comparison with other work
  • 19. Our final system  Combine four descriptors (SIFT, C-SIFT, Opp.-SIFT, SSIM) in a late-fusion approach  Sum the log-likelihoods output by each channel's classifier, weighted by its individual confidence [Pipeline: each descriptor is processed as before (dense sampling → polynomial vectors → CCA → latent descriptor → Fisher vector → logistic regression); the K classifier outputs are summed to predict the category label, e.g., Allium triquetrum]
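A minimal sketch of the fusion step, assuming one scikit-learn logistic regression per descriptor channel; the paper only states that weights reflect each classifier's individual confidence, so taking validation accuracy as the weight is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def late_fusion_predict(classifiers, feats, weights):
    # classifiers: one fitted LogisticRegression per descriptor channel,
    #              all trained on the same label set so class columns align
    # feats: list of (n_images, fv_dim) Fisher vectors, one per channel
    # weights: per-channel confidence (e.g., validation accuracy; assumed)
    score = sum(w * clf.predict_log_proba(X)
                for clf, X, w in zip(classifiers, feats, weights))
    return score.argmax(axis=1)   # fused category prediction per image
```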
  • 20. Comparison on FGVC datasets  Our method outperforms previous work on the bird and flower datasets [Table: mean classification accuracy (%) for the PCA baseline, PolEmb, and PCA+PolEmb against previous work]  For the bird dataset, [32] uses the bounding box only for training images, so its result is not directly comparable to ours.
  • 21. ImageCLEF 2013 Plant Identification  Identify 250 plant species from different organs (leaf, flower, fruit, stem, entire plant)  Took 1st place in the Natural Background task and in 4 of 5 subtasks  (Details coming in Sept. 2013) [Figure: example organ images, e.g., Kaki Persimmon, Silver birch, Boxelder maple]
  • 22. Conclusion  A simple but effective method for FGVC: embedding co-occurrence patterns of neighboring descriptors to obtain a discriminative, low-dimensional latent descriptor for use with the Fisher vector  Polynomial embedding greatly improves performance, indicating the importance of non-linearity  The patch-level strong-supervision approximation is not always perfect but is reasonable for FGVC problems  Future work: theoretical analysis (probabilistic interpretation); multiple-instance dimensionality reduction
  • 23. Thank you!  Any questions?
  • 24. Object and scene categorization  Caltech-101 (object dataset)  MIT-Indoor-67 (scene dataset)
  • 25. Results: object and scene categorization  Our method seems to be less effective than on FGVC problems  Combining the PCA feature with our feature consistently improves performance [Chart: mean classification accuracy (%) on Caltech-101 and MIT-Indoor-67 for SIFT, C-SIFT, Opp.-SIFT]