Report

Yusuke UchidaFollow

Aug. 2, 2014•0 likes•8,562 views

Aug. 2, 2014•0 likes•8,562 views

Download to read offline

Report

Technology

Recently, the Fisher vector representation of local features has attracted much attention because of its effectiveness in both image classification and image retrieval. Another trend in the area of image retrieval is the use of binary feature such as ORB, FREAK, and BRISK. Considering the significant performance improvement in terms of accuracy in both image classification and retrieval by the Fisher vector of continuous feature descriptors, if the Fisher vector were also to be applied to binary features, we would receive the same benefits in binary feature based image retrieval and classification. In this paper, we derive the closed-form approximation of the Fisher vector of binary features which are modeled by the Bernoulli mixture model. In experiments, it is shown that the Fisher vector representation improves the accuracy of image retrieval by 25% compared with a bag of binary words approach.

Yusuke UchidaFollow

Fisher Kernel based Relevance Feedback for Multimodal Video RetrievalIonut Mironica

Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorUnited States Air Force Academy

Lec07 aggregation-and-retrieval-systemUnited States Air Force Academy

Lec11 object-re-idUnited States Air Force Academy

Lec12 review-part-iUnited States Air Force Academy

Lec17 sparse signal processing & applicationsUnited States Air Force Academy

- 1. Image Retrieval with Fisher Vectors of Binary Features KDDI R&D Laboratories, Inc. Yusuke Uchida, Shigeyuki Sakazawa
- 2. 2014/8/1 2 Image retrieval using local features • Local Invariant Feature: – Robust against occlusion, illumination change, viewpoint change, and so on • Applications: – Product search (Amazon Flow), landmark recognition (Google Goggles), augmented reality (Qualcomm Vuforia), …
- 3. 2014/8/1 3 Trends in image retrieval using local features • 1999: SIFT [Lowe,ICCV’99] • 2003: SIFT + Bag-of-visual words [Sivic+,ICCV’03] • 2007: SIFT + Fisher vector [Perronnin+,CVPR’07,ECCV’10] – New effective image representation • 2011: Local binary features (ORB [Rublee+,ICCV’11], FREAK, BRISK) – Efficient alternatives to SIFT or SURF • In this presentation： – Propose Fisher vector of binary features for image retrieval – Model binary features by Bernoulli mixture model (BMM) – Derive closed-form approximation of Fisher vector of BMM – New normalization method is applied to Fisher vector
- 4. 2014/8/1 4 Pipeline of image retrieval using local features Classifier (e.g. SVM) Similarity search － － ・・・－ － － ・・・－ － － ・・・－ － － ・・・－ － － ・・・－ A single vector representation of the image － － ・・・－ Region detection Feature description Aggregation A set of feature vector X
- 5. 2014/8/1 5 Position of this research Bag-of-visual words Fisher vector Continuous (SIFT, SURF) [1] [2, 3] Binary (ORB, FREAK, BRISK) [4] This research Aggregation methods Descriptortype [1] J. Sivic and A. Zisserman, "Video google: A text retrieval approach to object matching in videos," in Proc. of ICCV’03. [2] F. Perronnin and C. Dance, "Fisher kernels on visual vocabularies for image categorization," in Proc. of CVPR’07. [3] F. Perronnin, et al., "Improving the fisher kernel for large-scale image classification,” in Proc. of ECCV’10. [4] D. Galvez-Lopez and J. D. Tardos, "Real-time loop detection with bags of binary words," in Proc. of IROS’11. Accurate Fast
- 6. 2014/8/1 7 Fisher kernel [Jaakkola+, NIPS’98] • The generation process of X is modeled by a probability density function p(X|λ) with a parameter set λ • Describe X by the gradient of the log-likelihood function L(X|λ) = log P(X|λ) (=Fisher score) • Similarity between X and X’ is defined by the Fisher kernel K(X,X’): )|'(L)|(L)',( 1T λλ λλλ XFXXXK ∇∇= − Fisher score (gradient of log-likelihood function) Fisher information matrix ])|()|([E T λλ λλλ xLxLF ∇∇= [5] T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Proc. of NIPS'98.
- 7. 2014/8/1 8 Fisher vector [Perronnin+, CVPR’07] • Explicit feature mapping for Fisher kernel – As the Fisher information matrix (FIM) F is positive semidefinite and symmetric, it has a Cholesky decomposition: – Thus Fisher kernel can be rewritten as a dot-product between Fisher vectors zX and zX’: where )|(L λλλ XLzX ∇= Fisher scoreDecomposed FIM )|'(L)|(L)',( 1T λλ λλλ XFXXXK ∇∇= − λλλ LLF T =−1 ' T )',( XX zzXXK =
- 8. 2014/8/1 9 Fisher vector of GMM [Perronnin+, CVPR’07] ∑= Σ= N i iitit xNwxp 1 ),;()|( µλ F ,)|()|( 1 ∏= = T t txpXp λλ • SIFT features are modeled by Gaussian mixture model (GMM) Closed-form approximation of the Fisher vector of GMM under the following assumptions: 1. The Fisher information matrix F is diagonal 2. The number of features extracted from an image is constant and equal to T 3. The posterior probability r(i) is peaky • Compared with bag-of-visual words, Fisher vector contains higher order information
- 9. 2014/8/1 10 Local binary features [Rublee+, ICCV’11] • Local binary features: ORB, BRISK, FREAK, and many others – One or two magnitudes faster than SIFT or SURF – Multi-scale FAST detector or its variants – Binary descriptor based on binary tests on pixel’s luminance • Resulting in a binary vector (0, 0, 1, 0, 1, …, 1) A part of binary tests of ORB 256 • Binary tests are defined by pairs of positions • If the luminance of the first position is brighter than the luminance of second position; then the test generate bit ‘1’ Localfeatureregion
- 10. 2014/8/1 11 Modeling binary features by BMM • Model binary features by Bernoulli mixture model (BMM) ∏= = T t txpXp 1 )|()|( λλ ∑= = N i tiit xpwxp 1 )|()|( λλ ∏= − −= D d x id x idti tdtd xp 1 1 )1()|( µµλ }..1,..1,,{ DdNiw idi === µλ Naïve Bayes assumption Each feature is generated from one of the N components Single multivariate Bernoulli distribution X: A set of T binary feature X = (x1, …, xT) with D dimension (D bits) λ: a set of parameters Notations
- 11. 2014/8/1 12 Visualizing clustering results of BMM • The parameters λ are estimated by EM algorithm (for N =32) using 1M training ORB features binary tests with top 5 high probability of generating bit “0” binary tests with top 5 high probability of generating bit “1” • Mixture model successfully captures underlying bit correlation All binary tests defined in ORB Four components (clusters) out of N = 32 components
- 12. 2014/8/1 13 Fisher vector of BMM • Definition of Fisher vector: • Fisher score w.r.t. μid )|(L λλλ XLzX ∇= Fisher scoreDecomposed FIM )|(log)|(L λλ tt xpx = ∑= = N i tiit xpwxp 1 )|()|( λλ )1( )1( )( )|(L 1 1 tdtd td x id x id x t id t i x − − − − = ∂ ∂ µµ γ µ λ ∑= == N j tjj tii tt xpw xpw xipi 1 )|( )|( ),|()( λ λ λγ Posterior probability ∑= t txX )|(L)|(L λλ ∏= − −= D d x id x idti tdtd xp 1 1 )1()|( µµλ
- 13. 2014/8/1 14 Fisher vector of BMM • Fisher information w.r.t. μid (fμid) = 0 Posterior probability is peaky Fisher score
- 14. 2014/8/1 15 Posterior probability 𝑝(𝑖|𝑥 𝑡, 𝜆) • Histogram of max 𝑖 𝑝 𝑖 𝑥𝑡, 𝜆 (𝑁 = 256) • Peaky! 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
- 15. 2014/8/1 16 Vector normalization • Normalization is essential part of Fisher vector representation [3] • Power normalization [3] – 𝑧𝑧𝑖𝑖 = sgn 𝑧𝑖𝑖 |𝑧𝑖𝑖| 𝛼 (𝛼 = 0.5) • L2 normalization [3] – 𝑧𝑧𝑖𝑖 = 𝑧𝑖𝑖/ ∑ 𝑧𝑖𝑖 2 𝑖𝑗 • Intra normalization [6] (originally proposed for VLAD not for FV) – perform L2 normalization within each BMM component – 𝑧𝑧𝑖𝑖 = 𝑧𝑖𝑖/ ∑ 𝑧𝑖𝑖 2 𝑗 [3] F. Perronnin, et al., "Improving the fisher kernel for large-scale image classification,” in Proc. of ECCV’10. [6] R. Arandjelovic and A. Zisserman, "All about VLAD," in Proc. of CVPR'13. Originally proposed for FV
- 16. 2014/8/1 17 Experimental setup • Dataset: Stanford Mobile Visual Search – http://www.stanford.edu/~dmchen/mvs.html – CD class is used for evaluation • Performance measure: mean average precision (MAP) • Binary feature: ORB (OpenCV implementation, 4 scales, 900 features/image) 100 Reference image 400 Query images
- 17. 2014/8/1 18 Experimental results (1) • Compare the proposed Fisher vector with BoVW (N=1024) • Evaluate normalization methods (P=Power, In=Intra normalization) Number of mixture components BoVW Imp. FV • Fisher vector without any normalization achieves poor results • Power and/or L2 normalization significantly improves FV • Intra normalization outperforms the others in all N! Pure FV better In Norm FV
- 18. 2014/8/1 19 Experimental results (2) • Add independent images to database as a distractor • The Fisher vector achieves better performance in all database sizes • The degradation of the Fisher vector is relatively small =Proposed FV
- 19. 2014/8/1 20 Summary • Proposed Fisher vector of binary features for image retrieval – Model binary feature by Bernoulli mixture model (BMM) – Derive closed-form approximation of Fisher vector of BMM – Apply new normalization method to Fisher vector • Future work – Encode Fisher vector into a compact code for efficiency (The method proposed in [7] seems promising) – Apply proposed Fisher vector to other binary features (e.g., audio fingerprints) [7] Y. Gong et al., "Learning Binary Codes for High-Dimensional Data Using Bilinear Projections," in Proc. of CVPR'13.
- 20. 2014/8/1 21
- 21. 2014/8/1 22 Fisher vector of BMM (Fisher score) • Fisher score w.r.t. μid ∏≠= −− −−= ∂ ∂ D dee x ie x ie x id ti tetetd xp ,1 11 )1()1( )|( µµ µ λ )1( )1( )( )|( )|( )|(L 1 1 tdtd td x id x id x t t t id id t i xp xp x − − − − = ∂ ∂ = ∂ ∂ µµ γ λ λ µ µ λ ∑= == N j tjj tii tt xpw xpw xipi 1 )|( )|( ),|()( λ λ λγ Occupancy probability (posterior probability) )|(L λλλ XLzX ∇= Fisher scoreDecomposed FIM )|(log)|(L λλ tt xpx = ∑= = N i tiit xpwxp 1 )|()|( λλ