Bundling Features for Large Scale Partial-Duplicate Web Image Search

  1. Bundling Features for Large Scale Partial-Duplicate Web Image Search
     Zhong Wu, Qifa Ke, Michael Isard, and Jian Sun
     CVPR 2009
     Citations: 163
  2. Outline
     • Introduction
     • Bundled features
     • Image retrieval using bundled features
     • Experiments and results
     • Conclusion
  3. INTRODUCTION
  4. Target
     • Given a query image, locate its near-duplicate and partial-duplicate images in a large corpus of web images.
  5. Novel Scheme
     • Each group of bundled features is much more discriminative than a single feature.
     • Within each group, simple and robust geometric constraints can be enforced efficiently.
  6. BUNDLED FEATURES
  7. Related Work
     • SIFT (Scale-Invariant Feature Transform)
       ◦ Keypoint and descriptor from the region centered at the keypoint
     • MSER (Maximally Stable Extremal Regions)
       ◦ Affine-covariant stable region + SIFT descriptor from the region
  8. [No slide text]
  9. Bundled Features
     • SIFT features: S = {s_j}
     • MSER detections: R = {r_i}
     • Define the bundled features B = {b_i}: b_i = {s_j | s_j ∝ r_i, s_j ∈ S}, where s_j ∝ r_i means that the point feature s_j falls inside the region r_i
     • We discard any MSER detection whose ellipse spans more than half the width or height of the image
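The bundling rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Ellipse` class, the `bundle` function, and the choice to drop empty bundles are assumptions made for the example, and the region size is measured by the ellipse's axis-aligned bounding box.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Ellipse:
    """Hypothetical MSER region: center, semi-axes, rotation in radians."""
    cx: float
    cy: float
    a: float
    b: float
    theta: float = 0.0

    def contains(self, x: float, y: float) -> bool:
        # Rotate the point into the ellipse's own frame, then test the
        # standard ellipse inequality.
        dx, dy = x - self.cx, y - self.cy
        c, s = math.cos(self.theta), math.sin(self.theta)
        u = dx * c + dy * s
        v = -dx * s + dy * c
        return (u / self.a) ** 2 + (v / self.b) ** 2 <= 1.0

    def span(self) -> Tuple[float, float]:
        # Width and height of the axis-aligned bounding box.
        c, s = math.cos(self.theta), math.sin(self.theta)
        width = 2.0 * math.hypot(self.a * c, self.b * s)
        height = 2.0 * math.hypot(self.a * s, self.b * c)
        return width, height


def bundle(points: Sequence[Tuple[float, float]],
           regions: Sequence[Ellipse],
           image_width: float,
           image_height: float) -> List[List[int]]:
    """Return one bundle (list of point indices) per retained region."""
    bundles = []
    for region in regions:
        w, h = region.span()
        # Discard regions spanning more than half the image width or height.
        if w > image_width / 2 or h > image_height / 2:
            continue
        members = [j for j, (x, y) in enumerate(points) if region.contains(x, y)]
        if members:  # assumption: empty bundles are not useful
            bundles.append(members)
    return bundles
```

For example, with points `[(10, 10), (12, 11), (50, 50)]` in a 100×100 image, a small ellipse around (11, 10) bundles the first two points, while an ellipse with semi-axes of 60 pixels is discarded as too large.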
  10. [No slide text]
  11. IMAGE RETRIEVAL USING BUNDLED FEATURES
  12. Feature quantization
      • Hierarchical k-means
        ◦ One million visual words from 50K training images
  13. • K-D tree
        ◦ pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
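The slide gives only the point list. As a minimal sketch, assuming the usual median-split construction that cycles through the axes by depth, a k-d tree over those points could be built like this (`Node` and `build_kdtree` are illustrative names, not taken from the slides):

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node(NamedTuple):
    location: Point
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])          # alternate x, y, x, ...
    ordered = sorted(points, key=lambda pt: pt[axis])
    median = len(ordered) // 2
    return Node(
        location=ordered[median],
        left=build_kdtree(ordered[:median], depth + 1),
        right=build_kdtree(ordered[median + 1:], depth + 1),
    )


point_list = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(point_list)
```

With this construction the root is (7, 2), its left child is (5, 4), and its right child is (9, 6).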
  14. Matching bundled features
      Let p = {p_i} and q = {q_j} be two bundled features with quantized visual words p_i, q_j ∈ W
      • Define a matching score:
      • M(q; p) = M_m(q; p) + λ M_g(q; p)
        ◦ where λ is a weighting parameter
  15. • Membership term:
        ◦ We simply use the number of common visual words between two bundled features to define the membership term M_m(q; p)
      • M_m(q; p) = |{p_i}|, counting only the p_i that have a match in q
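A minimal sketch of the membership term and the combined score, assuming each bundled feature is represented as a list of integer visual-word IDs. The function names are illustrative, and the geometric term is passed in as a precomputed number because its exact formula is not given in the transcript.

```python
from typing import Sequence


def membership_term(q: Sequence[int], p: Sequence[int]) -> int:
    """Count the visual words of p that also occur in q."""
    words_in_q = set(q)
    return sum(1 for word in p if word in words_in_q)


def matching_score(membership: float, geometric: float, lam: float) -> float:
    """M(q; p) = M_m(q; p) + lambda * M_g(q; p)."""
    return membership + lam * geometric


q_words = [3, 7, 9, 12]
p_words = [7, 12, 40]
m_m = membership_term(q_words, p_words)  # words 7 and 12 are shared
```

Here `m_m` is 2, since only words 7 and 12 of p appear in q.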
  16. • Geometric term:
        ◦ Our geometric term performs a weak geometric verification between two bundled features p and q using relative ordering:
      • Indicator function
  17. 2
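The formula for the geometric term was an image and did not survive in the transcript. One plausible reading of "relative ordering" with an indicator function is to sort the matched features by their position in p along one axis and count how many adjacent pairs appear in the opposite order in q. The sketch below implements only that count; the function name, the input format, and the way the count would enter M_g are assumptions, not the authors' stated definition.

```python
from typing import Sequence, Tuple


def order_inconsistency(matches: Sequence[Tuple[float, float]]) -> int:
    """Count adjacent order violations along one axis.

    Each entry of `matches` is (coordinate in p, coordinate in q) for one
    matched feature pair, measured along the same axis (x or y).
    """
    # Order the matches as they appear in p, keeping the q coordinates.
    q_in_p_order = [cq for _, cq in sorted(matches, key=lambda m: m[0])]
    # Indicator: 1 when a neighbour pair is reversed in q, else 0.
    return sum(1 for a, b in zip(q_in_p_order, q_in_p_order[1:]) if a > b)


consistent = order_inconsistency([(1, 10), (2, 20), (3, 30)])
reversed_order = order_inconsistency([(1, 30), (2, 20), (3, 10)])
```

A consistent ordering gives a count of 0, while three features in fully reversed order give 2.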
  18. Indexing and retrieval
      • Avoids storing and comparing high-dimensional local descriptors
      • Reduces the number of candidate images
  19. Indexing and retrieval
      • Voting
  20. Indexing and retrieval
      • tf
        ◦ A document contains 100 words; 'a' appears 3 times
        ◦ tf = 3/100 = 0.03
      • idf
        ◦ 1,000 documents contain 'a'; the total number of documents is 10,000,000
        ◦ idf = ln(10,000,000 / 1,000) = 9.21
      • tf-idf = 0.03 × 9.21 = 0.28
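The tf-idf arithmetic on this slide can be checked with a short script. The function names are illustrative; the natural logarithm is used because the slide writes `ln`.

```python
import math


def term_frequency(term_count: int, doc_length: int) -> float:
    """Fraction of the document's words that are the term."""
    return term_count / doc_length


def inverse_document_frequency(total_docs: int, docs_with_term: int) -> float:
    """Natural log of (total documents / documents containing the term)."""
    return math.log(total_docs / docs_with_term)


tf = term_frequency(3, 100)                          # 0.03
idf = inverse_document_frequency(10_000_000, 1_000)  # ln(10,000), about 9.21
tf_idf = tf * idf
print(round(tf, 2), round(idf, 2), round(tf_idf, 2))  # 0.03 9.21 0.28
```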
  21. EXPERIMENTS AND RESULTS
  22. Dataset
      • Basic dataset
        ◦ One million images most frequently clicked in a popular commercial image-search engine
        ◦ Subsets: 50K, 200K, 500K
      • Ground truth
        ◦ 780 manually labeled partial-duplicate web images forming 19 groups
        ◦ Evaluation dataset = basic dataset + ground truth
      • Query
        ◦ 150 images from the ground truth
  23. Evaluation
      • Baseline
        ◦ Bag-of-features approach with soft assignment [13]
      [13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
  24. • Comparison (HE)
        ◦ Enhance the baseline with Hamming embedding [3] by adding a 24-bit Hamming code to filter out target features.
      [3] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
  25. • Baseline 0.35 to Bundled (mem) 0.40: a 14% improvement
      • Baseline 0.35 to Bundled 0.49: a 40% improvement
      • Baseline 0.35 to Bundled+HE 0.52: a 49% improvement
  26. • Comparison (re-ranking)
        ◦ Full geometric verification with RANSAC on the top 300 candidate images
  27. • Baseline+re-rank 0.50 to Bundled+re-rank 0.62: a 24% improvement
      • Baseline 0.35 to Bundled+re-rank 0.62: a 77% improvement
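The improvement percentages on the two results slides are relative gains over the stated starting score. A small script, using only the scores quoted on the slides, reproduces them (the function and variable names are illustrative):

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0


comparisons = [
    ("Baseline", 0.35, "Bundled (mem)", 0.40),
    ("Baseline", 0.35, "Bundled", 0.49),
    ("Baseline", 0.35, "Bundled+HE", 0.52),
    ("Baseline+re-rank", 0.50, "Bundled+re-rank", 0.62),
    ("Baseline", 0.35, "Bundled+re-rank", 0.62),
]

for start_name, start, end_name, end in comparisons:
    gain = relative_improvement(start, end)
    print(f"{start_name} {start:.2f} to {end_name} {end:.2f}: {gain:.0f}% improvement")
```

Rounded to whole percentages, the gains are 14, 40, 49, 24, and 77, matching the slides.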
  28. • Trade-off
      • Run time
        ◦ A single CPU on a 3.0 GHz Core Duo desktop with 16 GB of memory
  29. Sample results
      Query image / Baseline approach / Our approach
  30. [No slide text]
  31. CONCLUSION
  32. Conclusion
      • Properties of bundled features
        ◦ More discriminative than individual SIFT features
        ◦ Allow simple and robust geometric constraints
        ◦ Can partially match two groups of SIFT features
      • Advantage
        ◦ Robustness to occlusion and to photometric and geometric changes
  33. END
      Thanks for listening
