Andrew Zisserman Talk - Part 1b

  1. Demo
  2. Example
  3. Spatial re-ranking
     • improves precision
     • but not recall …
  4. Query images [figure: query images with precision (Prec.) vs. recall (Rec.) curves]
     • high precision at low recall (like Google)
     • variation in performance over queries
     • none retrieve all instances
  5. Why aren’t all objects retrieved?
     [figure: query image → Hessian-Affine regions + SIFT descriptors → set of SIFT descriptors → clustered and quantized to visual words → sparse frequency vector]
     Obtaining visual words is like a sensor measuring the image.
     “Noise” in the measurement process means that some visual words are missing or incorrect, e.g. due to:
     • Missed detections
     • Changes beyond built-in invariance
     • Quantization effects
     Two ways to address this: 1. Query expansion; 2. Better quantization
     Consequence: a visual word in the query is missing in the target image.
  6. Query expansion
  7. Query Expansion
     In text:
     • Reissue top N results as queries
     • Pseudo/blind relevance feedback
     • Danger of topic drift: this is a big problem for text
  8. Query Expansion: Text
     Original query: Hubble Telescope Achievements
     Query expansion: select top 20 terms from top 20 documents according to tf-idf
     Added terms: Telescope, hubble, space, nasa, ultraviolet, shuttle, mirror, telescopes, earth, discovery, orbit, flaw, scientists, launch, stars, universe, mirrors, light, optical, species
     Example from: Jimmy Lin, University of Maryland
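The term-selection step on slide 8 can be sketched as follows. This is a minimal illustration, not the method behind the Hubble example: the function name `expansion_terms`, the length-normalized term frequency, and the `log(N / df)` form of idf are all assumptions made for the sketch.

```python
import math
from collections import Counter


def expansion_terms(top_docs, doc_freq, n_docs, k=20):
    """Return the k terms with the highest summed tf-idf over the top documents.

    top_docs: list of token lists (the top-ranked documents of the first query)
    doc_freq: term -> number of documents in the collection containing the term
    n_docs:   total number of documents in the collection
    (All names are hypothetical; the slide only says "top 20 terms by tf-idf".)
    """
    scores = Counter()
    for doc in top_docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                      # length-normalized term frequency
            idf = math.log(n_docs / doc_freq[term])    # rare terms weigh more
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


top_docs = [["hubble", "telescope", "the", "mirror"],
            ["hubble", "the", "space", "the"]]
doc_freq = {"hubble": 2, "telescope": 3, "the": 100, "mirror": 2, "space": 5}
added = expansion_terms(top_docs, doc_freq, n_docs=100, k=3)
```

A term such as "the", which occurs in every document, gets an idf of zero and is never selected, which is why tf-idf is a reasonable filter here.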
  9. Query Expansion
     In text:
     • Reissue top N results as queries
     • Pseudo/blind relevance feedback
     • Danger of topic drift: this is a big problem for text
     In vision:
     • Reissue spatially verified image regions as queries
     • Spatial verification acts like an oracle of truth
  10. Visual query expansion - overview
      1. Original query
      2. Initial retrieval set …
      3. Spatial verification oracle
      4. New enhanced query
      5. Additional retrieved images
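The five steps of slide 10 can be sketched on toy data. This is an illustrative sketch only: it averages the query with the verified results to form the enhanced query, which is one simple choice and not necessarily the variant used in the talk. The callable `is_verified` is a hypothetical stand-in for real spatial verification, and bag-of-words vectors are plain dicts mapping visual word id to weight.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, database, top_n):
    """Return ids of the top_n images with non-zero similarity, best first."""
    hits = [(cosine(query, vec), img_id) for img_id, vec in database.items()]
    hits = [(score, img_id) for score, img_id in hits if score > 0.0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [img_id for _, img_id in hits[:top_n]]


def expand_and_requery(query, database, is_verified, top_n=3):
    initial = search(query, database, top_n)              # step 2: initial retrieval set
    verified = [i for i in initial if is_verified(i)]     # step 3: verification "oracle"
    vectors = [query] + [database[i] for i in verified]
    words = set().union(*vectors)
    # Step 4: enhanced query = average of the query and the verified results.
    enhanced = {w: sum(v.get(w, 0.0) for v in vectors) / len(vectors) for w in words}
    return initial, search(enhanced, database, top_n)     # step 5: re-query


database = {"a": {1: 1.0, 2: 1.0}, "b": {2: 1.0, 3: 1.0}, "c": {4: 1.0}}
query = {1: 1.0}
initial, expanded = expand_and_requery(query, database, is_verified=lambda i: i == "a")
```

In this toy run image "b" shares no visual word with the original query and only appears after expansion, because the verified image "a" contributes word 2 to the enhanced query.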
  11. What Query Expansion Adds [figure legend: query image; originally retrieved image; originally not retrieved]
  12. What Query Expansion Adds
  13. What Query Expansion Adds
  14. What Query Expansion Adds
  15. Visual query expansion - overview
      1. Original query
      2. Initial retrieval set …
      3. Spatial verification oracle
      4. New enhanced query
      5. Additional retrieved images
  16. Bag of visual words particular object retrieval
      [figure: pipeline]
      • query image
      • Hessian-Affine regions + SIFT descriptors
      • set of SIFT descriptors
      • visual words (centroids)
      • sparse frequency vector + tf-idf weighting
      • inverted file querying
      • ranked image short-list
      • geometric verification [Lowe 04, Chum & al 2007]
      • query expansion [Chum & al 2007]
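The inverted-file and tf-idf stages of the pipeline on slide 16 can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, scores are left unnormalized, and idf is taken as `log(N / df)`; a real system would normalize the vectors and store the index far more compactly.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(images):
    """Map each visual word to a posting list of (image id, term frequency)."""
    index = defaultdict(list)
    for img_id, words in images.items():
        for word, count in Counter(words).items():
            index[word].append((img_id, count / len(words)))
    return index


def query_index(index, n_images, query_words):
    """Score only the images that share at least one visual word with the query."""
    scores = defaultdict(float)
    for word, q_count in Counter(query_words).items():
        postings = index.get(word, [])
        if not postings:
            continue                                   # word never seen in the database
        idf = math.log(n_images / len(postings))
        q_weight = (q_count / len(query_words)) * idf
        for img_id, tf in postings:
            scores[img_id] += q_weight * (tf * idf)    # sparse dot product of tf-idf vectors
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy database: each image is a list of visual word ids.
images = {"im1": [1, 2, 3], "im2": [3, 4], "im3": [5, 6]}
index = build_inverted_file(images)
ranking = query_index(index, n_images=len(images), query_words=[1, 3])
```

The point of the inverted file is visible in the toy run: "im3" shares no word with the query, so it is never touched during scoring.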
  17. Demo
  18. Query Expansion [figure legend: query image; originally retrieved; retrieved only after expansion]
  19. Original results (good) vs. expanded results (better) [figure: query image with precision (Prec.) vs. recall (Rec.) curves before and after expansion]
  20. Better Quantization
  21. Problems arising from quantization
      • Typically, quantization has a significant impact on the final performance of the system [Sivic03, Nister06, Philbin07]
      • Quantization errors split features that should be grouped together and confuse features that should be separated
      [figure: Voronoi cells]
  22. And more …
      i. Points 3 and 4 are close, but never matched
      ii. Points 1, 2 and 3 are matched equally
  23. Overcoming quantization errors
      • Soft-assign each descriptor to multiple cluster centers
      • Assignment weight according to Gaussian on distance
      • Normalize weights to sum to one
      [Philbin et al. CVPR 2008, Van Gemert et al. ECCV 2008]
      [figure: hard assignment gives B: 1.0; soft assignment gives A: 0.1, B: 0.5, C: 0.4]
      Learning a vocabulary to overcome quantization errors [Mikulik et al. ECCV 2010, Philbin et al. ECCV 2010]
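The three bullets of slide 23 translate into a few lines of code. The sketch below is illustrative: the function name, the number of neighbours `r`, and the bandwidth `sigma` are assumed parameters, and the nearest centers are found by brute force rather than with an approximate nearest-neighbour structure.

```python
import math


def soft_assign(descriptor, centers, r=3, sigma=1.0):
    """Assign a descriptor to its r nearest cluster centers with Gaussian weights.

    Returns {center index: weight}, with weights summing to one.
    r and sigma are illustrative values, not taken from the slides.
    """
    sq_dists = sorted(
        (sum((x - c) ** 2 for x, c in zip(descriptor, center)), idx)
        for idx, center in enumerate(centers)
    )
    nearest = sq_dists[:r]
    d_min = nearest[0][0]
    # Subtracting the smallest distance leaves the normalized weights unchanged
    # and avoids every exponential underflowing to zero.
    raw = {idx: math.exp(-(d2 - d_min) / (2.0 * sigma ** 2)) for d2, idx in nearest}
    total = sum(raw.values())
    return {idx: w / total for idx, w in raw.items()}


centers = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
weights = soft_assign((1.0, 0.0), centers, r=2)   # equidistant from centers 0 and 1
hard = soft_assign((0.5, 0.0), centers, r=1)      # r = 1 reduces to hard assignment
```

A descriptor that sits on a Voronoi boundary, as in the first call, is shared equally between the two neighbouring words instead of being forced into one of them.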
  24. Several other solutions are possible …
      e.g. Hamming embedding [Jegou & Schmid ECCV 2008]
      • Standard quantization using bag-of-visual-words
      • Additional localization in the Voronoi cell by a binary signature
      • More on methods of soft assignment tomorrow
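The idea of refining a visual-word match with a binary signature can be sketched as a two-part test. This is only a sketch of the matching rule: how the signatures are computed is not covered by the slide and is not shown here, and the threshold value and the `(word, signature)` feature layout are assumptions.

```python
def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures stored as non-negative ints."""
    return bin(sig_a ^ sig_b).count("1")


def he_match(feature_a, feature_b, threshold):
    """Two features match if they fall in the same Voronoi cell (same visual word)
    and their binary signatures are within `threshold` bits of each other."""
    word_a, sig_a = feature_a
    word_b, sig_b = feature_b
    return word_a == word_b and hamming(sig_a, sig_b) <= threshold


same_cell_close = he_match((7, 0b1011), (7, 0b0011), threshold=1)   # 1 bit apart
same_cell_far = he_match((7, 0b1111), (7, 0b0000), threshold=1)     # 4 bits apart
other_cell = he_match((7, 0b1011), (8, 0b1011), threshold=1)        # different words
```

Compared with plain bag-of-words matching, the second case is rejected even though both features share a visual word, which is the extra localization inside the cell that the slide refers to.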
  25. Soft Assignment: Implementation
      • Bag of words: score a match between two features by the scalar product of their weight vectors
      • Spatial re-ranking: also score the number of inliers using this measure
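The scalar-product score on slide 25 is a sparse dot product over the visual words two features have in common. The sketch below reuses the weights shown in the slide 23 figure (A: 0.1, B: 0.5, C: 0.4 against a hard assignment B: 1.0); the function name is hypothetical.

```python
def match_score(weights_a, weights_b):
    """Scalar product of two sparse soft-assignment vectors ({word: weight})."""
    return sum(w * weights_b.get(word, 0.0) for word, w in weights_a.items())


soft = {"A": 0.1, "B": 0.5, "C": 0.4}     # soft-assigned feature
hard = {"B": 1.0}                         # hard-assigned feature
score_soft_hard = match_score(soft, hard)
score_soft_soft = match_score(soft, {"B": 0.5, "C": 0.5})
score_disjoint = match_score(soft, {"D": 1.0})
```

With hard assignment on both sides the score is either 0 or 1; with soft assignment it becomes a graded value, so near misses across a cell boundary still contribute.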
  26. Soft Assignment: Results
      Benefit 1: Helping Query Expansion
      Hard assignment [figure: query and results]: only one good initial result, so QE doesn’t significantly improve results
  27. Soft Assignment: Results
      Benefit 1: Helping Query Expansion
      Soft assignment [figure: query and results]: 4 good results, which allows query expansion to return these results in addition to the ones above
  28. Soft Assignment: Results
      Benefit 2: Better spatial localization [figure: hard assignment vs. soft assignment]
  29. Results: Baseline to State of the Art
      Method                          Mean Average Precision
      1. Baseline method, K = 10K     0.389
      2. Large vocabulary, K = 1M     0.618
      3. Spatial re-ranking           0.653
      4. Soft assignment (SA)         0.731
      5. Query expansion (QE)         0.801
      6. SA & QE                      0.825
      Disadvantages of soft assignment?
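For readers unfamiliar with the metric in the table on slide 29, the sketch below computes average precision for one ranked list and the mean over several queries. It uses one common definition (the mean of the precision values at the ranks of the relevant images); the benchmark behind the reported numbers may use a slightly different interpolation, so treat this as an illustration of the idea only.

```python
def average_precision(ranked_ids, relevant):
    """Mean of precision@rank taken at each rank where a relevant image appears."""
    hits = 0
    precision_sum = 0.0
    for rank, img_id in enumerate(ranked_ids, start=1):
        if img_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant images that are never retrieved contribute zero precision.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ap_mixed = average_precision(["a", "x", "b"], {"a", "b"})   # (1/1 + 2/3) / 2
ap_perfect = average_precision(["a", "b", "x"], {"a", "b"})
m_ap = mean_average_precision([(["a", "x", "b"], {"a", "b"}),
                               (["a", "b", "x"], {"a", "b"})])
```

Because missed relevant images count as zeros, the measure rewards recall as well as precision, which is why query expansion moves it so much.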
  30. Outline
      1. Object recognition cast as nearest neighbour matching
      2. Object recognition cast as text retrieval
      3. Large scale search and improving performance
      4. Applications: accessing expert knowledge, data mining, inpainting, location search, large scale reconstruction, mobile apps, …
      5. The future and challenges
  31. Application
      Accessing expert knowledge:
      • Use an image query to access an annotated dataset
      • Search with a query image, retrieve the annotation
  32. Visual Access to Classical Art Archives
      Currently: 111 thousand Greek vase images
  33. http://explore.clarosnet.org/XDB/ASP/clarosHome/
  34. Application: Object Mining in Large Datasets
  35. Objective …
      Automatically find and group images of the same object/scene
  36. Motivation
      Applications:
      • Dataset summarization
      • Efficient retrieval
      • Efficient pre-processing for automatic 3-D reconstruction (e.g. PhotoSynth)
  37. Matching Graph
      Build a ‘matching graph’ over all the images in the dataset.
      Each image is a node and a link represents two images having some object in common.
      Given this graph structure, apply various clustering algorithms to group the data.
  38. Finding Commonly Occurring Objects
      Simple idea: strong spatial constraints give a link between two images.
      Edge strength = # inliers
  39. Finding Commonly Occurring Objects
      Use these links to build up a graph over all images in the dataset.
      Nodes = images, edges = spatially verified matches
  40. Building the Matching Graph
      • Use each image to query the dataset
      • Each query gives a list of results scored by a measure of the spatial consistency to the query
      • Threshold this consistency measure to determine the links in the matching graph
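The three steps on slide 40 can be sketched as below, assuming the per-image query results are already available as lists of `(other image, inlier count)` pairs. The input layout and function name are hypothetical; the default threshold of 20 inliers is the value quoted on the Timings slide.

```python
def build_matching_graph(query_results, min_inliers=20):
    """Build an undirected graph: nodes are images, edges are verified matches.

    query_results: image id -> list of (other image id, number of inliers),
                   i.e. the spatially verified result list for that query image.
    """
    graph = {img: set() for img in query_results}
    for img, results in query_results.items():
        for other, inliers in results:
            if other != img and inliers >= min_inliers:
                # Add the link in both directions so the graph stays symmetric.
                graph[img].add(other)
                graph.setdefault(other, set()).add(img)
    return graph


query_results = {
    "a": [("b", 35), ("c", 5)],   # a-c has too few inliers to become a link
    "b": [("a", 35)],
    "c": [],
    "d": [("e", 22)],             # e appears only as a result, never as a query
}
graph = build_matching_graph(query_results)
```

Raising `min_inliers` trades fewer false links for more fragmented clusters, so the threshold is the main knob in this step.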
  41. Connected Components
      In a collection of images of multiple disjoint objects we expect the matching graph to also be disjoint.
      A simple first step is to take connected components of the matching graph and examine the clusters returned.
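Taking connected components of a matching graph is a standard graph traversal. The sketch below assumes the graph is an adjacency dict (`image id -> set of neighbour ids`) that is symmetric and lists every node as a key; that representation is an assumption of the example, not something the slides specify.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=lambda comp: (-len(comp), comp))


toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set(), "e": {"f"}, "f": {"e"}}
clusters = connected_components(toy_graph)
```

The traversal is linear in the number of nodes and edges, which is consistent with the slides describing this step as very quick compared with building the graph.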
  42. Connected Components
      Example: five connected components from the Oxford dataset [figure: clusters of 56, 71, 26, 25 and 56 images]
  43. Connected Components
      A problem with connected components is that ‘connecting images’ can sometimes join two disjoint objects [figure: linking images].
      Can overcome this problem by a divide and merge strategy.
  44. Datasets
      Statue of Liberty dataset (37,034 images)
      • Crawled from Flickr by querying for ‘statue of liberty’
      • Lots of images of the Statue of Liberty but also of New York and other sites
      Rome dataset (1,021,986 images) [1]
      • Again, crawled from Flickr
      • Contains too much stuff to mention
      [1] Photo tourism: Exploring photo collections in 3D, Noah Snavely, Steven M. Seitz, Richard Szeliski
  45. Results: Statue of Liberty
      Largest cluster: 8461 images of the Statue of Liberty
  46. Results: Statue of Liberty
      2nd largest: 276 aerial views of New York
  47. Results: Statue of Liberty
      3rd largest: 80 American flags
  48. Results: Statue of Liberty
      Smaller clusters:
      • Lego Statue of Liberty: 59 images
      • Staten Island: 52 images
  49. Results: Rome [figure: clusters of 18676, 15818, 9632 and 4869 images]
  50. Timings
      21,339 high resolution images from Flickr tagged with ‘statue of liberty’.
      Querying with every image in the database to build the graph takes ~2 hours.
      Finding connected components (very quick) using a threshold of 20 spatially verified inliers gives 11 clusters with more than 20 images.
  51. As an aside …
      Better matching with fewer features [Turcot & Lowe, ICCV Workshop 2009].
      • Build matching graph
      • Augment image bag-of-word histograms using neighbours
      • Like query expansion, but done in advance on the ‘server side’
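The "augment using neighbours" bullet on slide 51 can be illustrated with a deliberately simplified sketch: each image's histogram is summed with the histograms of its neighbours in the matching graph. This is an assumption made for illustration; the cited paper may select which neighbour features to add more carefully, and the function name and data layout are hypothetical.

```python
from collections import Counter


def augment_histograms(histograms, graph):
    """Add each neighbour's visual-word counts to an image's own histogram.

    histograms: image id -> Counter of visual word counts
    graph:      image id -> set of neighbouring image ids (matching graph)
    Returns new Counters; the input histograms are left untouched.
    """
    augmented = {}
    for img_id, hist in histograms.items():
        total = Counter(hist)                  # copy, so the original is not modified
        for neighbour in graph.get(img_id, ()):
            total.update(histograms[neighbour])
        augmented[img_id] = total
    return augmented


histograms = {
    "a": Counter({1: 2}),
    "b": Counter({1: 1, 2: 1}),
    "c": Counter({3: 1}),
}
graph = {"a": {"b"}, "b": {"a"}}               # c has no verified neighbours
augmented = augment_histograms(histograms, graph)
```

Since the augmentation is computed once over the database, a query for word 2 can now reach image "a" without any extra work at query time, which is the sense in which this resembles query expansion done in advance.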
  52. Application: Internet-based inpainting
      Photo-editing using images of the same place [Whyte, Sivic and Zisserman, 2009], but see also [Hays and Efros, 2007].
  53. Application: place recognition (retrieval in a structured (on a map) database)
      [figure labels: query; image database (Panoramio, Flickr, …); confuser suppression, only negative training data (from geotags); optimized image database; image indexing with spatial verification; query expansion; best match]
      [Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/
  54. Correctly recognized examples
  55. More correctly recognized examples
  56. Application: Matching and 3D reconstruction in large unstructured datasets
      Building Rome in a Day, Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski, International Conference on Computer Vision, 2009. http://grail.cs.washington.edu/rome/
      See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009]. Figure: N. Snavely
  57. Example of the final 3D point cloud and cameras
      57,845 downloaded images, 11,868 registered images. This video: 4,619 images. The Old City of Dubrovnik
  58. Application: Mobile visual search apps
      Bing visual scan and others … Snaptell.com, Moodstocks.com
  59. Example (slide credit: I. Laptev)
  60. Papers and Demos
      • Sivic, J. and Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the International Conference on Computer Vision (2003). http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/
      • Chum, O., Philbin, J., Isard, M., Sivic, J. and Zisserman, A. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. Proceedings of the International Conference on Computer Vision (2007). http://www.robots.ox.ac.uk/~vgg/publications/papers/chum07b.pdf Demo: http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/
      • Philbin, J. and Zisserman, A. Object Mining using a Matching Graph on Very Large Image Collections. Proc. of the Indian Conference on Vision, Graphics and Image Processing (2008). http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin08b.pdf
