This work is on techniques for the organization and visualization of community photo collections, also known as CPCs.
CPCs are large image collections captured with a variety of cameras and mobile phones and shared on photo-sharing websites such as Flickr and Facebook. These images can be retrieved using Google image search.
A search for Golkonda Fort on Google Images and Flickr returns more than 50k images, which capture this monument from various viewpoints and under different lighting conditions.
There has been prior work in this area by Noah Snavely et al. in a project called Photo Tourism. The aim of that project is to provide an interface where a user can virtually tour a popular landmark captured in a CPC. It has also given rise to a commercial product called Photosynth. (Demo?) Sattler et al. have used CPCs to find the geographic locations of query images. Goesele et al. used CPCs for dense reconstruction of the scene captured in the photographs. Frahm et al. have used these photo collections for city-scale reconstructions using millions of images, and provide an intuitive method for exploring CPCs by computing iconic images. // If the Photosynth demo is done: As you can see here, every image is shown in geometric context with its neighbouring images, which gives a sense of virtually visiting the place. I can click on nearby images and a geometrically consistent transition is shown. Apart from this, I can also look at a 3D point cloud of the scene captured in the CPC, which gives a quick overall understanding of the scene.
I will outline the major steps required to process CPCs into scene reconstructions for virtual tourism and exploration. The first step is to establish feature correspondences across images. Robust features, e.g. SIFT, are extracted in all the images. For every pair of images, matching features are found by comparing feature descriptors. Since this level of matching tends to be error-prone, a refinement step uses the epipolar geometry constraint to verify the matches. Consistent matches spanning more than 2 images are saved as tracks. The purpose of the next step is to produce a 3D point for each of these tracks. For this, a procedure known as incremental SfM is used. SfM (structure from motion) is a classic computer vision problem: estimate the 3D structure of the scene and the relative calibration of the cameras that took the images, from a small set of input images for which no prior information is available. For CPCs, an incremental version of this algorithm is used. It seeds the scene reconstruction with a pair of images having a large number of matching features, using the standard SfM pipeline. New images are then added to the reconstruction in batches, and new points are triangulated from them. A procedure known as bundle adjustment is run every time a new batch of images is added, to minimize errors. The result is a full scene reconstruction in which all the images and their corresponding 3D points are registered in a single frame of reference.
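To make the track-building step concrete, here is a minimal Python sketch that groups verified pairwise feature matches into tracks with a union-find structure. It is an illustration under assumptions, not the Photo Tourism implementation: the function name `build_tracks`, the `(image_id, feature_id)` representation and the `min_images` parameter are hypothetical. The default of 3 mirrors the "more than 2 images" rule above, and tracks that contain two features from the same image are dropped as inconsistent.

```python
from collections import defaultdict

def build_tracks(pairwise_matches, min_images=3):
    """Group verified feature matches into tracks.

    pairwise_matches: iterable of ((image_id, feature_id), (image_id, feature_id)).
    Returns a sorted list of tracks, each a sorted list of (image_id, feature_id).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every verified match.
    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    tracks = []
    for members in groups.values():
        images = [image for image, _ in members]
        # Assumption: a consistent track has at most one feature per image.
        if len(set(images)) == len(images) and len(images) >= min_images:
            tracks.append(sorted(members))
    return sorted(tracks)
```

Each surviving track would then be handed to the SfM stage, which triangulates one 3D point per track.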
The proposed techniques have some issues. The most evident is the quadratic image matching cost, incurred because every image is matched with every other image. The incremental SfM procedure can take time proportional to the fourth power of the number of images in the worst case. The technique is sensitive to the choice of the seed pair used to initialize the reconstruction. Finally, errors can cascade, leaving large sections of the reconstruction inaccurate, as shown in the figure.
Timing breakdowns for various datasets have shown that image matching and incremental SfM dominate the overall time taken to process CPCs. In fact, for the Trafalgar Square dataset with 8k images, the procedure had not finished even after 50 days.
Our motivation comes from the unstructured nature of CPCs: the constituent images have different resolutions, viewpoints and lighting conditions. As a result, only a very small number of images match. Keeping this in mind, we make two contributions. The first is to do exhaustive pairwise matching without incurring quadratic time complexity. The second is to build a visualization framework that bypasses the issues faced by incremental SfM.
The aim of the image matching problem is to compute an image match graph, in which images are nodes and each pair of matching images is joined by an edge. Such a structure readily supports queries such as connected components or shortest paths, as shown in the figure.
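As a small illustration of the two queries mentioned above, the following Python sketch stores a match graph as an adjacency list and answers connected-component and shortest-path queries with breadth-first search. The image names and function names are made up for the example.

```python
from collections import deque

def connected_components(graph):
    """Return the connected components of an undirected adjacency-list graph."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components

def shortest_path(graph, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical match graph: two clusters of matching images.
match_graph = {
    "img1": ["img2"], "img2": ["img1", "img3"], "img3": ["img2"],
    "img4": ["img5"], "img5": ["img4"],
}
```

In this toy graph the components correspond to clusters of images of the same scene, and a shortest path gives a chain of pairwise-matching images between two views.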
Philbin et al. have described a way to solve this problem using large feature vocabularies to decrease the descriptor matching cost. It follows a standard image retrieval pipeline in which all the images are indexed into a database and later queried one by one to find their matches. Indexing is done by quantizing the image features using a large vocabulary of visual words, which are simply quantization levels in the image feature space. An inverted index is built over these visual words, listing all the images that contain a particular visual word. A query looks up, in the inverted index, every visual word present in the query image and gives a score to the images in the posting lists. The top-scoring matches are shortlisted and later verified using epipolar geometry constraints. Since the posting lists grow with the size of the indexed database, this procedure has linear time complexity for each query image, and thus quadratic time complexity overall.
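The retrieval pipeline just described can be sketched in a few lines of Python. This is a simplified illustration, not the method of Philbin et al.: images are assumed to be already quantized into visual word ids, the score is simply the number of shared visual words (real systems typically use weighted scores), and the function names and `top_k` parameter are hypothetical.

```python
from collections import Counter, defaultdict

def build_inverted_index(database):
    """database: dict image_id -> iterable of visual word ids."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for word in words:
            index[word].add(image_id)  # posting list of this visual word
    return index

def query(index, query_words, top_k=2):
    """Score images by shared visual words and return a shortlist."""
    scores = Counter()
    for word in set(query_words):
        # Every posting list grows with the database, hence O(N) per query.
        for image_id in index.get(word, ()):
            scores[image_id] += 1
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]  # shortlist, to be verified geometrically
```

Only the shortlisted images would go on to geometric verification, which is exactly the step that can drop true matches with low scores.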
Chum et al. describe a process for discovering matching images using min-hash. Images are treated as sets of the visual words they contain. A signature of every image is computed by applying k min-hash functions. The signatures are sampled into small sketches of size 3 or 4, which are inserted into hash tables to facilitate matching. Sketches falling into the same hash table bucket are deemed matching, and images with matching sketches are sent for geometric verification. The verified image matches act as seeds for discovering clusters of matching images, using standard image retrieval and query expansion techniques. However, due to its sampling nature, this technique has a high chance of missing small clusters.
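To show the sketching idea, here is a hedged Python sketch of min-hash candidate generation. It is a toy version under assumptions: the hash functions are random linear maps modulo a prime, a signature is cut into consecutive non-overlapping sketches, and all names and parameters are illustrative rather than taken from Chum et al.

```python
import random
from collections import defaultdict
from itertools import combinations

def make_hash_functions(count, seed=0):
    """Return `count` random linear hash functions over visual word ids."""
    rng = random.Random(seed)
    prime = 2_147_483_647
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(count)]
    return [lambda w, a=a, b=b: (a * w + b) % prime for a, b in params]

def sketches(words, hash_functions, sketch_size=3):
    """Compute the min-hash signature and cut it into sketches."""
    signature = [min(h(w) for w in words) for h in hash_functions]
    return [tuple(signature[i:i + sketch_size])
            for i in range(0, len(signature) - sketch_size + 1, sketch_size)]

def candidate_pairs(database, hash_functions, sketch_size=3):
    """database: dict image_id -> set of visual word ids."""
    buckets = defaultdict(set)
    for image_id, words in database.items():
        for table, sketch in enumerate(sketches(words, hash_functions, sketch_size)):
            buckets[(table, sketch)].add(image_id)  # one hash table per sketch slot
    pairs = set()
    for images in buckets.values():
        pairs.update(combinations(sorted(images), 2))
    return pairs  # candidates for geometric verification
```

Two images collide only if an entire sketch agrees, so pairs with modest overlap are rarely proposed. That is the sampling behaviour that makes small clusters easy to miss.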
Our solution to this problem is to do exhaustive pairwise matching by querying each image in turn. The goal we set is to answer each query in constant time, thereby ensuring linear time complexity overall. We address exhaustiveness by verifying all potential matches and avoiding any sort of shortlisting. Since standard geometric verification is time consuming, we have built a verification scheme that can be evaluated directly from inverted index retrievals. Overall, our main result is that indexing geometry allows us to meet all of the above requirements!
We capture the geometric information in images using high order features (HOFs), which in our case are a concatenation of two nearby image features and their relative geometry. To extract them, we select a few features satisfying a certain scale-space criterion as primary features and pair each with nearby secondary features by computing affine invariants between them, as shown in the figure. A high order feature is therefore represented as a tuple containing the visual words of the primary and secondary features, followed by the quantized geometric parameters. A side effect is a huge feature space, which we will see is crucial to creating an index that supports fast queries.
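The tuple structure of a high order feature can be illustrated with a short Python sketch. Several parts are assumptions made only for illustration: the primary-feature criterion is taken to be a minimum scale, "nearby" is taken to mean within a radius proportional to the primary's scale, and the geometric parameters are a quantized relative scale and relative orientation standing in for the affine invariants used in the actual work. All names and thresholds are hypothetical.

```python
import math
from typing import NamedTuple

class Feature(NamedTuple):
    word: int      # visual word id
    x: float
    y: float
    scale: float
    angle: float   # orientation in radians

def high_order_features(features, min_primary_scale=4.0, radius_factor=5.0,
                        scale_bins=4, angle_bins=8):
    """Return HOF tuples (primary word, secondary word, scale bin, angle bin)."""
    hofs = []
    for p in features:
        if p.scale < min_primary_scale:   # assumed primary-selection criterion
            continue
        for s in features:
            too_far = math.hypot(s.x - p.x, s.y - p.y) > radius_factor * p.scale
            if s is p or too_far:
                continue
            # Simplified stand-ins for affine invariants: relative scale and angle.
            ratio = min(s.scale / p.scale, 1.0)
            scale_bin = min(scale_bins - 1, int(ratio * scale_bins))
            turn = ((s.angle - p.angle) % (2 * math.pi)) / (2 * math.pi)
            angle_bin = int(turn * angle_bins) % angle_bins
            hofs.append((p.word, s.word, scale_bin, angle_bin))
    return hofs
```

With a vocabulary of V visual words and G geometric bins, the tuple space has on the order of V x V x G values, which is where the huge feature space comes from.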
To see how this is done, I go back to image retrieval using an inverted index. In a regular inverted index the number of visual words is limited, so the posting lists grow linearly with the size of the database. With HOFs we have a huge feature space, which can be reprojected to any desired size, and this lets us control the density of the posting lists. We choose the reprojected size of the feature space to be proportional to the size of the database. To realize this, we define equally sized Bloom filters, each using only one hash function, for every image in the database and build an inverted index over the bit positions of the Bloom filters. As a result, the size of the posting lists does not increase with the size of the database, and queries on this index take constant time. So, as I mentioned before, we do not need to shortlist potential candidates. Next, we will see how these potential matching images are geometrically verified.
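A rough Python sketch of the reprojection idea follows. It assumes HOFs are hashable tuples and uses `hashlib.blake2b` as the single hash function; the constant `slots_per_image` (the proportionality factor between index size and database size) and all function names are hypothetical choices for this example. With one hash function, each image's Bloom filter is just the set of bit positions its HOFs map to, and the inverted index maps each bit position to the images that set it.

```python
import hashlib
from collections import defaultdict

def reproject(hof, index_size):
    """Map a HOF tuple to a bit position in [0, index_size)."""
    digest = hashlib.blake2b(repr(hof).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % index_size

def build_index(hofs_per_image, slots_per_image=1000):
    """hofs_per_image: dict image_id -> iterable of HOF tuples."""
    # Index size grows in proportion to the number of images.
    index_size = slots_per_image * len(hofs_per_image)
    index = defaultdict(set)
    for image_id, hofs in hofs_per_image.items():
        for hof in hofs:
            index[reproject(hof, index_size)].add(image_id)
    return index, index_size

def lookup(index, index_size, hof):
    """Return the posting list (set of image ids) for one query HOF."""
    return index.get(reproject(hof, index_size), set())
```

Because the number of slots grows with the number of images, the average posting list length stays roughly constant as the database grows, at the cost of occasional hash collisions that the later verification step has to tolerate.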
Fast spatial verification is crucial for fast querying. Our scheme achieves it by verifying directly from the index retrievals. Given a query image, we query all the HOFs related to one primary feature in succession and look for matches in the database images. If we find 4 or more such matches in a database image, we consider that primary feature to reliably match the database image. Finding 2 or more such reliable primary feature matches confirms the image match.
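The two thresholds above translate into a simple counting procedure. The Python sketch below is an illustration under assumptions: `index_lookup` is a hypothetical callable that returns the set of database image ids for a HOF, and the query is given as a mapping from each primary feature to its HOFs. The defaults of 4 and 2 are the thresholds stated above.

```python
from collections import Counter

def match_query(index_lookup, query_primaries, min_hofs=4, min_primaries=2):
    """Return database images matched by the query, using index retrievals only.

    index_lookup: callable, HOF -> set of database image ids.
    query_primaries: dict primary_id -> list of HOFs built on that primary.
    """
    reliable = Counter()
    for hofs in query_primaries.values():
        hits = Counter()
        for hof in hofs:
            for image_id in index_lookup(hof):
                hits[image_id] += 1
        # A primary feature matches an image reliably if enough of its HOFs hit it.
        for image_id, count in hits.items():
            if count >= min_hofs:
                reliable[image_id] += 1
    return sorted(image for image, n in reliable.items() if n >= min_primaries)
```

No descriptors or keypoint coordinates are touched at query time; the decision is made purely from counts over posting lists.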
In summary, our algorithm for computing the image match graph works as follows. We compute high order features in all database images. We then select a reprojection size proportional to the database size and initialize the inverted index. For indexing, the hash value of a HOF acts as the key, and the corresponding value is the image ID. We query each image in turn and record its matches in an adjacency list, which is our image match graph.
To test the effectiveness of our approach at finding small clusters of matching images, we used the University of Kentucky benchmark dataset, which has 4 images each of 2550 objects. We consider an object to have been discovered if at least one of its matching images is correctly discovered. We obtained 73.2% recall, compared with 49.6% for the min-hash based solution discussed earlier.
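One plausible reading of this recall measure can be written down in a few lines of Python. This is my interpretation for illustration, not the evaluation script used for the reported numbers: an object counts as discovered if at least one discovered match joins two different images of that object. The function and argument names are hypothetical.

```python
def discovery_recall(ground_truth, discovered_matches):
    """Fraction of objects with at least one correct discovered match.

    ground_truth: dict image_id -> object_id.
    discovered_matches: iterable of (image_id, image_id) pairs.
    """
    found = set()
    for a, b in discovered_matches:
        if a != b and ground_truth[a] == ground_truth[b]:
            found.add(ground_truth[a])
    return len(found) / len(set(ground_truth.values()))
```

For example, with 12 images of 3 objects and correct matches found for only 2 of the objects, this function returns 2/3.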
This table gives the time breakdown of our solution on the Oxford 5k and 105k image datasets. With a moderate index size of 32 million for the 5k images, querying time per image goes down to 0.024 s, bringing the total query time to 2 minutes. We discovered 317 clusters containing 1375 images. For the larger Oxford 105k dataset, we increased the index size to 500 million and obtained a querying time of 85 ms per image, which fell to 61 ms on doubling the index size to 1 billion. This also brought the total querying time for the 105k queries to under 2 hours. We found 2147 clusters containing 7198 images.
This image shows some of the small clusters discovered in the Oxford dataset. We also found some errors, which were mostly due to text and window-like structures.
Our second contribution is visualizing CPCs as walkthroughs, keeping in mind that community photo collections tend to grow over time.
More formally, we want to efficiently browse, and keep incorporating, an incoming stream of images. This is what our solution for browsing such image collections looks like. You will notice that the experience is similar to that of Photosynth but, as we will see next, we are able to bypass the issues it faces.
Our approach is based on the observation that in a walkthrough, users primarily look at nearby overlapping images, because these convey the most about the local geometry around an image. We therefore propose creating independent partial scene reconstructions, each containing an image and its overlapping images, instead of one global full scene reconstruction. This framework has distinct advantages over the previous system. We avoid sensitivity to the choice of the initial pair, because a wrong choice in one reconstruction does not affect the correctness of the other partial reconstructions. For a similar reason, we also avoid the cascading and compounding of errors that leads to misestimating large sections of the scene. Our framework limits the number of images involved in a partial reconstruction, which bounds the time taken to create it. This makes our system linear in the number of images and allows it to scale to large datasets. Partial reconstructions also make it easy to handle new images as they become available. One drawback is that the approach restricts the images a user can transition to from a given image to its immediate neighbors only; as we shall see, this is compensated by the reduction in time complexity.
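The scaling argument can be made tangible with a toy cost model in Python. The model is an assumption for illustration only: it takes the worst-case fourth-power behaviour at face value, ignores constants, and uses a hypothetical neighborhood size of 10 images per partial reconstruction.

```python
def global_sfm_cost(num_images, exponent=4):
    """Worst-case cost of one global reconstruction, up to a constant."""
    return num_images ** exponent

def partial_sfm_cost(num_images, neighbors=10, exponent=4):
    """Cost of one bounded-size reconstruction per image, up to a constant."""
    return num_images * (neighbors + 1) ** exponent

if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        ratio = global_sfm_cost(n) / partial_sfm_cost(n)
        print(f"N={n}: modelled speed-up factor {ratio:.3g}")
```

Doubling the number of images doubles the modelled cost of the partial approach but multiplies the modelled global cost by 16, which is the trade being made for the restricted transitions.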
To create the partial reconstructions, we start by finding, for each image, a set of the top n similar images using image retrieval with a bag-of-visual-words representation. Next, we refine these matches using epipolar geometry constraints. Then we compute a partial scene reconstruction for each image and its verified neighbors. We use Bundler as the SfM pipeline for obtaining these partial reconstructions.
Our visualization interface allows browsing through the partial reconstructions created from the input images in an interactive 3D virtual environment. We start with the partial reconstruction corresponding to one of the images, say I. We align the virtual camera with the camera corresponding to image I and project image I onto a planar approximation of the points visible to that camera. Similarly, we display the other images in the partial reconstruction as wireframes of their respective projections. Upon clicking a wireframe, we make a smooth transition from the current image to the new image. At the end of the transition, we move to the partial reconstruction corresponding to the new image and align the virtual camera with the camera corresponding to the new image. The user thus explores the scene by moving from one partial reconstruction to another, while getting cues about the relationships between an image and its overlapping images.
Our system makes it very easy to incorporate a new image into an existing reconstruction. We determine the top n matches of the new image by comparing its histogram representation with those of the existing images. Next, we verify these matches by computing their epipolar geometry. Then the new image, along with its verified matches, is sent to Bundler to create a partial scene reconstruction corresponding to the new image. Thus, we potentially add a new partial reconstruction for every image that is inserted into the system.
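The insertion procedure can be sketched as follows in Python. It is a simplified illustration under assumptions: histograms are sparse dictionaries of visual word counts compared by cosine similarity, `verify` is a hypothetical callback standing in for the epipolar geometry check, and the neighbor list recorded at the end stands in for the input that would be passed to an SfM tool such as Bundler.

```python
import math

def cosine(h1, h2):
    """Cosine similarity between two sparse visual word histograms."""
    dot = sum(v * h2.get(w, 0) for w, v in h1.items())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def add_image(collection, partials, image_id, histogram, verify, top_n=3):
    """Insert a new image and record the inputs of its partial reconstruction.

    collection: dict image_id -> histogram (dict word -> count).
    partials: dict image_id -> list of verified neighbor ids.
    verify: callable (new_id, other_id) -> bool, placeholder for epipolar checks.
    """
    ranked = sorted(collection,
                    key=lambda other: (-cosine(histogram, collection[other]), other))
    neighbors = [other for other in ranked[:top_n] if verify(image_id, other)]
    collection[image_id] = histogram
    if neighbors:
        # Only this image and its neighbors would be sent for reconstruction.
        partials[image_id] = neighbors
    return neighbors
```

Existing partial reconstructions are left untouched, so the work per inserted image is bounded by `top_n` in this sketch.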
We have tested our system on a large image collection of nearly 6000 images of Golkonda Fort in Hyderabad. These images capture the fort under various illumination conditions and from several viewpoints.
We created a few subsets of various sizes from this dataset. Our system scales to large datasets of up to 6k images, which take nearly 5 days of CPU time.
This graph compares the time taken by our system to compute all the partial reconstructions with the time taken by Bundler to run SfM for a full scene reconstruction, for datasets of various sizes. The speed-up provided by our system increases with the number of images, as our system takes nearly linear time in the number of images.
In an experiment with a collection of 687 images of a courtyard of the fort, taken at different times of the day, we initialized the system with 200 images and incrementally added the remaining 487. The final image connectivity graph had 674 images. The following walkthrough was created from it.
Thank you very much for your kind attention.
Until now we have treated all the match graphs as directed, even though each match is commutative. This was done to ensure that only a bounded number of images is involved in every partial scene reconstruction. But we can use the commutativity of the matches during visualization to improve connectivity. An edge from A to B in the visualization graph essentially means that it is possible to make a transition from image A to image B in the partial reconstruction corresponding to image A, denoted P(A). If an edge from B to A is not present, we can still show a transition from B to A by using P(A) as a proxy for P(B): we align the virtual camera with image B in P(A) and make the transition from B to A. Thus, the final connectivity graph used for navigation is undirected.
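The proxy rule can be expressed compactly in Python. The sketch below is illustrative, with hypothetical names: for every possible transition it records which image's partial reconstruction hosts it, preferring the source image's own reconstruction and falling back to the proxy when the reverse edge is missing.

```python
def navigation_graph(directed):
    """Build the undirected navigation graph from directed match edges.

    directed: dict image -> set of images present in that image's own
    partial reconstruction. Returns dict (source, target) -> image whose
    partial reconstruction is used to render the transition.
    """
    transitions = {}
    for a, targets in directed.items():
        for b in targets:
            transitions[(a, b)] = a             # rendered in P(a)
            transitions.setdefault((b, a), a)   # proxy: P(a) unless P(b) has the edge
    return transitions
```

With edges A to B, B to C and C to B, the transition from B back to A is hosted by P(A), while transitions between B and C each use their own source reconstruction.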
Transcript of "Techniques for Organization and Visualization of Community Photo Collections"
Techniques for Organization and
Visualization of Community Photo Collections
Faculty Advisor : Dr. C.V. Jawahar
Community Photo Collections
• Golkonda Fort (Google Images + Flickr)
– > 50 K images
Applications of CPCs
• Snavely et al. – Siggraph `06, ICCV `09
– Virtual Tourism, Visualization
• Sattler et al. – ICCV `11 , ECCV `12
• Goesele et al. – ICCV `07, VMV `11
– Dense 3D Reconstruction
• Frahm et al. – ECCV `08, ECCV `10
– City Scale Reconstructions, Exploration
Add new images and
triangulate new points
Snavely et al., Photo Tourism: Exploring image collections in 3D
Full Scene Reconstruction
• Quadratic Image Matching cost
• Global scene reconstruction
– O(N^4) in the worst case
– Sensitivity to the choice of the initial pair
– Cascading of errors
Image credits: Snavely et al., Photo Tourism: Exploring image collections in 3D
Snavely et al., Photo Tourism: Exploring image collections in 3D
Full Scene Reconstruction for Trafalgar Square
with 8000 images took > 50 days
• CPCs are unstructured
– Different resolutions, viewpoint , lighting
– Very limited number of images match
• Contribution 1 : Matching
– Exhaustive pairwise matching w/o quadratic cost
• Contribution 2 : Visualization
– Framework for bypassing the issues faced with incremental SfM
Image Matching Problem
• Compute Image Match Graph
– Images → Nodes
– Image Match → Edges
– Connected components
– Shortest path
Discovering Matching Images
• Object Retrieval with Large Vocabularies and
Fast Spatial Matching – Philbin et al.
• Image Retrieval
– Quantization : Image Features → Visual Words (VW)
– Inverted Index : over VWs
– Filtering : Shortlist of top scoring matches
– Verification of shortlist
• O(N) time for a single query
Discovering Matching Images
• Large Scale Discovery of Spatially Related
Images - Chum, O. and Matas. J
Our Solution : Overview
• Exhaustive Pairwise Matching
– Query each image in turn
• Goal : O(1) per query
• Addressing Exhaustiveness
– Verify all potential matches : No shortlists
– Verification doable from Index retrievals
• Our Main Result : Indexing geometry allows both!
• High Order Features
– Combine nearby features
• Primary with Secondary Features
• Encode Affine Invariants
– HOF is a Tuple
• Huge Feature Space
Constant Time Queries using HOFs
• Regular Inverted Index
– Posting lists grow with Database size O(N)
• HOF => Huge Feature Space ( > 10^12 )
– Reproject with Hash Functions!
• Use Bloom Filters
– Range ∝ Database size
• Constant sized posting lists
• Result : Constant time queries
• Computable from index retrievals
– For a query primary feature
• Search all secondary features in database images
• Pass if R features are found.
Solution : Summary
• Extract HoF in the N database images
• Select Reprojection size as CN
• Initialize an Index of size CN
– Key : Hash value of HoF
– Value : Image Id
• Query : Each image in turn
– Record matches in adjacency list
• Result : Image Match Graph
• UK benchmark
– 2550 categories x 4 = 10200 images
– 73.2 % recall
–Large Scale Discovery of Spatially Related
Images (Min Hash based solution)
• 49.6 % recall
                        Oxford 5K    Oxford 105K   Oxford 105K
#HOF                    78 Mn        1480 Mn       1480 Mn
Index Size              32 Mn        500 Mn        1 Bn
                        27 min       8 hours       8 hours
Query Time per Image    0.024 sec    0.085 sec     0.061 sec
Query Time              2 min        2.5 hours     1.8 hours
Clusters Found          317          2147          2147
Images Registered       1375         7198          7198
• Small Clusters
• Efficiently browse and keep Incorporating
incoming stream of images
Our Solution : Overview
• Observation : In a walkthrough, users primarily
see nearby overlapping images.
– Robustness to errors in incremental SfM module
– Worst case linear running time
Independent Partial Scene Reconstructions
Global Scene Reconstruction
[Figure: pipeline from input images through Compute Matches, Refine Matches (verified neighbors) and Compute partial reconstructions to the user interface and navigation]
• Courtyard Dataset with 687 images
• Initialized with 200 images
• Added 487 images one by one
• Largest CC of 674 images
• Image Matching : HOFs gives a larger feature
space which can be reprojected to obtain
sparse posting lists making Exhaustive
Pairwise Matching feasible.
• CPCs Visualization : Partial scene
reconstructions can effectively be used to
navigate through large collections of images.
• QUESTIONS ?!
• Take Home Message : 2 ideas
– For information retrieval using an inverted index,
combining features gives a larger feature space
which can be reprojected to control the average
lengths of posting lists, and thus the query time.
– For a very complex algorithm, O(N^k) with k > 2, it may
sometimes be meaningful to fragment the dataset
into O(N) groups, each of finite size, thereby
reducing the overall complexity to O(N).