Slides on Photosynth.net, from my MSc at Imperial
1. 3D browsing of a photos dataset
Uncovering Photosynth.net
Markou Nikolas, Romain Dossin, Kevin Keraudren
November 29, 2010
2. Introduction The Bundler Pipeline Photo Explorer Rendering Running the code Conclusion
Introduction
Flickr search "Rome Coliseum"
34,169 results
Markou Nikolas, Romain Dossin, Kevin Keraudren () 3D browsing of a photos dataset November 29, 2010 2 / 22
Huge amount of data on the Web: Flickr > 5 billion photos
How can we browse such an amount of data?
What can we learn from it?
What if we could turn 2D into 3D?
→ Photosynth.net
(University of Washington + Microsoft Research)
1 The Bundler Pipeline
Extract the focal length from the EXIF tags (extract_focal.pl)
Find feature points in each image using SIFT
Match keypoint descriptors between each pair of images
Structure from motion : recover a set of camera parameters and a
3D location for each track
2 Photo Explorer Rendering
Render the scene
Transitions
View Interpolation
3 Running the code
Extract the focal length from the EXIF tags (extract_focal.pl)
Jhead
ImageMagick: identify -format %[exif:*] image.jpg
$f_{\text{pixels}} = \text{X resolution} \times \dfrac{f_{\text{mm}}}{\text{CCD width}_{\text{mm}}}$
→ used later to initialize the bundle adjustment
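The conversion above can be sketched in a few lines of Python. This is only an illustration of the formula, not the code of the Perl script: the function name is hypothetical, "X resolution" is assumed to be the image width in pixels, and the camera values in the example are made up.

```python
def focal_length_in_pixels(image_width_px, focal_mm, ccd_width_mm):
    """Convert a focal length in millimetres (from EXIF) to pixels.

    Assumption: "X resolution" in the formula is the image width in pixels.
    """
    if ccd_width_mm <= 0:
        raise ValueError("CCD width must be positive")
    return image_width_px * (focal_mm / ccd_width_mm)


# Hypothetical camera: 3072-pixel-wide image, 7.4 mm focal length,
# 5.76 mm wide sensor.
print(focal_length_in_pixels(3072, 7.4, 5.76))
```

The CCD width is usually not stored in the EXIF tags, so in practice it has to be looked up from the camera model.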
Find feature points in each image using SIFT
SIFT - Scale Invariant Feature Transform
From scale space to feature space
SIFT transforms an image into a large collection of local feature vectors
each of which is invariant to :
image translation
scaling
rotation
and partially invariant to :
illumination changes
affine projections
3D projections
SIFT (continued)
It is based on the highly successful Gaussian pyramid and the simple to
implement Difference of Gaussians (DoG) technique.
Source: http://fourier.eng.hmc.edu
At each level, the local maxima and minima of the DoG are kept as
keypoints, and an orientation histogram is built from the pixels around
each of them for extra robustness.
These features can then be matched on other images. Objects can also
be described as a set of features.
Output format of ./sift
<number of keypoints> <descriptor length>
<subpixel row> <subpixel column> <scale> <orientation>
<invariant descriptor vector>
<subpixel row> <subpixel column> <scale> <orientation>
<invariant descriptor vector>
...
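A file in this format could be read with a small token-based parser such as the sketch below. The function name and the toy sample (descriptor length 4) are hypothetical, and the descriptor entries are assumed to be integers that may wrap over several lines.

```python
def parse_sift_keys(text):
    """Parse the text output described above into a list of keypoints.

    Assumptions: whitespace-separated values, integer descriptor entries,
    and a descriptor that may span several lines.
    """
    tokens = text.split()
    num_keypoints, descriptor_length = int(tokens[0]), int(tokens[1])
    keypoints = []
    pos = 2
    for _ in range(num_keypoints):
        # Four floats: subpixel row, subpixel column, scale, orientation.
        row, col, scale, orientation = (float(t) for t in tokens[pos:pos + 4])
        pos += 4
        descriptor = [int(t) for t in tokens[pos:pos + descriptor_length]]
        pos += descriptor_length
        if len(descriptor) != descriptor_length:
            raise ValueError("truncated descriptor")
        keypoints.append({
            "row": row,
            "col": col,
            "scale": scale,
            "orientation": orientation,
            "descriptor": descriptor,
        })
    return keypoints


# Toy sample with two keypoints and a descriptor length of 4.
sample = """2 4
10.5 20.25 1.5 0.3
 1 2 3 4
30.0 40.0 2.0 -1.2
 5 6
 7 8
"""
for keypoint in parse_sift_keys(sample):
    print(keypoint)
```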
Match keypoint descriptors between each pair of images
Approximate nearest neighbors to match
keypoints between each pair of images
Given 2 images I and J: SIFT keypoints of J → kd-tree
For each keypoint in I, look for its nearest neighbor in J
Source: Wikipedia
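The matching step can be sketched with a small kd-tree in plain Python. This is a simplified illustration: it performs an exact nearest-neighbor search rather than the approximate search named on the slide, all function names are hypothetical, and the toy descriptors are 2-D instead of full SIFT vectors.

```python
def build_kdtree(points, depth=0):
    """Recursively build a kd-tree over (index, vector) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return (squared distance, index) of the stored vector closest to query."""
    if node is None:
        return best
    index, vector = node["point"]
    d = squared_distance(vector, query)
    if best is None or d < best[0]:
        best = (d, index)
    diff = query[node["axis"]] - vector[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


def match_keypoints(descriptors_i, descriptors_j):
    """For each descriptor of image I, return (index in I, index of nearest in J)."""
    if not descriptors_j:
        return []
    tree = build_kdtree(list(enumerate(descriptors_j)))
    return [(i, nearest(tree, d)[1]) for i, d in enumerate(descriptors_i)]


# Toy 2-D "descriptors" for two images.
image_j = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
image_i = [(1, 1), (9, 8), (4, 6)]
print(match_keypoints(image_i, image_j))
```

The tree is built once per image J and then queried for every keypoint of I, which avoids comparing every pair of descriptors.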
The fundamental matrix
Corresponding points within stereo-pair images are connected by the
fundamental matrix.
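Set of corresponding points $\mathbf{x}_i \leftrightarrow \mathbf{x}'_i$ in two images
F is the fundamental matrix ⇐⇒ $\forall i,\ \mathbf{x}_i'^{\top} F \mathbf{x}_i = 0$
Each match gives one linear equation in the unknown entries of F:
If $\mathbf{x} = (x, y, 1)^{\top}$ and $\mathbf{x}' = (x', y', 1)^{\top}$, then:
$x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0$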
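Fundamental matrix estimation using the 8-point algorithm and RANSAC
With n point matches, this can be rewritten:
$$A f = \begin{pmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{pmatrix} f = 0$$
where f is the 9-vector made up of the entries of F in row-major order
A non-zero solution exists ⇐⇒ rank(A) ≤ 8; it is unique when rank(A) = 8
(f determined up to scale)
If rank(A) > 8 (e.g. because of noise): least-squares solution, or run RANSAC and keep
the best-fitting model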
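How one row of A is built from a match, and why A f = 0 is the same statement as the epipolar constraint, can be checked numerically. The sketch below does not estimate F; it assumes a known F for a hypothetical rectified stereo pair (pure horizontal translation, so matching points share the same image row) and made-up matches. The function names are hypothetical.

```python
def constraint_row(x, x_prime):
    """Row of A for one match x <-> x', with f stored in row-major order."""
    (u, v), (up, vp) = x, x_prime
    return [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]


def epipolar_residual(F, x, x_prime):
    """Evaluate x'^T F x for the homogeneous points (x, y, 1)."""
    xh = (x[0], x[1], 1.0)
    xph = (x_prime[0], x_prime[1], 1.0)
    Fx = [sum(F[r][c] * xh[c] for c in range(3)) for r in range(3)]
    return sum(xph[r] * Fx[r] for r in range(3))


# Assumed fundamental matrix of a rectified pair: x'^T F x = y - y'.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
f = [entry for row in F for entry in row]  # 9-vector, row-major order

# Made-up matches (x, x') lying on the same image row.
matches = [((10.0, 20.0), (4.0, 20.0)),
           ((35.0, 7.5), (30.0, 7.5)),
           ((50.0, 42.0), (41.0, 42.0))]

A = [constraint_row(x, xp) for x, xp in matches]
residuals = [sum(a * b for a, b in zip(row, f)) for row in A]
print(residuals)  # each entry equals x'^T F x for the corresponding match
```

To estimate F from at least 8 matches, f would be taken as the null vector (or least-squares null vector) of A, for example through a singular value decomposition.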
RANSAC - Random Sample Consensus
Any matching procedure leaves us with some erroneous matches.
These mismatched points are called outliers and are usually
catastrophic when trying to fit a model to the data.
Source: Wikipedia
Method:
Randomly choose a small number of points
Try to fit a model to them
Check how many other points are in consensus with the model
This is repeated, and the best fit is kept as the solution.
All points not fitting this solution (outliers) are usually removed from the
data set. This process filters out most of the large errors.
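The method can be illustrated on the simplest model, a 2D line, as in the usual textbook example. The sketch below is not the fundamental-matrix version used in the pipeline; the function names, the threshold, the number of iterations and the toy data are all assumptions.

```python
import random


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # the two sampled points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, iterations=200, threshold=0.5, seed=0):
    """Return (best model, its inliers) after a fixed number of random trials."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Randomly choose the minimal number of points (2 for a line).
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        # 2. Count the points in consensus with the model.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        # 3. Keep the model with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data: 20 points on y = 2x + 1 plus 4 gross outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (7.0, -15.0), (12.0, 60.0), (15.0, -30.0)]
model, inliers = ransac_line(line_points + outliers)
print(len(inliers), "inliers kept out of", len(line_points + outliers))
```

For the fundamental matrix, the minimal sample would be 8 matches and the consensus test would measure how well each remaining match satisfies the epipolar constraint.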
Organize the matches into tracks
Source: ”Modeling the World from Internet Photo Collections”
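One way to group pairwise matches into tracks is to treat each (image, keypoint) pair as a node and each match as an edge, then take the connected components. The union-find sketch below follows that idea; it is an assumption about how this step could be implemented, not Bundler's actual code. The consistency check assumes, as I understand the cited paper, that a track containing two keypoints from the same image is discarded. All names and the toy matches are hypothetical.

```python
def build_tracks(pairwise_matches):
    """Group pairwise matches into tracks (connected components).

    pairwise_matches: iterable of ((image_a, keypoint_a), (image_b, keypoint_b)).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), []).append(node)
    return [sorted(members) for members in tracks.values()]


def is_consistent(track):
    """Assumed rule: a track may contain at most one keypoint per image."""
    images = [image for image, _ in track]
    return len(images) == len(set(images))


# Toy matches between keypoints of three images.
toy_matches = [
    (("img0", 1), ("img1", 5)),
    (("img1", 5), ("img2", 3)),
    (("img0", 2), ("img2", 9)),
    (("img2", 9), ("img0", 7)),
]
for track in build_tracks(toy_matches):
    print(track, "consistent" if is_consistent(track) else "inconsistent")
```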
Structure from motion : recover a set of camera
parameters and a 3D location for each track
Start with the two cameras (images) that match best
Estimate their parameters (focal length from the EXIF tags, 5-point
algorithm)
Recover the 3D positions of the points they both observe through a
bundle adjustment
Then take the camera that observes the most already-reconstructed points
Estimate its parameters using the Direct Linear Transformation
Run a bundle adjustment adding only the already known points:
only the new camera parameters can change
Run another bundle adjustment adding the points observed by
another camera
Iterate with a new camera
Bundle adjustment :
n cameras parametrized by $\Theta_i$
m tracks parametrized by the 3D points $X_j$
$q_{ij}$: the observed projection of the j-th track in the i-th camera
$P(\Theta, X)$: mapping between a 3D point X and its 2D projection in a
camera with parameters Θ
$w_{ij}$: 1 if camera i observes point j, 0 otherwise
Minimize the total reprojection error (the unknowns are the camera parameters $\Theta_i$ and the 3D points $X_j$):
$$\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} \, \lVert q_{ij} - P(\Theta_i, X_j) \rVert^2$$
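The quantity that bundle adjustment minimizes can be written out directly. The sketch below only evaluates the objective; it does not perform the optimization. It assumes a simplified pinhole camera (rotation R, translation t, focal length f, no distortion, points in front of the camera at positive depth), which is not necessarily the camera model Bundler uses. The function names and toy values are hypothetical.

```python
def project(camera, point):
    """Simplified P(Theta, X): rotate, translate, divide by depth, scale by f."""
    R, t, f = camera["R"], camera["t"], camera["f"]
    xc = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    return (f * xc[0] / xc[2], f * xc[1] / xc[2])


def reprojection_error(cameras, points, observations):
    """Sum of squared distances between observed and predicted projections.

    observations maps (i, j) to the observed 2D position q_ij.
    A missing key plays the role of w_ij = 0.
    """
    total = 0.0
    for (i, j), q in observations.items():
        u, v = project(cameras[i], points[j])
        total += (q[0] - u) ** 2 + (q[1] - v) ** 2
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [
    {"R": identity, "t": [0.0, 0.0, 0.0], "f": 100.0},
    {"R": identity, "t": [-1.0, 0.0, 0.0], "f": 100.0},
]
points = [(0.0, 0.0, 5.0), (1.0, 1.0, 10.0)]

# Perfect observations give zero error. Camera 1 does not observe point 0.
observations = {
    (0, 0): project(cameras[0], points[0]),
    (0, 1): project(cameras[0], points[1]),
    (1, 1): project(cameras[1], points[1]),
}
print(reprojection_error(cameras, points, observations))

# Shifting one observation by (3, 4) pixels adds 3^2 + 4^2 to the error.
u, v = observations[(0, 1)]
observations[(0, 1)] = (u + 3.0, v + 4.0)
print(reprojection_error(cameras, points, observations))
```

A bundle adjustment would pass this function, or its individual residuals, to a non-linear least-squares solver that updates the cameras and the points.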
Render the scene
Cameras as frusta
Images
Points and lines
3D rendering
Sources: Wikipedia & ”Modeling the World from
Internet Photo Collections”
Transitions
Representation accuracy of the real scene
Camera motion
Linear interpolation
Timing
Twinkle
View interpolation
Triangulated Morphs
Planar Morphs
Triangulated Morphs
Method
Projection of the points onto each image
2D Delaunay triangulation, with edge constraints
Projection of the triangulation onto an average plane
Creation of a 3D mesh
Display depending on camera location
Rendering
Good geometry
Artifacts
Planar Morphs
Method
Projection onto a common plane
Display depending on camera location
Rendering
Lower quality of the geometry
Fewer artifacts
Running the code
One full run: Bundler → Poisson reconstruction
82 large photos from Flickr, 44 h, 115,196 points recovered at the
Bundler stage...
Figure: Point cloud obtained from Bundler, and Poisson surface
reconstruction done after PMVS
Conclusion
Photosynth is only a beginning...
The University of North Carolina aims to reconstruct famous sites
in a day on a "normal machine"
Quiz
1 SIFT is used on images to find key features. To which
transformations is SIFT invariant, and when does it not perform so
well? How does this affect the clusters generated afterwards?
2 If you were given the parameters of a camera (rotation matrix, focal
length, position of the center), the associated image and the 3D
point cloud it observes, where would you place the 2D image in 3D
space?
3 As Photosynth is a web application that must be able to display
high-resolution photos to a lot of people simultaneously, how do
you think Microsoft optimized this system in order to avoid large
data transfers?