Slides on Photosynth.net, from my MSc at Imperial

3D browsing of a photos dataset
Uncovering Photosynth.net
Markou Nikolas, Romain Dossin, Kevin Keraudren
November 29, 2010


Introduction
Flickr search "Rome Coliseum"
34,169 results

Introduction
Huge amount of data on the Web: Flickr hosts more than 5 billion photos
How can we browse such an amount of data?
What can we learn from it?
What if we could turn 2D into 3D?
→ Photosynth.net
(University of Washington + Microsoft Research)

1 The Bundler Pipeline
Extract the focal length from the EXIF tags (extract_focal.pl)
Find feature points in each image using SIFT
Match keypoint descriptors between each pair of images
Structure from motion: recover a set of camera parameters and a
3D location for each track
2 Photo Explorer Rendering
Render the scene
Transitions
View Interpolation
3 Running the code

Extract the focal length from the EXIF tags (extract_focal.pl)
Jhead
ImageMagick: identify -format %[exif:*] image.jpg
$\text{focal}_{\text{pixels}} = \text{X resolution} \times (\text{focal}_{\text{mm}} / \text{CCD width}_{\text{mm}})$
→ used later to initialize the bundle adjustment
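The conversion above can be sketched in a few lines of Python. The function name and the sample camera values are illustrative assumptions, not taken from extract_focal.pl.

```python
def focal_length_pixels(x_resolution_px, focal_mm, ccd_width_mm):
    """Convert an EXIF focal length in millimetres to a focal length in pixels."""
    if ccd_width_mm <= 0:
        raise ValueError("CCD width must be positive")
    return x_resolution_px * (focal_mm / ccd_width_mm)


# Hypothetical camera: 3072-pixel-wide image, 7.4 mm lens, 5.75 mm wide sensor.
print(focal_length_pixels(3072, 7.4, 5.75))
```

The CCD width is usually not stored in the EXIF tags, so in practice it has to come from a lookup table keyed on the camera model.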

SIFT - Scale Invariant Feature Transform
From scale space to feature space
SIFT transforms an image into a large collection of local feature vectors,
each of which is invariant to:
image translation
scaling
rotation
and partially invariant to:
illumination changes
affine projections
3D projections

SIFT (continued)
It is based on the Gaussian pyramid and the simple-to-implement
Difference of Gaussians (DoG) technique.
Source: http://fourier.eng.hmc.edu
At each level, the maxima and minima of the DoG are kept, and an
orientation histogram is built from the pixels around each of them for
extra robustness.
These features can then be matched in other images. Objects can also
be described as a set of features.
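To make the DoG idea concrete, here is a deliberately simplified 1D sketch: a signal is blurred at two scales, the results are subtracted, and local extrema are kept. Real SIFT works on 2D images and looks for extrema across neighbouring scales as well; the scale ratio k = 1.6 and all function names are assumptions made for this illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma, k=1.6):
    """Fine blur minus coarse blur (k is an assumed scale ratio)."""
    fine = blur(signal, sigma)
    coarse = blur(signal, k * sigma)
    return [a - b for a, b in zip(fine, coarse)]


def local_extrema(values):
    """Indices that are strict local maxima or minima of a 1D sequence."""
    return [i for i in range(1, len(values) - 1)
            if (values[i] > values[i - 1] and values[i] > values[i + 1])
            or (values[i] < values[i - 1] and values[i] < values[i + 1])]


# A single bright sample in the middle of a flat signal.
signal = [0.0] * 10 + [1.0] + [0.0] * 10
dog = difference_of_gaussians(signal, sigma=1.0)
print(local_extrema(dog))
```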

Output format of ./sift
<number of keypoints> <descriptor length>
<subpixel row> <subpixel column> <scale> <orientation>
<invariant descriptor vector>
<subpixel row> <subpixel column> <scale> <orientation>
<invariant descriptor vector>
...
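A file in this layout can be read with a small token-based parser. The sketch below assumes whitespace-separated values and allows the descriptor to wrap over several lines; the function name and the tiny sample (descriptor length 4 instead of a real SIFT length) are made up for illustration.

```python
def parse_sift_keys(text):
    """Parse the keypoint text format into a list of dictionaries."""
    tokens = text.split()
    num_keypoints, descriptor_length = int(tokens[0]), int(tokens[1])
    keypoints = []
    pos = 2
    for _ in range(num_keypoints):
        # Four header values per keypoint, then the descriptor entries.
        row, col, scale, orientation = (float(t) for t in tokens[pos:pos + 4])
        pos += 4
        descriptor = [float(t) for t in tokens[pos:pos + descriptor_length]]
        if len(descriptor) != descriptor_length:
            raise ValueError("truncated descriptor")
        pos += descriptor_length
        keypoints.append({"row": row, "col": col, "scale": scale,
                          "orientation": orientation,
                          "descriptor": descriptor})
    return keypoints


# Hypothetical two-keypoint file with 4-entry descriptors.
sample = """2 4
10.5 20.25 1.5 0.3
1 2 3 4
30.0 40.0 2.0 -1.2
5 6
7 8
"""
for keypoint in parse_sift_keys(sample):
    print(keypoint["row"], keypoint["col"], keypoint["descriptor"])
```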

Approximate nearest neighbors to match
keypoints between each pair of images
2 images I and J: the SIFT keypoints of J are stored in a kd-tree
for each keypoint in I, look for its nearest neighbor in J
Source: Wikipedia
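The matching step can be illustrated with an exact brute-force search. This is a stand-in for the approximate kd-tree search named on the slide: it returns the true nearest neighbor but costs one distance computation per pair of keypoints. Function names and the 2D toy descriptors are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def match_keypoints(descriptors_i, descriptors_j):
    """For each descriptor of image I, find the index of its nearest one in J."""
    if not descriptors_j:
        return []
    matches = []
    for i, descriptor in enumerate(descriptors_i):
        j = min(range(len(descriptors_j)),
                key=lambda k: squared_distance(descriptor, descriptors_j[k]))
        matches.append((i, j))
    return matches


# Toy 2D "descriptors"; real SIFT descriptors are much longer.
image_i = [[0, 0], [10, 10]]
image_j = [[9, 9], [1, 0], [5, 5]]
print(match_keypoints(image_i, image_j))
```

A kd-tree replaces the inner `min` over all of J with a tree lookup, which is what makes matching every pair of images affordable.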

The fundamental matrix
Corresponding points within stereo-pair images are connected by the
fundamental matrix.
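Set of corresponding points $x_i \leftrightarrow x'_i$ in two images
$F$ is the fundamental matrix $\iff \forall i,\ {x'_i}^{\top} F x_i = 0$
This is a linear equation in the unknown entries of $F$:
If $x = (x, y, 1)^{\top}$ and $x' = (x', y', 1)^{\top}$, then:
$x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0$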

Fundamental matrix estimation using the 8 point
algorithm and RANSAC
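With n point matches, this can be rewritten:
$$
A f = \begin{pmatrix}
x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
\vdots & & & & & & & & \vdots \\
x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1
\end{pmatrix} f = 0
$$
where f is the 9-vector made up of the entries of F in row-major order
A non-zero solution exists ⇐⇒ rank(A) ≤ 8; it is unique in the case of equality
(f determined up to scale)
If rank(A) > 8 (e.g. because of noise): take the least-squares solution, or run
RANSAC and keep the best-fitting model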
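As a sanity check on the linear system, the sketch below builds one row of A per match and verifies that the dot product of a row with f equals x'ᵀFx. It only assembles A; solving for f (for example with an SVD) is left out. The function names and the sample matrix are assumptions used for illustration, and the sample matrix is not a valid fundamental matrix.

```python
def epipolar_row(x, x_prime):
    """Row of A for one match x <-> x', with F flattened in row-major order."""
    u, v = x
    up, vp = x_prime
    return [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]


def build_A(matches):
    """Stack one row per match; `matches` is a list of (x, x') point pairs."""
    return [epipolar_row(x, x_prime) for x, x_prime in matches]


def epipolar_residual(F, x, x_prime):
    """Evaluate x'^T F x for homogeneous points (u, v, 1) and (u', v', 1)."""
    xh = (x[0], x[1], 1.0)
    xph = (x_prime[0], x_prime[1], 1.0)
    return sum(xph[r] * F[r][c] * xh[c] for r in range(3) for c in range(3))


# Arbitrary 3x3 matrix, used only to compare the two expressions.
F = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f = [entry for row in F for entry in row]  # row-major 9-vector
x, x_prime = (2.0, 3.0), (5.0, 7.0)
row = epipolar_row(x, x_prime)
print(sum(a * b for a, b in zip(row, f)), epipolar_residual(F, x, x_prime))
```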

RANSAC - Random Sample Consensus
Any matching procedure produces some erroneous matches.
These mismatched points are called outliers and are usually
catastrophic when trying to fit a model to the data.
Source: Wikipedia

RANSAC - Random Sample Consensus
Method:
Randomly choose a small number of points
Try to fit a model to them
Check how many other points are in consensus with the model
These steps are repeated and the best fit is kept as the solution.
All points not fitting this solution (outliers) are usually removed from the
data set. This process filters out most of the large errors.
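The method can be sketched on the classic toy problem of fitting a 2D line, rather than a fundamental matrix. The iteration count, the inlier threshold, the fixed seed and the function names are all assumptions chosen for this example.

```python
import random


def fit_line(p, q):
    """Line y = a*x + b through two points, or None if the line is vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1


def ransac_line(points, iterations=200, threshold=0.5, seed=0):
    """Return the (a, b) model with the most inliers, and those inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Randomly choose the minimal number of points (two for a line).
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        # 2. Count the points in consensus (vertical distance to the line).
        inliers = [(x, y) for x, y in points
                   if abs(y - (a * x + b)) <= threshold]
        # 3. Keep the model supported by the most points.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(1, 20), (4, -15), (7, 40)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

For the fundamental matrix, the same loop applies with 8 matches per sample and a distance to the epipolar constraint in place of the vertical distance.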

Organize the matches into tracks
Source: "Modeling the World from Internet Photo Collections"
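One way to organize pairwise matches into tracks is to treat each keypoint as a node, each match as an edge, and take the connected components. The union-find sketch below follows that idea; it is an illustration under that assumption, not Bundler's actual code, and the (image, keypoint index) identifiers are hypothetical.

```python
def build_tracks(matches):
    """Group pairwise matches into tracks (connected components).

    `matches` is a list of ((image, keypoint), (image, keypoint)) pairs.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), []).append(node)
    return [sorted(members) for members in tracks.values()]


# Keypoint 1 of image A matches keypoint 4 of B, which matches keypoint 2 of C.
matches = [(("A", 1), ("B", 4)), (("B", 4), ("C", 2)), (("A", 7), ("C", 9))]
for track in build_tracks(matches):
    print(track)
```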

Structure from motion : recover a set of camera parameters and a 3D location for each track
Start with the two cameras (images) that match best
Estimate their parameters (focal length from the EXIF tags, five-point
algorithm)
Recover the 3D positions of the points they both observe through a
bundle adjustment
Then take the camera that observes the most already reconstructed points
Estimate its parameters using the Direct Linear Transformation
Run a bundle adjustment using only the already known points:
only the new camera parameters can change
Run another bundle adjustment adding the points observed by
another camera
Iterate with a new camera

Structure from motion : recover a set of camera parameters and a 3D location for each track
Bundle adjustment:
n cameras parametrized by $\Theta_i$
m tracks parametrized by the 3D points $X_j$
$q_{ij}$: the observed projection of the j-th track in the i-th camera
$P(\Theta, X)$: mapping between a 3D point $X$ and its 2D projection in a
camera with parameters $\Theta$
$w_{ij}$: 1 if camera i observes point j, 0 otherwise
Minimize (the unknowns are the camera parameters $\Theta_i$ and the 3D points $X_j$):
$$
\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} \, \lVert q_{ij} - P(\Theta_i, X_j) \rVert^2
$$
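The quantity being minimized can be evaluated with a few lines of Python. The sketch assumes a basic pinhole model for P (rotation R, translation t, focal length f, no distortion, camera looking along +z) and sums squared distances, the usual least-squares convention; the slides do not specify P, so these choices and all names are assumptions.

```python
def project(camera, point):
    """Assumed pinhole projection: rotate, translate, divide by depth, scale."""
    R, t, f = camera["R"], camera["t"], camera["f"]
    xc = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    return (f * xc[0] / xc[2], f * xc[1] / xc[2])


def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors.

    `observations` maps (i, j) to the observed 2D point q_ij; a missing
    key plays the role of w_ij = 0.
    """
    total = 0.0
    for (i, j), (qx, qy) in observations.items():
        px, py = project(cameras[i], points[j])
        total += (qx - px) ** 2 + (qy - py) ** 2
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [{"R": identity, "t": [0.0, 0.0, 0.0], "f": 100.0}]
points = [(1.0, 2.0, 10.0)]
# The point projects to (10, 20); the observation is off by (3, 4) pixels.
observations = {(0, 0): (13.0, 24.0)}
print(reprojection_error(cameras, points, observations))
```

A bundle adjustment hands this cost to a non-linear least-squares solver that updates the cameras and the points until it stops decreasing.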

Render the scene
Cameras as frusta
Images
Points and lines
3D rendering
Sources: Wikipedia & "Modeling the World from
Internet Photo Collections"

Transitions
Representation accuracy of the real scene
Camera motion
Linear interpolation
Timing
Twinkle
View interpolation
Triangulated Morphs
Planar Morphs

Transitions
Triangulated Morphs
Method
Projection of the points onto each image
2D Delaunay triangulation, with edge constraints
Projection of the triangulation onto an average plane
Creation of a 3D mesh
Display depending on camera location
Rendering
Good geometry
Artifacts

Transitions
Planar Morphs
Method
Projection onto a common plane
Display depending on camera location
Rendering
Lower-quality geometry
Fewer artifacts

Running the code
One full run: Bundler → Poisson reconstruction
82 large photos from Flickr, 44 h of computation, 115,196 points recovered at the
Bundler stage...
Figure: Point cloud obtained from Bundler, and Poisson surface
reconstruction done after PMVS

Conclusion
Photosynth is only a beginning...
The University of North Carolina aims to reconstruct famous sites
within a day on a "normal machine"

Quiz
1 SIFT is used on images to find key features. Which
transformations is SIFT invariant to, and when does it not perform so
well? How does this affect the clusters generated afterwards?
2 If you were given the parameters of a camera (rotation matrix, focal
length, position of the center), the associated image and the 3D
point cloud it observes, where would you place the 2D image in 3D
space?
3 As Photosynth is a web application that must be able to display
high-resolution photos to a lot of people simultaneously, how do
you think Microsoft optimized this system in order to avoid large
data transfers?
