1. Building Rome in a day.
July 16, 2018
Presenter: yasnyan
Summarizer: zhao
2. Basic data
Published in
"Building Rome in a day." Communications of
the ACM 54.10 (2011): 105-112.
The original version of this paper was published in
IEEE International Conference on Computer Vision.
"Building rome in a day." Computer Vision, 2009 IEEE
12th International Conference on. IEEE, 2009.
Authors
From:
Google, Microsoft, University of
Washington, and more.
Sameer Agarwal, Yasutaka Furukawa,
Noah Snavely, Ian Simon, Brian Curless,
Steven M. Seitz, and Richard Szeliski
5. Challenges & Problems to Be Solved
How much of the city of
Rome can be reconstructed
in 3D from this photo
collection?
Is it possible to do it in a day?
The photos of Rome on Flickr represent an ideal
data set for 3D modeling research, as they capture
the highlights of the city in exquisite detail and
from a broad range of viewpoints.
Problems:
» Unstructured
The photos were taken from many different locations, in no
particular order.
» Uncalibrated
The camera settings differ from photo to photo and are
mostly unknown.
» Enormous
There is far more data to handle than in
previous research.
» Must be fast!
The algorithm must be fast enough to finish within a
day.
6. SfM: Structure from Motion Approach # 1
How to reconstruct objects in
3D from 2D photos.
» By checking where the same point
of an object appears in each picture, it
is possible to recover where each photo
was taken and its shooting angle.
» Reprojection error: the distance between
where a reconstructed 3D point projects into
a photo and where it was actually observed.
» The authors formulate this as an
optimization problem that minimizes
the total squared reprojection error.
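As a rough illustration of the quantity being minimized, the sketch below computes a total squared reprojection error for a toy scene. The camera model (rotation about the y-axis only, one focal length, no lens distortion) and all names such as `project` and `total_squared_reprojection_error` are assumptions made for this example, not the authors' implementation.

```python
import math

def project(point, camera):
    """Project a 3D point with a simplified pinhole camera (assumed model).

    camera = (yaw, tx, ty, tz, focal): rotation about the y-axis,
    a translation, and a focal length in pixels.
    """
    yaw, tx, ty, tz, focal = camera
    x, y, z = point
    # Rotate about the y-axis, then translate into the camera frame.
    xc = math.cos(yaw) * x + math.sin(yaw) * z + tx
    yc = y + ty
    zc = -math.sin(yaw) * x + math.cos(yaw) * z + tz
    # Perspective division onto the image plane.
    return (focal * xc / zc, focal * yc / zc)

def total_squared_reprojection_error(points, cameras, observations):
    """Sum of squared pixel distances between observed and predicted positions.

    observations: list of (camera_index, point_index, (u, v)).
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        pu, pv = project(points[pt_idx], cameras[cam_idx])
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total

points = [(0.0, 0.0, 5.0), (1.0, 0.5, 4.0)]
cameras = [(0.0, 0.0, 0.0, 0.0, 100.0)]
# Observations generated from the model itself, so the error is zero.
exact = [(0, i, project(p, cameras[0])) for i, p in enumerate(points)]
# Shift the second observation by 3 px horizontally and 4 px vertically.
u1, v1 = exact[1][2]
noisy = [exact[0], (0, 1, (u1 + 3.0, v1 + 4.0))]
```

An SfM solver would adjust `points` and `cameras` to drive this sum down; the sketch only evaluates it.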
7. The correspondence problem Approach # 2
Problem:
How can we estimate correspondences
between input images automatically?
» Shooting conditions may differ
between photos.
» Even photos taken from different places
share common features.
Approach:
» SIFT (Scale-Invariant Feature
Transform):
A widely used feature detector.
» ANN (Approximate Nearest Neighbor):
A fast algorithm for nearest neighbor search
using a k-d tree.
See the following article for details:
Wada, Toshikazu. "Theory and Algorithms of Nearest Neighbor Search." Research Report: Computer Vision and Image Media
(CVIM) 2009.13 (2009): 1-12.
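To make the k-d tree search concrete, here is a minimal pure-Python sketch that matches a query descriptor against a small set of stored descriptors. It is not the ANN library: the `eps` pruning parameter, the ratio test in `match`, the 0.8 threshold, and the 2-D toy descriptors are all assumptions for illustration (real SIFT descriptors have 128 dimensions).

```python
def build_kdtree(items, depth=0):
    """items: list of (descriptor, label). Returns nested tuples, or None if empty."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_nearest(node, query, eps=0.0, best=None):
    """Return up to two (squared_distance, label) pairs, closest first.

    eps = 0 gives an exact search; eps > 0 prunes more aggressively,
    trading accuracy for speed (the "approximate" part).
    """
    if best is None:
        best = []
    if node is None:
        return best
    (desc, label), axis, left, right = node
    best.append((sq_dist(desc, query), label))
    best.sort()
    del best[2:]  # keep only the two closest candidates
    diff = query[axis] - desc[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    two_nearest(near, query, eps, best)
    # Visit the far side only if the splitting plane is close enough
    # to possibly hide a better candidate.
    if len(best) < 2 or diff * diff * (1.0 + eps) ** 2 < best[-1][0]:
        two_nearest(far, query, eps, best)
    return best

def match(query, tree, ratio=0.8):
    """Accept the nearest neighbor only if it clearly beats the runner-up."""
    best = two_nearest(tree, query)
    if len(best) == 2 and best[0][0] < (ratio ** 2) * best[1][0]:
        return best[0][1]
    return None

descriptors = [((0.0, 0.0), "a"), ((10.0, 10.0), "b"),
               ((0.0, 9.0), "c"), ((9.0, 0.0), "d")]
tree = build_kdtree(descriptors)
clear_match = match((0.5, 0.2), tree)   # close to "a" only
ambiguous = match((5.0, 5.0), tree)     # equally close to "c" and "d"
```

The ambiguous query is rejected because its two nearest neighbors are equally distant, which is the kind of unreliable correspondence the ratio test is meant to filter out.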
8. City scale matching Approach # 3
Problem:
How do we find correspondences among
hundreds of thousands of photos of the
whole city?
Approach:
» Match graph:
A graph showing which images correspond.
It is built by connecting images that share a
sufficient number of features or have high whole-image
similarity.
» A multi-round scheme:
A method that proposes image pairs to be
connected over multiple rounds.
Similar images can be found without
matching every pair of images.
This method consists of two elements.
» Vocabulary tree based whole
image similarity
» Query expansion
Agarwal, Sameer, et al. "Building rome in a day." Computer Vision, 2009 IEEE
12th International Conference on. IEEE, 2009.
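The two elements of the multi-round scheme can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes each image has already been reduced to a list of visual word ids (the vocabulary tree itself is omitted), uses plain TF-IDF with an inverted index for whole-image similarity, and treats query expansion as "propose A-C when A-B and B-C are already verified". All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def tfidf_vectors(image_words):
    """image_words: {image_id: list of visual word ids}. Returns unit-length vectors."""
    n = len(image_words)
    doc_freq = Counter(w for words in image_words.values() for w in set(words))
    vectors = {}
    for img, words in image_words.items():
        vec = {w: c * math.log(n / doc_freq[w]) for w, c in Counter(words).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors[img] = {w: v / norm for w, v in vec.items()}
    return vectors

def propose_by_similarity(vectors, k=1):
    """For each image, propose its k most similar images as candidate pairs."""
    # Inverted index: only images sharing a visual word are ever scored.
    inverted = defaultdict(list)
    for img, vec in vectors.items():
        for w in vec:
            inverted[w].append(img)
    proposals = set()
    for img, vec in vectors.items():
        scores = Counter()
        for w, v in vec.items():
            for other in inverted[w]:
                if other != img:
                    scores[other] += v * vectors[other][w]
        for other, _ in scores.most_common(k):
            proposals.add(frozenset((img, other)))
    return proposals

def query_expansion(edges):
    """Propose (a, c) whenever a-b and b-c are verified edges but a-c is not."""
    neighbors = defaultdict(set)
    for a, b in map(tuple, edges):
        neighbors[a].add(b)
        neighbors[b].add(a)
    proposals = set()
    for nbrs in neighbors.values():
        for a, c in combinations(sorted(nbrs), 2):
            if frozenset((a, c)) not in edges:
                proposals.add(frozenset((a, c)))
    return proposals

image_words = {"A": [1, 1, 2], "B": [1, 2, 2], "C": [3, 4], "D": [3, 4, 4]}
first_round = propose_by_similarity(tfidf_vectors(image_words), k=1)
verified = {frozenset(("A", "B")), frozenset(("B", "C"))}
expanded = query_expansion(verified)
```

In a full pipeline each proposed pair would still go through feature matching and verification before becoming an edge of the match graph; the sketch covers only the proposal step.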
9. Distributed implementation Approach # 4
The system runs on a cluster of
computers, with one node designated as the
master node, responsible for job
scheduling decisions.
Whole-city image matching can be
divided into three phases.
1. pre-processing
2. verification
3. track generation
http://d.hatena.ne.jp/LM-7/20100124/1264370927
10. Distributed implementation Approach # 4
» pre-processing
» verification
» track generation
1. Each node generates local
tracks from its local CC.
2. The master node resolves
conflicts and instructs each
node to merge.
CC: Connected Component
Diagram labels: SIFT & Graphing, Shared storage, Automatic load
balancing, Multi-round verify, Query Expansion x4
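Track generation amounts to finding connected components over matched features. The sketch below uses a union-find structure to build local tracks per node and then merges them centrally. It is a minimal illustration under assumptions: features are represented as hypothetical `(image, feature_index)` tuples, and the conflict handling done by the master node is reduced to merging tracks that share a feature.

```python
def find(parent, x):
    """Find the representative of x, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def build_tracks(matches):
    """matches: iterable of (feature_a, feature_b) verified correspondences.

    Returns a sorted list of tracks, each a sorted list of features.
    """
    parent = {}
    for a, b in matches:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # union the two components
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), []).append(node)
    return sorted(sorted(g) for g in groups.values())

def merge_local_tracks(local_track_lists):
    """Master-side step: join local tracks that share at least one feature."""
    links = []
    for tracks in local_track_lists:
        for track in tracks:
            links.extend((track[0], member) for member in track)
    return build_tracks(links)

# Two nodes, each holding part of the verified matches.
node1 = build_tracks([(("img1", 0), ("img2", 5))])
node2 = build_tracks([(("img2", 5), ("img3", 2)),
                      (("img4", 1), ("img5", 7))])
merged = merge_local_tracks([node1, node2])
```

Here the feature `("img2", 5)` appears on both nodes, so the two local tracks containing it are joined into one track spanning three images.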
11. City scale SfM Approach # 5
Once the tracks are generated, the
next step is to run an SfM
algorithm (see slide 6) on each CC of
the match graph to recover the
camera poses and a 3D position
for every track.
12. Experiments
The authors report results for Dubrovnik, Rome, and Venice, using photos from flickr.com.
Rome: ⭕ (21 h) Dubrovnik: ⭕ (22.5 h) Venice: ❌ (65 h)
The Dubrovnik data set is smaller than the others, but its complex visibility and widely varying
viewpoints make reconstructing Dubrovnik more complicated.
13. Output result # 1
Colosseum
In Rome
This model was
constructed from
2,106 images and
819,242 3D points.
Colosseum https://youtu.be/kxtQqYLRaSQ
14. Output result # 2
Trevi
Fountain
In Rome
This model was
constructed from
1,936 images and
656,699 3D points.
Trevi Fountain https://youtu.be/Mc8ZWk2jguo
15. Output result # 3
San Marco
In Venice
This model was
constructed from
13,699 images and
4.5 million 3D
points.
San Marco Square https://youtu.be/HrgHFDPJHXo
16. Output result # 4
The Old City
of Dubrovnik
This model was
constructed from
4,619 images and
3.5 million 3D
points.
The Old City of Dubrovnik https://youtu.be/sQegEro5Bfo
17. MVS
Recover dense and accurate models
The 3D points produced by SfM are usually sparse, but
an MVS algorithm can recover dense
and accurate models.
» MVS: Multi-View Stereo
MVS algorithms recover 3D geometric
information much in the same way our
visual system perceives depth by fusing
two views.
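The two-view depth cue mentioned above can be written down for the simplest case. The sketch assumes a rectified stereo pair (two identical cameras side by side, looking in the same direction), where depth is focal length times baseline divided by disparity. Real MVS fuses many views with unknown relative geometry, so this is only the intuition; the function name and the numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth of a point seen at column x_left in the left image
    and x_right in the right image of a rectified stereo pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    return focal_px * baseline_m / disparity

# Assumed setup: 800 px focal length, cameras 0.1 m apart.
depth_near = depth_from_disparity(800.0, 0.1, 420.0, 380.0)  # 40 px disparity
depth_far = depth_from_disparity(800.0, 0.1, 410.0, 400.0)   # 10 px disparity
```

A larger shift between the two views means a closer point, which is why nearby objects appear to move more than distant ones when the viewpoint changes.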
19. Conclusion
By using the SfM method and
a distributed computing
system, the authors showed that
a 3D model of a whole city
can be reconstructed from
images uploaded to the web.
The authors could build Rome in a day, but
could not build Venice…
While the MVS method can recover
smoother 3D models, it requires
more computation time than the SfM
method.
❌Building Rome in a day. (with MVS)
⭕Building Colosseum in a day.
(with MVS)