Calakli 3DIMPVT2012



  1. High Resolution Surface Reconstruction from Multi-view Aerial Imagery

Fatih Calakli∗, Ali O. Ulusoy∗, Maria I. Restrepo∗, Gabriel Taubin and Joseph L. Mundy
School of Engineering, Brown University, Providence, RI (USA)
{fatih_calakli, ali_ulusoy, maria_restrepo, gabriel_taubin, joseph_mundy}
∗ equal contribution

Abstract: This paper presents a novel framework for surface reconstruction from multi-view aerial imagery of large scale urban scenes, which combines probabilistic volumetric modeling with smooth signed distance surface estimation, to produce very detailed and accurate surfaces. Using a continuous probabilistic volumetric model which allows for explicit representation of ambiguities caused by moving objects, reflective surfaces, areas of constant appearance, and self-occlusions, the algorithm learns the geometry and appearance of a scene from a calibrated image sequence. An online implementation of the Bayesian learning process on GPUs significantly reduces the time required to process a large number of images. The probabilistic volumetric model of occupancy is subsequently used to estimate a smooth approximation of the signed distance function to the surface. This step, which reduces to the solution of a sparse linear system, is very efficient and scalable to large data sets. The proposed algorithm is shown to produce high quality surfaces in challenging aerial scenes where previous methods make large errors in surface localization. The general applicability of the algorithm beyond aerial imagery is confirmed against the Middlebury benchmark.

Keywords: computational geometry, object modeling; stereo; online Bayesian learning; octrees; optimization

I. INTRODUCTION

Automated estimation of the geometry and appearance of a scene from multiple images is an important research problem with a wide range of applications, including realistic modeling for the feature film production, mapping and gaming industries, quantitative measures for urban planning, autonomous navigation, as well as various surveillance tasks. The problem of image-based 3-d modeling or multi-view stereo has been widely studied in the fields of computer vision, computer graphics and computational photography. While many multi-view stereo methods resolve with high accuracy the surface geometry for isolated and unoccluded objects, only a few methods are scalable to realistic, cluttered urban scenes where accurate modeling is difficult due to severe occlusions, highly reflective surfaces, varying illumination conditions, misregistration errors and sensor noise. On the other hand, most scalable 3-d reconstruction techniques have demonstrated results that are sufficient for visualization purposes, but do not offer a clear solution to the problem of accurate surface reconstruction.

This paper presents a framework targeted to solve the surface estimation problem from high resolution aerial imagery of challenging large scale urban scenes. The ambiguities caused by moving objects, reflective surfaces, areas of constant appearance, and self-occlusions, among others, are modeled explicitly using probabilistic volumetric scene modeling. The approach allows for a representation that is dense and general, where no assumptions are made about the underlying geometry of the scene, such as piecewise planarity. A continuous formulation is used, allowing for various adaptive discretizations of space. The space is finely discretized near the estimated surfaces, and coarsely in regions of empty space, resulting in a significant reduction in storage and processing time costs. Once all images have been used to learn the scene in an unconstrained manner, the proposed algorithm uses a novel variational approach to reconstruct a surface that best represents the image data summarized in the probabilistic volumetric model. An overview of the proposed approach is shown in Fig. 1.

The effectiveness of the proposed framework is validated through experiments on various large scale urban scenes. Results indicate that the extracted surfaces contain high resolution detail while staying smooth in areas of ambiguity. Comparisons show that prior methods cannot resolve as much detail and tend to over-smooth important features. Also, the general applicability of the proposed framework is tested in the Middlebury benchmark [26]. Results are highly competitive in terms of accuracy and completeness scores.

II. RELATED WORK

In recent years, multi-view stereo (MVS) has seen much attention and a plethora of algorithms have been proposed. Please refer to [27] for a review and taxonomy of MVS algorithms. In particular, the Middlebury benchmark [26] has fueled the development of algorithms tailored for small objects under controlled environments, where the top performing methods are capable of challenging the accuracy of laser scanners. However, most of these approaches are not suitable for dealing with high resolution aerial imagery of large scale urban scenes. In particular, algorithms that partially rely on shape-from-silhouette techniques [13], [31], [16], under the assumption that a single object is visible and can be segmented from the background, are not applicable to aerial imagery due to the unavailability of fully visible silhouettes.

The Patch-based Multi-view Stereo (PMVS) algorithm proposed by Furukawa and Ponce [10] is considered state
  2. Figure 1. An overview of the proposed system (input images, probabilistic volumetric modeling, surface reconstruction).

of the art amongst methods based on feature extraction and matching. It produces colored oriented point clouds, performs well for small objects as well as for large urban scenes, and can recover significant geometry in such scenes (see Fig. 4 top right image). However, methods based on feature extraction and matching often show significant errors caused by severe occlusions, highly reflective surfaces and areas of constant appearance. Such phenomena are commonplace in urban scenes, so the generated point clouds often contain gross errors in localization and significant regions where feature points are not detected at all (see Fig. 4 middle right and bottom right images). Surface reconstruction algorithms, such as those reviewed in [15], [8], [2], are typically applied to the resulting point clouds. Although these methods produce excellent quality surfaces when applied to clean and accurate data, they often produce inaccurate surfaces when the input point cloud contains too many inaccuracies and outliers (see Fig. 5 first column).

Probabilistic models have been proposed as an alternative approach to handle complicated scene geometry. These models do not make hard decisions about surface geometry and/or appearance; instead they explicitly represent uncertainties by assigning probabilities to multiple hypotheses within the volume. Early works along this line [4], [3] can be regarded as extensions of Space Carving [17], and more recently, algorithms based on generative models for the reverse image formation process have been introduced [20], [11]. Using Bayesian inference, these algorithms infer the maximum a-posteriori probabilities in the volume from the joint probability of all the images. Pollard and Mundy [23] propose an online alternative which, in theory, can handle an infinite number of images. However, none of these volumetric methods implemented with regular grid discretizations scale up gracefully to large scenes because of their cubic space and time complexity. Crispell et al. [6], [7] addressed these limitations with a continuous formulation of Pollard and Mundy’s method implemented in an octree. Moreover, a GPU implementation presented in [21] is capable of learning high resolution probabilistic volumetric models of large urban scenes efficiently, where one pass of the online update takes approximately one second.

It is not clear how to estimate surfaces from a probabilistic volumetric model. A simple approach is to compute the isosurface associated with a certain probability threshold [9], [20], [7]. Since a different threshold may be necessary in different regions of space, this method often fails to produce satisfactory results, and the lack of surface orientation information may result in “double-sided” surfaces. Yezzi et al. [32] propose a surface evolution approach to obtain smooth surfaces. Also related are global optimization methods such as graph cuts to extract surfaces from a volumetric photo-consistency function [12], [19], [18]. While graph cuts allow for flexible energy functions and exact solutions, the method is not applicable to large scenes due to its very high memory requirements [18].

The proposed surface reconstruction method extends the Smooth Signed Distance (SSD) approach introduced by Calakli and Taubin [5]. Because of its continuous volumetric formulation, the SSD approach is particularly complementary to the probabilistic volumetric methods described above. SSD estimates a smooth approximation to the signed distance function to output a surface from an oriented point cloud, using adaptive octree-based discretizations which reduce the problem to the solution of a sparse system of linear equations. The resulting algorithm is efficient and scalable to large data sets.

This paper uses a continuous probabilistic volumetric model (CPVM) to explicitly represent ambiguity in both surface geometry and appearance [6] and to learn large scale urban scenes from high resolution aerial imagery in an efficient manner [21]. Once the model is learned, surface reconstruction is performed by fully utilizing uncertainties in geometry in the entire volume using a novel extension of the SSD approach, hereafter referred to as GSSD. Finally, the appearance (already present) in the CPVM is transferred to the surface estimate. This transfer is not only computationally cheap but also avoids the difficulties faced by texture
  3. mapping algorithms such as photometric discontinuities, i.e. seams between face boundaries, ghosting effects and blurring [1].

III. PROPOSED FRAMEWORK

A. Continuous Probabilistic Volumetric Modeling

This framework uses a scalable volumetric approach to model uncertainties in geometry and appearance, as proposed by [6]. Crispell et al. propose a variable-resolution model based on a continuous density that removes the dependency on regular-sized grids of previous volumetric models [20], [23]. Although no constraints are imposed on the type of subdivision, an octree is used in practice to approximate the underlying continuous quantities as piecewise constant, achieving several orders of magnitude of storage savings.

Surface probabilities are closely related to a scalar function termed the occlusion density $\alpha(x)$. The occlusion density at a point is a measure of the likelihood that the point occludes points behind it along any line of sight, assuming that the point itself is not occluded. Points along a ray (e.g. the line of sight) may be parametrized by a distance $s$ from $q$ (e.g. the camera center) as $x(s) = q + s\,r$, $s \geq 0$. The visibility probability (1) of a point $x(s)$ is related to the integration of the occlusion density from $q$ to $q + s\,r$ (see [6] for the derivation). Intuitively, the visibility along a ray will drop significantly when it hits a point of high occlusion, i.e. a surface. An example of this relationship is shown in Fig. 2a.

$$\mathrm{vis}(s) = e^{-\int_0^s \alpha(t)\,dt} \tag{1}$$

Learning the occlusion density from images follows an online Bayesian learning algorithm similar to that proposed by Pollard and Mundy [23]. In [23], the probability of a voxel being part of a surface, $P(X \in S)$, is updated with the intensity, $I$, observed in the pixel associated with a corresponding projection ray. Following Pollard’s reasoning, for a discrete set of voxels, the surface update equation is expressed as:

$$P(X \in S \mid I_{N+1}) = P^N(X \in S)\,\frac{P^N(I_{N+1} \mid X \in S)}{P^N(I_{N+1})}, \tag{2}$$

where the conditional $P^N(I_{N+1} \mid X \in S)$ and the marginal $P^N(I_{N+1})$ are computed using the surface probabilities and appearance models stored in the voxels along the corresponding projection ray.

The definitions proposed by Crispell et al. allow for a generalization of (2). In [23], information along the projection ray is approximated by a series of voxels that intersect the ray, with no regard to the ray/voxel intersection geometry. However, in Crispell’s model, the occlusion density is defined continuously and therefore the model can account for the exact geometry of the ray/voxel intersection, i.e. the length of the intersection segment. A ray is partitioned into a series of $M$ intervals, where the $i$th interval is the result of the intersection of the ray and the $i$th cell along the ray. The starting location and length of each interval are denoted by $s_i$ and $l_i$ respectively. Fig. 2b depicts an illustration of this ray reasoning for the octree discretization.

The segment length occlusion probability is defined as the probability that a segment starting at $s_i$ of length $l_i$ is occluding and can be expressed as:

$$P(Q^{l_i}_{s_i}) = 1 - \frac{\mathrm{vis}(s_i + l_i)}{\mathrm{vis}(s_i)} = 1 - e^{-\alpha_i l_i} \tag{3}$$

The second equality follows from (1) because the occlusion density is constant, equal to $\alpha_i$, within the $i$th cell. Equation (2) can now be written in terms of $P(Q^{l_i}_{s_i})$, i.e.

$$P(Q^{l_i}_{s_i} \mid I_{N+1}) = P^N(Q^{l_i}_{s_i})\,\frac{p^N(I_{N+1} \mid Q^{l_i}_{s_i})}{p^N(I_{N+1})} = P^N(Q^{l_i}_{s_i})\,\frac{\mathrm{pre}_i + \mathrm{vis}(s_i)\,p_i(I_{N+1})}{\mathrm{pre}_\infty + \mathrm{vis}_\infty\,p_\infty(I_{N+1})} \tag{4}$$

$$\mathrm{pre}_i \equiv \sum_{j=0}^{i-1} P^N(Q^{l_j}_{s_j})\,\mathrm{vis}(s_j)\,p_j(I_{N+1}) \tag{5}$$

$$\mathrm{vis}_\infty \equiv \prod_{i=0}^{M-1} \left[1 - P^N(Q^{l_i}_{s_i})\right] \tag{6}$$

The term $\mathrm{pre}_i$ accounts for the probability of observing the given intensity $I_{N+1}$ taking into account all segments between the camera center and segment $i-1$. The term $\mathrm{vis}_\infty$ measures the probability of a ray passing unoccluded through the model; in such cases the observed appearance can be thought of as the appearance of the “background”, which is modeled by the density $p_\infty$. The new update equations can be used to update $\alpha(x)$ directly using (3).

Equation (4) has a simple interpretation: the occlusion density of a cell increases if the appearance model at the cell explains the intensity observed in the image better than any other cell along the ray or the background. The appearance at each cell is modeled with a Gaussian mixture distribution that is updated as in [23] using an online approach based on Stauffer and Grimson’s background modeling algorithm [29].

B. Generalized Smooth Signed Distance Surface Reconstruction (GSSD)

In this section, the original formulation of SSD [5], which reconstructs surfaces from oriented point cloud data, is first summarized. Then, this formulation is generalized to reconstruct surfaces from probabilistic volumes.

Given an oriented point cloud $D = \{(x_1, n_1), \ldots, (x_N, n_N)\}$, where $x_i$ is a surface location sample and $n_i$ is the corresponding surface normal sample oriented towards the outside of the object, SSD reconstructs a watertight surface $S$ defined by an implicit equation $S = \{x : f(x) = 0\}$, where $f : V \to \mathbb{R}$ is a signed distance field defined on a bounded volumetric domain $V$ contained in Euclidean three dimensional space, so that
  4. Figure 2. (a) Plots of occlusion density $\alpha$ and vis as a ray travels through the volume. $\alpha$ peaks (and visibility drops) as the ray pierces the two walls of the (almost) empty cube. (b) A parametrization of the ray into intervals according to octree cell intersections. (c) Primal vertex indices associated with an octree leaf.

$f(x_i) \approx 0$, $\nabla f(x_i) \approx n_i$, for $i = 1, \ldots, N$. The resulting distance field is $f(x) < 0$ inside and $f(x) > 0$ outside of the object. Since the points $x_i$ are regarded as samples of the surface $S$, and the normal vectors $n_i$ as samples of the surface normal at the corresponding points, the distance field should ideally satisfy $f(x_i) = 0$ and $\nabla f(x_i) = n_i$ for all the points $i = 1, \ldots, N$ in the data set. These conditions are satisfied in the least squares sense by minimizing the following energy:

$$Q(f) = \frac{1}{N}\sum_{i=1}^{N} f(x_i)^2 + \frac{\lambda_1}{N}\sum_{i=1}^{N} \left\|\nabla f(x_i) - n_i\right\|^2 + \frac{\lambda_2}{|V|}\int_V \left\|Hf(x)\right\|^2 dx, \tag{7}$$

where $\{\lambda_1, \lambda_2\}$ are positive constants used to control the weights of the different terms, $Hf(x)$ is the Hessian matrix of $f$, and the norm of the matrix is the Frobenius matrix norm. The integral is over the volume $V$ and $|V| = \int_V dx$ is the measure of this volume. The first two data terms of the energy function force the implicit function to approximate the signed distance function to the underlying surface. The third regularization term forces the gradient of the function to be close to constant away from the data points.

To generalize SSD, the energy function $Q(f)$ is modified by replacing the finite oriented point cloud with a continuous distribution of oriented points. The proposed energy function is

$$E(f) = \int_V f^2\,d\mu + \lambda_1 \int_V \left\|\nabla f - n\right\|^2 d\mu + \lambda_2 \int_V \left\|Hf\right\|^2 d\sigma, \tag{8}$$

where $d\mu(x)$ and $d\sigma(x)$ are finite measures (in the sense of measure theory), and $n(x)$ is a vector field defined in the volume. When these measures are defined by non-negative continuous densities $\mu(x)$ and $\sigma(x)$ (i.e. when $d\mu(x) = \mu(x)\,dx$ and $d\sigma(x) = \sigma(x)\,dx$), the values of $\sigma(x)$ and $\mu(x)$ should be chosen to make sure that the first two data terms dominate near high probability areas (i.e. $\mu(x)$ should be large and $\sigma(x)$ small), and the regularization term dominates near low probability areas ($\mu(x)$ small and $\sigma(x)$ large). Note that this formulation also allows for singular measures, or generalized functions as densities. In particular, the original SSD formulation (7) can be regarded as a special case of the continuous formulation (8), where

$$d\mu(x) = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x_i)\,dx \quad \text{and} \quad d\sigma(x) = \frac{1}{|V|}\,dx \tag{9}$$

or more generally with a weight $\mu_i = \mu(x_i)$ assigned to each point $x_i$.

In this analysis, $f$ is restricted to belong to a finite dimensional vector space of functions:

$$f(x) = \sum_{\omega \in \Omega} f_\omega\,\phi_\omega(x) = \Phi(x)^T F, \tag{10}$$

where $\omega$ denotes an index which belongs to a finite set $\Omega$, say with $K$ elements, $\phi_\omega(x)$ is a basis function, and $f_\omega$ is the corresponding coefficient. Then, the energy function $E(f)$ results in a non-homogeneous quadratic function $F^t A F - 2\,b^t F + c$ in the $K$-dimensional parameter vector $F = (f_\omega)_{\omega \in \Omega}$. The matrix $A$ is symmetric and positive definite, and the resulting minimization problem has a unique minimum. The global minimum is determined by solving the system of linear equations $A F = b$. The coefficients of the matrix $A$ and the vector $b$ require computing inner products of every pair of basis functions. Depending on how large the support of the chosen basis functions is, a large number of basis functions may overlap at any given point in the volume $V$. As a result, accumulating the coefficients of the matrix $A$ and the vector $b$ may require significant computation. An octree based finite-element/finite-differences scheme is presented in [5] for the minimization of $Q(f)$ of (7), where the problem still reduces to the solution of linear equations $A F = b$, but the matrix $A$ is much sparser, resulting in a fast and space-efficient algorithm.

This discretization is particularly attractive as CPVM estimates the surface density $\mu(x)$ and the normal vector field $n(x)$ on an octree representation, i.e. $\mu(x)$ is a piecewise constant function, one value per octree leaf, and similarly $n(x)$ is a piecewise constant function, one vector per octree leaf. The
  5. parameter vector $F$ has $K$ elements, one value per vertex of the primal graph of the octree.

The primal graph represents the signed distance function $f$, and the dual graph of the octree (which has the octree leaf centroids as vertices) represents the surface density $\mu$. $\{\beta_0, \beta_1, \ldots, \beta_7\} \subseteq \Omega$ denotes the primal vertex indices associated with the dual vertex (i.e. an octree leaf) $\beta$, as depicted in Fig. 2c.

Note that $f(x)$ is a piecewise trilinear function, i.e. its value at a point $x$ in the volume $V$ is determined through $f(x) = \sum_{h=0}^{7} w_h f_{\beta_h}$, where $w_0, \ldots, w_7$ are the trilinear coordinates of $x$ in leaf $\beta$. By taking the piecewise definitions of $f$ and $\mu$ into account, the integrals of the energy function $E(f)$ of (8) conveniently become finite sums over the octree leaves:

$$E(f) \approx \frac{1}{W_1}\sum_{\beta} f(x_\beta)^2\,\mu_\beta + \frac{\lambda_1}{W_1}\sum_{\beta} \left\|\nabla f(x_\beta) - n_\beta\right\|^2 \mu_\beta + \frac{\lambda_2}{W_2}\sum_{(\beta,\gamma)} \left\|\nabla f(x_\beta) - \nabla f(x_\gamma)\right\|^2 \sigma_{\beta\gamma}, \tag{11}$$

where $x_\beta$ is the centroid location of leaf $\beta$, $\mu_\beta$ is the surface density at leaf $\beta$, and $n_\beta$ is the oriented normal vector associated with leaf $\beta$. Note that $\mu_\beta$ and $n_\beta$ are estimated by CPVM, and kept fixed during surface estimation. The signed distance function $f$ and its gradient $\nabla f$ are written using the elements of the parameter vector $F$:

$$f(x_\beta) = \frac{1}{8}\left(f_{\beta_0} + f_{\beta_1} + f_{\beta_2} + f_{\beta_3} + f_{\beta_4} + f_{\beta_5} + f_{\beta_6} + f_{\beta_7}\right),$$

$$\nabla f(x_\beta) = \frac{1}{4\Delta_\beta}\begin{pmatrix} f_{\beta_4} - f_{\beta_0} + f_{\beta_5} - f_{\beta_1} + f_{\beta_6} - f_{\beta_2} + f_{\beta_7} - f_{\beta_3} \\ f_{\beta_2} - f_{\beta_0} + f_{\beta_3} - f_{\beta_1} + f_{\beta_6} - f_{\beta_4} + f_{\beta_7} - f_{\beta_5} \\ f_{\beta_1} - f_{\beta_0} + f_{\beta_3} - f_{\beta_2} + f_{\beta_5} - f_{\beta_4} + f_{\beta_7} - f_{\beta_6} \end{pmatrix}, \tag{12}$$

where $\Delta_\beta$ is the side length of leaf $\beta$. The third energy term is a sum over the pairs of octree leaves $\beta$ and $\gamma$ that share a common face. The associated smoothing density $\sigma_{\beta\gamma}$ is determined naturally from the octree subdivision, i.e. $\sigma_{\beta\gamma} = A_{\beta\gamma} / \Delta_{\beta\gamma}$, where $A_{\beta\gamma}$ is the area of the common face and $\Delta_{\beta\gamma}$ is the Euclidean distance between the centroids of these leaves. $W_1 = \sum_{\beta} \mu_\beta$ and $W_2 = \sum_{(\beta,\gamma)} \sigma_{\beta\gamma}$ are normalization factors so that the $\mu_\beta$’s sum up to one and the $\sigma_{\beta\gamma}$’s sum up to one.

Iterative Multigrid Solver: The discretization described above reduces to the solution of a sparse linear system $A F = b$, which is solved using a cascading multi-grid method, along with a Jacobi preconditioned conjugate gradient solver. The problem is first solved on a much lower depth of the octree than desired; then the solution obtained at a given depth is interpolated to the next depth and used to initialize the iterative solver.

Polygonization: Once the signed distance function $f(x)$ is estimated, a polygonal approximation (mesh) of the zero isolevel is constructed using the Dual Marching Cubes (DMC) algorithm [25].

Mesh painting: Since CPVM produces color information represented as a volume texture at high resolution, estimating surface colors reduces to volume texture evaluation, and the colors are represented as polygon mesh vertex attributes. This approach avoids most of the pitfalls of texture mapping from images, where occlusion and extreme warping produce unsuitable textures [1].

IV. IMPLEMENTATION

Scene Learning: The probabilistic geometry and appearance of all scenes is learned using the GPU implementation of CPVM as proposed in [21].

Visibility information: The visibility information plays an important role during online Bayesian learning. Surfaces are resolved with increasing accuracy as new views become available. The appearance distributions in empty cells outside of objects fail to explain the intensity of background objects, causing $\alpha$ to converge to very low (ideal) values in empty (but visible) space. On the other hand, $\alpha$ converges to infinity (ideal) at surface locations. As the value of the occlusion density increases near surfaces, cells located inside of objects are updated with decreasing weight. Hence, after convergence of the model, the information inside objects could be meaningless due to ambiguous regions that were learned during early stages of the online training process. The erroneous information inside objects can potentially hinder the accuracy of surface extraction. These cells can be detected using the visibility information already present in the model. For each cell, a measure of its visibility, $\mathrm{vis\_score}(x)$, is computed using a number of viewing directions [7]. Cells with low $\mathrm{vis\_score}(x)$ are detected and the corresponding $\alpha$ is set to 0 to eliminate online learning artifacts. In practice, the viewing directions can be selected from the cameras used during learning or simply by defining a canonical set of directions, e.g. samples from a unit sphere. For scenes with poorly distributed cameras, the former method might be preferred to avoid possibly unreliable visibility computation due to poorly resolved surfaces.

Normal estimation: Surface normals are computed using the gradient information of the occlusion density $\alpha$. The gradient direction is computed by convolving first order derivatives of the three-dimensional Gaussian kernel with the volume. For all experiments, six oriented kernels were used and their responses interpolated into an estimate of $\nabla\alpha$. Closed objects in the scenes contain mostly empty space, therefore when computing surface normals from the gradient direction, there is an orientation ambiguity. In order to have surface normals oriented consistently towards the outside of objects, the normal directions are oriented to the hemisphere that yields the maximum visibility.

Surface density: The proposed surface density is the following:

$$\mu(x) = \alpha(x) \times \mathrm{vis\_score}(x) \times \left\|\nabla\alpha(x)\right\|. \tag{13}$$
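As a rough illustration of Eq. (13), the sketch below evaluates the surface density at one interior cell of a small dense grid. Several things here are assumptions made for the example and are not taken from the paper: the grid is a regular nested list rather than an octree, the gradient is a plain central difference rather than the oriented Gaussian derivative kernels described above, and the names `gradient_at`, `surface_density` and `constant_grid` are hypothetical.

```python
import math


def gradient_at(alpha, i, j, k, h=1.0):
    """Central-difference gradient of a dense 3-D grid at interior cell (i, j, k).

    Assumption: a regular grid with spacing h stands in for the octree, and
    central differences stand in for Gaussian derivative filtering.
    """
    gx = (alpha[i + 1][j][k] - alpha[i - 1][j][k]) / (2.0 * h)
    gy = (alpha[i][j + 1][k] - alpha[i][j - 1][k]) / (2.0 * h)
    gz = (alpha[i][j][k + 1] - alpha[i][j][k - 1]) / (2.0 * h)
    return (gx, gy, gz)


def surface_density(alpha, vis_score, i, j, k, h=1.0):
    """Evaluate mu = alpha * vis_score * ||grad alpha|| (Eq. 13) at one cell."""
    gx, gy, gz = gradient_at(alpha, i, j, k, h)
    grad_norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    return alpha[i][j][k] * vis_score[i][j][k] * grad_norm


def constant_grid(n, value):
    """Build an n x n x n nested-list grid filled with a single value."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


n = 5
vis = constant_grid(n, 0.5)  # toy visibility score, identical in every cell

# Occlusion density rising linearly along the first axis (a crude "wall").
ramp = [[[2.0 * i for _ in range(n)] for _ in range(n)] for i in range(n)]

# A single isolated cell of high occlusion density surrounded by empty space.
spike = constant_grid(n, 0.0)
spike[2][2][2] = 10.0

mu_ramp = surface_density(ramp, vis, 2, 2, 2)
mu_spike = surface_density(spike, vis, 2, 2, 2)
print(mu_ramp, mu_spike)
```

In this toy setup the ramp cell receives a density of 4.0 (occlusion density 4, visibility 0.5, gradient norm 2), while the isolated spike receives 0 because the symmetric difference of its neighbours vanishes. A real implementation would operate on octree leaves and on the filtered gradient estimate instead.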
  6. Figure 3. Sample frames and camera locations: (a) Middlebury dataset [26], “Full” set of cameras in blue, “Ring” in red; (b) “Downtown” site; (c) “SciLi” site.

Figure 4. Point cloud visualization of “Downtown” (from left to right, column-wise): point cloud using the top 70% of CPVM samples ranked by (13); point cloud using the top 10% of CPVM samples ranked by (13); PMVS point cloud.

The desired surface should pass through the volume of high occlusion density $\alpha(x)$ and high visibility $\mathrm{vis\_score}(x)$. Moreover, $\|\nabla\alpha(x)\|$ is included to increase robustness to outliers such as isolated cells (in the air) with high occlusion density.

V. EXPERIMENTS AND EVALUATION RESULTS

A series of experiments is conducted to evaluate the proposed framework in three aspects. First, it is examined how well the probabilistic learning estimates the underlying geometry. Then, the quality of the reconstructed surfaces using GSSD is assessed. Finally, comparisons to other MVS reconstruction methods are presented. PMVS [10] with Poisson Reconstruction [15] (PMVS+Poisson) is used as a baseline method because it achieves a high ranking in the Middlebury benchmark [26] and is applicable to large scale scenes. The reader is referred to the supplementary material for further comparisons. The results focus on the reconstruction of 3D models from aerial imagery, where both the underlying geometry and the reconstructed surfaces are shown to be superior to PMVS+Poisson. In addition, the numerical accuracy of the proposed algorithm is demonstrated on the Middlebury benchmark, where highly competitive results are obtained.

The dataset consists of images from two challenging urban sites and two benchmark models. Fig. 3a presents a sample image from the Temple and Dino objects that are part of the Middlebury dataset [26]. Both objects were reconstructed using both the “Full” set of cameras (312 views for Temple, 363 views for Dino) and the “Ring” set (47 views for Temple, 48 views for Dino). Fig. 3b and 3c correspond to images from two urban sites in Providence, RI, USA (publicly available in [24]). The urban sites, here referred to as “Downtown” and “SciLi”, cover an area of approximately (500m)×(500m) and have an approximate resolution of 30 cm/pixel. The camera paths follow a circle around each site; the size of all images is 1280x720 pixels; 174 views are available for “Downtown” and 337 for “SciLi”. The camera matrices for all aerial image sequences were obtained using the Bundler software [28]. The resulting models were trained in approximately 2 hours for “Downtown” and 3 hours for “SciLi”, and both contain roughly 30 million leaf cells.

Urban Scenes: This section begins with an examination of the quality of the learned surface density. In order to make comparisons to PMVS easy to visualize and fair, locations (octree leaf cell centroids) are sampled from CPVM to generate a point cloud. The cell locations are filtered using the surface density (13) as a threshold criterion. The point clouds are visualized in Fig. 4, where the full “Downtown” scene as well as details are shown for the top 70% and 10% of the sampled cells, together with the PMVS point cloud. Comparisons in Fig. 4 demonstrate three clear advantages of CPVM over PMVS. Namely, (a) both systems do well on planar textured surfaces, however in regions of high curvature CPVM produces much more accurate results than PMVS, e.g. edges of buildings in the bottom row; (b) CPVM is able to resolve details at higher resolution than PMVS, e.g. pillars and the tip of the building in the middle row; (c) information is very dense in the probabilistic model, e.g. the point cloud of CPVM even with 10% of the surface voxels is denser than that of PMVS.

Fig. 5 presents the comparisons of surfaces reconstructed for both the “SciLi” and the “Downtown” sites using PMVS+Poisson and the proposed framework. The figure demonstrates that the proposed method produces pleasing surfaces while staying faithful to the high detail information available in the data: notice the sharpness of building walls and corners, and the ability to capture small details such as roof-top pipes and tiny windows, as shown in the bottom row. The produced surfaces are well defined and crisp, even in regions of high surface curvature, as seen in the second row. All these qualities are a direct result of the variational formulation of equation (11). The first two data terms make sure that the surface passes through the highly surface-like regions while the third term enforces certain smoothness constraints. It has
  7. Figure 5. Reconstructed surfaces from the “Downtown” and “SciLi” sites, using PMVS+Poisson (leftmost column) and the proposed framework (middle column). Sample images for the corresponding scenes (rightmost column).

Figure 6. Renderings of 3D models (colored mesh) of the “SciLi” (left) and the “Downtown” (right) sites reconstructed by the proposed framework. Note the high resolution texture on buildings.

been observed that when PMVS and CPVM have difficulties modeling the appearance and underlying geometry of highly specular surfaces, the reconstruction algorithms cannot recover accurate surfaces (see the supplementary video for an example).

Fig. 6 shows renderings of the colored meshes. Notice the high resolution and completeness of the appearance information, such as the chimney and cone of the highlighted house (in purple), as well as the texture detail on buildings (in green).

All meshes were obtained using GSSD parameters $\lambda_1 = 1$ and $\lambda_2 = 4$ (see (11)). The running time for GSSD was roughly 12 minutes using an Intel Xeon @2.9 GHz. Both urban scenes contained approximately a million cells with significantly high $\mu$.

Middlebury Evaluation Performance: Quantitative evaluations reported in [26] (submission under “Generalized-SSD”) show that the proposed method is highly competitive both in terms of accuracy (mm) and completeness (%). Table I presents results for a selection of methods obtained from [26]. In [7], Crispell et al. reconstruct a surface by extracting the isosurface of the implicit function $\mathrm{vis\_score}$ using Marching Cubes. The superiority of GSSD compared to this method is clear in both the Temple and Dino (Ring) datasets. Also included are the results for PMVS+Poisson obtained in [10]. At the time of writing, this method achieved some of the best results at [26], including top scores in Full Dino. The proposed framework performs competitively, especially in the Full Temple dataset. It should be noted that most of the best performing algorithms reported at [26], including [10], take advantage of the fact that reasonable silhouettes can be extracted. This fact is not exploited in this paper, as the main focus is on applicability to realistic urban scenes.

VI. CONCLUSION AND FUTURE WORK

This paper presented a framework targeted to solve surface reconstruction for aerial imagery of large scale urban scenes. The main contribution has been the introduction of Generalized-SSD, capable of extracting highly detailed
  8. Table I. Quantitative evaluations taken from [26]. The first value corresponds to accuracy (mm) and the second to completeness (%). All meshes were obtained using GSSD parameters $\lambda_1 = 1$ and $\lambda_2 = 1$ (see (11)).

Method                       Temple Full    Temple Ring    Dino Full      Dino Ring
PMVS+Poisson [10]            0.54 / 99.3    0.55 / 99.1    0.32 / 99.9    0.33 / 99.6
Proposed Framework           0.53 / 99.4    0.81 / 95.8    0.55 / 98.1    0.6 / 96.0
CPVM + Marching Cubes [7]    -              1.89 / 92.1    -              2.61 / 91.4

surfaces from probabilistic volumes. The advantages of the proposed framework were shown via comparisons to state of the art methods, namely PMVS [10] and Poisson surface reconstruction [15], chosen due to their popularity and software availability. Comparisons to other related work such as [14], [30], [22] will be performed in the near future. In addition to experiments in aerial scenes, the framework was also tested in the Middlebury evaluation [26], where it performed competitively in terms of accuracy and completeness. Most remarkably, it achieved results on par with algorithms that exploit high resolution silhouettes.

An important future direction is to consider more sophisticated appearance models that can explain specular reflection, which would result in more robust surface estimates. Currently, a difficulty inherited from the CPVM is dealing with voxels inside of surfaces which contain inaccurate information from early stages of the update process. Such voxels are removed as a post-processing step but, ideally, they will be handled during the online Bayesian learning. Finally, application of GSSD to other probabilistic volumetric models, such as those found in medical imaging, will be investigated.

The software implementation of GSSD used to create figures shown in this paper is available for download from

ACKNOWLEDGEMENT

This material describes work supported by the National Science Foundation under Grants No. IIS-0808718 and CCF-0915661.

REFERENCES

[1] A. Baumberg. Blending images for texturing 3d models. In BMVC, 2002.
[2] M. Berger, J. Levine, L. Nonato, G. Taubin, and C. Silva. An End-to-End Framework for Evaluating Surface Reconstruction. Sci technical report, University of Utah, 2011.
[3] R. Bhotika, D. Fleet, and K. Kutulakos. A probabilistic theory of occupancy and emptiness. In ECCV, 2002.
[4] A. Broadhurst, T. W. Drummond, and R. Cipolla. A Probabilistic Framework for Space Carving. In ICCV, 2001.
[5] F. Calakli and G. Taubin. SSD: Smooth Signed Distance Surface Reconstruction. Computer Graphics Forum, 30(7), 2011.
[6] D. Crispell, J. Mundy, and G. Taubin. A Variable-Resolution Probabilistic Three-Dimensional Model for Change Detection. In IEEE Transactions on Geoscience and Remote Sensing, 2012.
[7] D. E. Crispell. A Continuous Probabilistic Scene Model for Aerial Imagery. PhD thesis, School of Engineering, Brown University, 2010.
[8] T. Dey. Curve and Surface Reconstruction: Algorithms with Mathematical Analysis. Cambridge Univ Pr, 2007.
[9] J. S. Franco and E. Boyer. Fusion of multiview silhouette cues using a space occupancy grid. In ICCV, 2005.
[10] Y. Furukawa and J. Ponce. Accurate, dense, and robust multi-view stereopsis. In CVPR, 2007.
[11] P. Gargallo, P. Sturm, and S. Pujades. An Occupancy-Depth Generative Model of Multi-View Images. In ACCV, 2007.
[12] C. Hernández, G. Vogiatzis, and R. Cipolla. Probabilistic visibility for multi-view stereo. In CVPR, 2007.
[13] C. Hernandez Esteban and F. Schmitt. Silhouette and stereo fusion for 3D object modeling. In CVIU, pages 367–392, 2004.
[14] V. Hiep, R. Keriven, P. Labatut, and J.-P. Pons. Towards high-resolution large-scale multi-view stereo. In CVPR, 2009.
[15] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Eurographics SGP, 2006.
[16] K. Kolev and D. Cremers. Integration of Multiview Stereo and Silhouettes Via Convex Functionals on Convex Domains. In ECCV, Oct. 2008.
[17] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. In ICCV, 1999.
[18] V. Lempitsky and Y. Boykov. Global Optimization for Shape Fitting. In CVPR, June 2007.
[19] V. Lempitsky, Y. Boykov, and D. Ivanov. Oriented Visibility for Multiview Reconstruction. LNCS. Springer, 2006.
[20] S. Liu and D. B. Cooper. A complete statistical inverse ray tracing approach to multi-view stereo. In CVPR, 2011.
[21] A. Miller, V. Jain, and J. L. Mundy. Real-time rendering and dynamic updating of 3-d volumetric data. In Workshop on GPGPU, 2011.
[22] P. Mucke, R. Klowsky, and M. Goesele. Surface reconstruction from multi-resolution sample points. In VMV, 2011.
[23] T. Pollard and J. L. Mundy. Change Detection in a 3-d World. In CVPR, 2007.
[24] M. I. Restrepo, B. A. Mayer, and J. L. Mundy. Object recognition in probabilistic 3d volumetric scenes. In ICPRAM, 2012.
[25] S. Schaefer and J. Warren. Dual marching cubes: Primal contouring of dual grids. Computer Graphics Forum, 24, 2005.
[26] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. Multi-view stereo evaluation. http://vision., 2011.
[27] S. M. Seitz, J. Diebel, D. Scharstein, and R. Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms. In CVPR, 2006.
[28] N. Snavely and S. M. Seitz. Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics, 2006.
[29] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In CVPR, 1998.
[30] E. Tola, C. Strecha, and P. Fua. Efficient large-scale multi-view stereo for ultra high-resolution image sets. Machine Vision and Applications, 27, 2011.
[31] G. Vogiatzis, C. Hernández, P. H. S. Torr, and R. Cipolla. Multiview Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-consistency. In PAMI, Dec. 2007.
[32] A. Yezzi, G. Slabaugh, R. Cipolla, and R. Schafer. A surface evolution approach of probabilistic space carving. In 3DPVT, 2002.