Multi-hypothesis projection-based shift estimation for sweeping panorama reconstruction
Tuan Q. Pham†, Philip Cox
Canon Information Systems Research Australia (CiSRA)
1 Thomas Holt Drive, North Ryde, NSW 2113, Australia
† firstname.lastname@example.org

ABSTRACT

Global alignment is an important step in many imaging applications for hand-held cameras. We propose a fast algorithm that can handle large global translations in either x- or y-direction from a pan-tilt camera. The algorithm estimates the translations in x- and y-direction separately using 1D correlation of the absolute gradient projections along the x- and y-axis. Synthetic experiments show that the proposed multiple shift hypotheses approach is robust to translations up to 90% of the image width, whereas other projection-based alignment methods can handle up to 25% only. The proposed approach can also handle larger rotations than other methods. The robustness of the alignment to non-purely translational image motion and moving objects in the scene is demonstrated by a sweeping panorama application on live images from a Canon camera with minimal user interaction.

Index Terms: shift estimation, image projection, sweep panorama

1. INTRODUCTION

Global alignment is an important task for many imaging applications such as image quality measurement, video stabilization, and moving object detection. For applications on embedded devices, the alignment needs to be both accurate and fast. Robustness against difficult imaging conditions such as low light, camera motion blur or motion in the scene is also desirable. In this paper, we describe a low-cost global shift estimation algorithm that addresses these needs. The algorithm's robustness against difficult imaging conditions and its real-time performance are demonstrated on a sweeping panorama application using live images from a hand-held Canon camera.

In particular, our global alignment algorithm performs separable shift estimation using one-dimensional (1D) projections of the absolute gradient images along the sampling axes. For each image dimension, multiple shift hypotheses are maintained to avoid misdetection due to non-purely translational motion, independent moving objects, or distractions from the non-overlapping areas. The final shift estimate is the one that produces the highest two-dimensional (2D) Normalized Cross-Correlation (NCC) score.

Various enhancements are added to the basic alignment algorithm above to improve its performance. The input images are subsampled prior to analysis to improve speed and noise robustness. Shift estimation is performed over multiple scales to rule out incorrect shifts due to strong correlation of texture at certain frequencies. When appropriate, the images are automatically cropped to improve overlap before gradient projection.

Given the alignment between consecutive frames from a panning camera, a panoramic image can be constructed during image capturing. Overlapping images are stitched along an irregular seam that avoids cutting through moving objects. This seam also minimizes the intensity mismatch of the two images on either side of the seam. Image blending is finally used to eliminate any remaining intensity mismatch after stitching.

1.1. Literature review

Numerous solutions are available for translational image alignment. Amongst them, correlation-based methods are popular for their robustness. However, 2D correlation is costly for large images. 2D phase correlation, for example, requires $O(N^2 \log N^2)$ computations for an N×N image using the Fast Fourier Transform (FFT). The computational complexity can be reduced to $O(N \log N)$ if the correlation is performed on 1D image projections only [1]. This projection-based alignment algorithm is suitable for images with strong gradient structures along the projection axes. This assumption holds for most indoor and natural landscape scenes.

Adams et al. reported a real-time projection-based alignment of a 320×240 viewfinder video stream at 30 frames per second on standard smartphone hardware [2]. Their algorithm uses projections of the image's gradient energy along four directions. The use of image gradient rather than intensity improves alignment robustness against local lighting changes.

Despite their speed advantage, previous projection-based alignment algorithms have a number of limitations. First, the images must have a substantial overlap (e.g., more than 90%
of the frame area according to [2]) for the alignment to work. This is because image data from non-overlapping areas corrupt the image projections, eventually breaking their correlation. Second, any deviation from a pure translation is likely to break the alignment. The viewfinder algorithm [2], for example, claims to handle a maximum of 1° rotation only. Third, previous gradient projection algorithms are not robust to low-light conditions. The weak gradient energy of dark current noise at every pixel often overpowers the stronger but sparse gradient of the scene structures when integrated over a whole image row or column. For a similar reason, gradient projection algorithms are also not robust against highly textured scenes such as carpet or foliage.

1.2. Structure of this paper

In this paper, we present a fast global alignment algorithm with application in sweeping panorama reconstruction. Section 2 presents the new multiple hypotheses global alignment algorithm using gradient projections. Section 3 describes a software prototype for sweeping panorama stitching using the new alignment algorithm. Section 4 evaluates the alignment and panorama stitching algorithms. Section 5 concludes the paper.

2. PROJECTION-BASED IMAGE ALIGNMENT

We propose a projection-based shift estimation algorithm that is robust to large translation, small rotation and perspective change, noise and texture. The global shift is computed over multiple scales as shown in Fig. 1. The input images are first subsampled to a manageable size to reduce noise and computation. A dyadic image pyramid is then constructed for each image [3]. At each pyramid level, a shift estimate is obtained independently using the new projection-based image alignment algorithm described in Section 2.1. The shift candidate with the highest 2D NCC score is the final shift estimate.

Fig. 1. Simultaneous shift estimation over multiple scales.

Aligning two images at multiple subsampled resolutions and taking the best solution is more robust than alignment at a single original resolution for a number of reasons. First, noise is substantially reduced by subsampling while the gradient information of the scene is largely preserved. Second, subsampling reduces texture variation and its contribution to the gradient projections.

Too much subsampling, however, eliminates useful alignment details. To achieve an optimal gain in signal-to-noise ratio, we align the images over three successively halved pyramid levels starting from an image size around $256^2$ pixels at the base of the pyramid. Block summing is used to subsample the images for efficiency. Because block summing produces slightly more aliased images compared to Gaussian subsampling, some subpixel alignment error is expected. However, the alignment error can be corrected by subpixel peak interpolation of the NCC score at the base pyramid level.

2.1. Multi-hypothesis gradient projection correlation

At each pyramid level, the translation between two input images $I_1$, $I_2$ is estimated by the multi-hypothesis projection-based shift estimation algorithm described in Fig. 2. Image gradients $|\partial I_1/\partial x|$ and $|\partial I_1/\partial y|$ are estimated using finite differences. The magnitude of the x-gradient image is then integrated along image columns to obtain the x-gradient projection: $p_1^x = \int |\partial I_1/\partial x|\,dy$. The y-gradient projection is similarly obtained from the y-gradient image: $p_1^y = \int |\partial I_1/\partial y|\,dx$. The corresponding gradient projections from the two images are correlated to find multiple possible translations in either dimension. Cross-correlation of zero-padded zero-mean signals is used instead of the brute-force search for a correlation peak in [2] to handle a larger range of possible motion. Multiple 2D shift hypotheses are derived from all combinations of the 1D shift hypotheses in both dimensions. A 2D NCC score is obtained for each of these 2D shift hypotheses from the overlapping area of the input images dictated by the shift. The shift hypothesis with the highest 2D NCC score is then refined to subpixel accuracy by an elliptical paraboloid fit over a 3×3 neighborhood around the 2D NCC peak.

Fig. 2. Multiple shift hypotheses from gradient projection correlation.

Fig. 3 shows a block diagram with all steps and possible execution paths of our multi-hypothesis projection-based shift estimation algorithm. The efficiency of the new algorithm comes from two improvements over [2] in steps 1 and 2:

1. The input images are subsampled to a manageable size
(e.g., 256×256 pixels) before alignment;

2. The 2D translation is estimated separately in x- and y-dimension (rather than in four orientations as in [2]) using projections of the images' directional gradient magnitude (rather than the gradient energy as in [2]) onto the corresponding axis.

The algorithm is robust to large translations thanks to a new multiple shift hypotheses algorithm in steps 3 to 6:

3. For each pair of 1D projections, k shift hypotheses are selected from the k strongest 1D NCC peaks (e.g., k=5) using non-maximal suppression [4];

4. Any shift candidate with a dominant 1D NCC score, that is, a score higher than 1.5 times the second highest score along the same dimension, is the final shift for that dimension;

5. If only one dimension has a dominant NCC score, the two images are cropped to an overlapping area along this dimension before returning to step 2;

6. If there is no shift hypothesis with a dominant 1D NCC score, $k^2$ 2D shift hypotheses are constructed from the 1D shift hypotheses (see Fig. 2). The shift candidate with the highest 2D NCC score is the final 2D shift.

Fig. 3. Flow chart describing the proposed projection-based shift estimation algorithm.

Note that our algorithm terminates at step 4 if two images have substantial overlap. Step 5 is executed if there is a large shift in only one dimension. Step 6 is the most expensive part because it requires the computation of $k^2$ 2D NCC scores. Fortunately, for a sweeping panorama application, the motion is mainly one-dimensional. As a result, most of the examples in this paper branch to step 5, which requires significantly fewer 2D NCC score computations to find the best translation.

3. SEAMLESS PANORAMA STITCHING

Using the alignment algorithm described in the previous section, a panning image sequence can be combined to form a panoramic image. If the alignment is accurate to a subpixel level, frame averaging can be used for image composition [5]. However, subpixel alignment is difficult for images captured by a moving camera with moving objects in the scene. A more robust compositing method is to segment the mosaic and use a single image per segment [6]. For sweeping panorama, the images undergo a translation mainly in one direction. Two consecutive images can therefore be joined together along a seam that minimizes the intensity mismatch between adjacent segments [7]. Laplacian pyramid fusion [3] can then be used to smooth out any remaining seam artefacts.

To demonstrate our alignment technology on realistic scenes, we built a standalone application that stitches live images from a panning camera. The images are automatically transferred from a Canon 40D camera to a PC. A screenshot of our demo application is given in Fig. 4, where the panorama was reconstructed from six panning images in real-time.

[Figure 4 panels: Frame 1, Frame 11, Frame 21, Frame 31, Frame 41; panorama from 48 panning images (6 actually used)]
Fig. 4. Sweeping panorama (1119×353) in the presence of moving objects and perspective image motion (seams shown in yellow).

For efficiency, we do not use all captured images for panorama stitching. The images whose fields of view are covered by neighbouring frames can be skipped to reduce the seam computations. All incoming frames still need to be aligned to determine their overlapping areas. The first frame is always used in the panorama. A frame is skipped if it overlaps more than 75% with the last used frame and if the next frame also overlaps more than 25% with the last used frame. The second condition ensures no coverage gap is created by removing a frame. These overlapping parameters can be increased to encourage more frames to be used during stitching. Fig. 4 illustrates an example with this default overlapping parameter where only four out of six captured frames are needed to construct a panoramic image.

Our software prototype automatically determines the sweep direction from the alignment information. There is no need for the user to select the direction, as required in some consumer cameras. Fig. 5b shows an example of a vertical panorama constructed by our system from ten images in Fig. 5a. The output image is a good reproduction of the scene despite few horizontal or vertical structures in the scene, lighting change due to camera auto-gain, and texture of carpet on the floor. Another example of automatic sweep direction detection can be seen in Fig. 9, where the camera was panned
from right to left instead of the traditional left-to-right motion as in Fig. 4.

[Figure 5 panels: (a) 10 input frames (512×340); (b) panorama (543×1330)]
Fig. 5. Vertical sweeping panorama produced by our system.

4. EVALUATION

We first present an evaluation of our projection-based shift estimation, followed by results on seamless panorama stitching.

4.1. Shift estimation

We compare our multi-hypothesis projection-based shift estimation algorithm against an FFT-based 2D correlation and the viewfinder alignment algorithm [2]. All three algorithms were implemented in Matlab version R2010b. For the viewfinder alignment algorithm, the images were subsampled to approximately 320×240 pixels to match the viewfinder resolution in [2]. Harris corner detection followed by nearest neighbour corner matching was used to correct for small rotation and scale change as described in [2].

We applied the three shift estimators to panning image pairs of different sizes and recorded the execution time in Matlab. For each available image size, an average runtime and its standard deviation are plotted as error bars in Fig. 6. Runtime varies even for the same image size due to different content overlap. A line is fit to the data points to predict the runtime of each algorithm for an arbitrary image size. All algorithms show a linear run-time performance with respect to the number of input pixels. 2D correlation is the slowest algorithm. Its floating-point FFT operation also triggers an out-of-memory error for images larger than ten Mega Pixels (MP). Our algorithm runs slightly faster than that of Adams et al. because ours does not have the corner detection and matching steps. The red line fit in Fig. 6 shows that it takes us less than 0.05 of a second in Matlab to align a 1 MP image pair and roughly 0.1 second to align an 8 MP image pair. As the image size gets larger, the major part of the run-time is spent on image subsampling, which can be implemented more efficiently in hardware using CCD binning.

Fig. 6. Shift estimation run time set out against image size for three algorithms.

To measure the robustness of our projection-based alignment algorithm against large translation, we performed a synthetic shift experiment. Two 512×340 images were cropped from the panoramic image in Fig. 4 such that they are related by a purely horizontal translation, which ranges from 1 to 500 pixels. The estimated shifts $[t_x\ t_y]$ are plotted in Fig. 7 for three algorithms: 2D correlation, viewfinder alignment, and this paper's. Both 2D correlation and viewfinder alignment fail to estimate shifts larger than 128 pixels (i.e. $t_x$ > 25% of image width). Our multi-hypothesis algorithm, on the other hand, estimates both shift components correctly for a synthetic translation up to 456 pixels (i.e. 90% of image width). As suggested by the 2D correlation subplot on the top row of Fig. 7, the strongest correlation peak does not always correspond to the true shift. Large non-overlapping areas can alter the correlation surface, leading to a sudden switch of the global peak to a different location. This sudden change in the global correlation peak corresponds to the sudden jumps of the $t_x$ and $t_y$ curves in the 2D correlation subplot.

Fig. 7. Estimated shifts for image pairs undergoing a synthetic horizontal shift.
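The synthetic shift experiment can be mimicked on a small scale with the following sketch. It is not the authors' Matlab implementation: it is a minimal NumPy illustration of the core idea of Section 2.1 (1D correlation of absolute gradient projections, several shift hypotheses per axis, verification by 2D NCC over the implied overlap). It assumes plain cross-correlation of zero-mean projections rather than a normalized score, a simple three-sample local-maximum test in place of the non-maximal suppression of [4], and it omits the pyramid, the dominance test of steps 4 and 5, and the subpixel refinement. All function names, the `min_overlap` parameter and the rectangle-based test scene are hypothetical.

```python
import numpy as np


def gradient_projections(img):
    """Project absolute finite-difference gradients onto the x- and y-axis."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.abs(np.diff(img, axis=1))  # |dI/dx|
    gy = np.abs(np.diff(img, axis=0))  # |dI/dy|
    return gx.sum(axis=0), gy.sum(axis=1)  # integrate over y, then over x


def top_k_shifts(p1, p2, k=5):
    """Return up to k candidate shifts from the strongest 1D correlation peaks."""
    a = p1 - p1.mean()
    b = p2 - p2.mean()
    corr = np.correlate(b, a, mode="full")   # "full" mode implies zero padding
    lags = np.arange(-(len(a) - 1), len(b))  # lag of each correlation sample
    inner = corr[1:-1]
    # Assumption: a 3-sample local-maximum test stands in for the NMS of [4].
    peaks = np.flatnonzero((inner > corr[:-2]) & (inner >= corr[2:])) + 1
    if peaks.size == 0:
        peaks = np.array([int(np.argmax(corr))])
    strongest = peaks[np.argsort(corr[peaks])[::-1][:k]]
    return [int(lags[i]) for i in strongest]


def ncc_overlap(img1, img2, tx, ty, min_overlap=8):
    """2D NCC over the overlap implied by img2[y, x] = img1[y - ty, x - tx]."""
    h, w = img1.shape
    x0, x1 = max(0, tx), min(w, w + tx)
    y0, y1 = max(0, ty), min(h, h + ty)
    if x1 - x0 < min_overlap or y1 - y0 < min_overlap:
        return -1.0  # hypothetical guard against degenerate overlaps
    a = img1[y0 - ty:y1 - ty, x0 - tx:x1 - tx]
    b = img2[y0:y1, x0:x1]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else -1.0


def estimate_shift(img1, img2, k=5):
    """Score all k*k combinations of 1D hypotheses and keep the best 2D NCC."""
    px1, py1 = gradient_projections(img1)
    px2, py2 = gradient_projections(img2)
    xs = top_k_shifts(px1, px2, k)
    ys = top_k_shifts(py1, py2, k)
    score, tx, ty = max((ncc_overlap(img1, img2, tx, ty), tx, ty)
                        for tx in xs for ty in ys)
    return tx, ty, score


def make_scene(h=120, w=400, n_rect=60, seed=0):
    """Hypothetical test scene: random rectangles on weak Gaussian noise."""
    rng = np.random.default_rng(seed)
    scene = rng.normal(0.0, 1.0, size=(h, w))
    for _ in range(n_rect):
        y, x = rng.integers(0, h - 10), rng.integers(0, w - 10)
        rh, rw = rng.integers(10, 60), rng.integers(10, 80)
        scene[y:y + rh, x:x + rw] += rng.uniform(20.0, 100.0)
    return scene


# Two 200-pixel-wide crops related by a purely horizontal shift of 60 pixels.
scene = make_scene()
true_tx = 60
img1 = scene[:, true_tx:true_tx + 200].copy()
img2 = scene[:, :200].copy()
tx, ty, score = estimate_shift(img1, img2)
print("estimated shift:", tx, ty, "NCC:", round(score, 3))
```

Because the two crops share a known offset, the estimate can be compared directly with `true_tx`. Increasing `true_tx` towards the crop width reduces the overlap and makes the strongest 1D peak less reliable, which is the situation the multiple hypotheses and the 2D NCC verification are meant to handle.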
The average accuracy of the estimated shifts in Fig. 7 is tabulated in Table 1. We measured the Root Mean Squared Errors (RMSE) of the estimated shifts within two ground-truth translation intervals. The first interval ($1 \le t_x \le 128$) is where all three algorithms achieve subpixel accuracy. Within this interval, the viewfinder alignment algorithm is the most accurate and this paper's is the least accurate. The second interval covers a larger range of shifts ($1 \le t_x \le 456$) and this is where all other algorithms fail. Within this larger motion range, our algorithm produces an average of 2-pixel alignment error for horizontal translation up to 90% of the image width.

Table 1. RMSE (in pixels) of estimated shifts under large translation

Translation range    Correlation   Adams et al.   This paper
1 ≤ tx ≤ 128               0.118          0.083        0.420
1 ≤ tx ≤ 456             278.444        279.549        2.281

We also tested the robustness of our shift estimation algorithm against small image rotation. Fig. 8 plots the estimated shifts by the same three alignment algorithms on purely rotated image pairs. The images are generated from frame 1 of the image sequence in Fig. 4 by a rotation, followed by central cropping to 276×448 pixels to remove the missing image boundary. Under zero translation, the viewfinder alignment algorithm is robust up to 3° rotation. Outside this ±3° rotation range, however, the viewfinder alignment algorithm produces unreliably large shift estimation errors. Note that the middle subplot has a 10 times larger vertical axis limit compared to the other two subplots. Our algorithm performs as well as that of Adams et al. for small rotation (|θ| < 3°). For larger rotation, the error of our alignment increases only gradually, reaching 10-pixel misalignment for 10° rotation.

Fig. 8. Estimated shifts for image pairs undergoing a small synthetic rotation.

The performances of the three alignment algorithms under small image rotation are further described by the RMSEs in Table 2. Within a ±1° rotation range, Adams et al. is the most accurate method, closely followed by this paper. Both achieve subpixel accuracy. For any larger rotation range, our algorithm is the most accurate. We consistently produce less than 2-pixel alignment error for rotation up to 5°. Adams et al., on the other hand, fail to align images with more than 3° rotation.

Table 2. RMSE (in pixels) of estimated shifts under small rotation

Rotation range       Correlation   Adams et al.   This paper
−1° ≤ θ ≤ 1°               1.070          0.673        0.737
−3° ≤ θ ≤ 3°               3.212          1.684        1.310
−5° ≤ θ ≤ 5°               5.481        141.555        1.679

4.2. Panorama stitching

We demonstrate the accuracy of our multi-hypothesis projection-based shift estimation on a sweeping panorama application. Five images on the top row of Fig. 4 come from a sequence of 48 images captured by a hand-held camera. Due to a panning motion of the camera, the input images undergo a mainly horizontal translation. The translations are calculated between consecutive image pairs using the alignment algorithm presented in Section 2. Six frames (1, 12, 22, 33, 43, 48) with sufficient content overlap are automatically selected for panorama stitching. The selected frames are stitched together along a set of irregular seams (shown as yellow lines in the panorama).

Fig. 4 demonstrates our solution's robustness to moving objects and non-purely translational motion. Because the intensity difference across the seams is minimized, the stitched image appears seamless. The seams do not cut through moving objects such as the cars on the road. However, one of these cars appears multiple times in the panorama as it moves through the scene during image acquisition. Another visible artefact is the bending of the balcony wall close to the camera. This geometric distortion is due to the approximation of a full 3D projective transformation of the images by a simple 2D translation. Despite these artefacts, the produced panorama is a plausible representation of the scene.

Our global alignment algorithm is also robust to motion blur. An example of a panning sequence with severe motion blur is shown on the top row of Fig. 9. Because multiple 1D shift hypotheses are kept, the correct 2D shifts are successfully detected, leading to a good panorama reconstruction on the bottom row of Fig. 9. Note that the output panorama could have been improved further using motion blur deconvolution. However, deconvolution is out of the scope of this paper.

More panoramas reconstructed by our system are given in Fig. 10. Our algorithm works well outdoors (Fig. 10a) because motion of distant scenes can be approximated by a
translation. Projective distortions only appear when there is significant depth difference in the scene. The 360° indoor panorama in Fig. 10b, for example, shows bending of linear structures due to this perspective effect. These distortions are unavoidable for a wide-angle view because the panorama effectively lies on a cylindrical surface, whereas each input image lies on a different imaging plane. Finally, a 180° view of a busy shopping centre is presented in Fig. 10c. The reconstructed panorama captures many people in motion, none of whom is cut by the hidden seams.

[Figure 9 panels: Frame 11, Frame 9, Frame 6, Frame 3, Frame 0; sweeping panorama from 12 panning images (9 actually used)]
Fig. 9. Seamless panorama reconstruction under motion blur (output size is 3456×704).

[Figure 10 panels: (a) Outdoor panorama (8448×1428) from 14 images; (b) 360° panorama (4448×496) from a PTZ camera; (c) 180° panorama (4000×704) of a busy shopping centre]
Fig. 10. Sweeping panoramas constructed by our system.

For comparison purposes, we captured some panoramic images using a consumer camera available on the market. Unlike our technology, which stitches as few frames as possible along some irregular seams, this camera joins as many frames as it captures along straight vertical seams. This strip-based stitching algorithm is prone to motion artefacts such as the motion trail of the car in Fig. 11a. The thin strip approach is also not robust to jittered camera motion. Fig. 11b shows some jitter artefacts of a whiteboard and a nearby window due to an uneven panning motion. The top drawer of the vertical panorama in Fig. 11c also looks distorted. Our solution does not suffer from jitter artefacts because the images are aligned in both directions before fusion.

[Figure 11 panels: (a) motion trail of a moving car; (b) ripples due to unstable sweeping motion; (c) over-exposed]
Fig. 11. Some panoramas produced by a consumer camera.

5. CONCLUSION

We have presented a new projection-based shift estimation algorithm using multiple shift hypothesis testing. Our shift estimation algorithm is fast and it can handle large image translations in either x- or y-direction. The robustness of the algorithm in real-life situations is demonstrated using a sweeping panorama stitching application. Our alignment algorithm is found to be robust against small perspective change due to camera motion. It is also robust against motion blur and moving objects in the scene. We have presented a demo application for live panorama stitching from a Canon camera. The panorama stitching solution comprises a multi-hypothesis projection-based image alignment step, an irregular seam stitching step and an optional image blending step.

6. ACKNOWLEDGMENT

The authors would like to thank Ankit Mohan from Canon USA R&D and Edouard Francois from Canon Research France for their help in improving this paper's presentation.

7. REFERENCES

[1] S. Alliney and C. Morandi, "Digital image registration using projections," PAMI, 8(2):222–233, 1986.

[2] A. Adams, N. Gelfand, and K. Pulli, "Viewfinder alignment," Comput. Graph. Forum, 27(2):597–606, 2008.

[3] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, "Pyramid method in image processing," RCA Eng., 29(6):33–41, 1984.

[4] T. Q. Pham, "Non-maximum suppression using fewer than two comparisons per pixel," in Proc. of ACIVS, 2010, pp. 438–451.

[5] H.-Y. Shum and R. Szeliski, "Construction of panoramic mosaics with global and local alignment," IJCV, 36(2):101–130, 2000.

[6] J. Davis, "Mosaics of scenes with moving objects," in Proc. of CVPR, 1998, pp. 354–360.

[7] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," in Proc. of SIGGRAPH, 2007.