Final report



Super-resolution on video with GPU

Chien-hsin Hsueh (CSIE, NTNU)   Hsing-Han Ho (CSIE, NTU)   Kuen-Shiou Tsai (CSIE, NTU)

Abstract

This work introduces a practical approach to super-resolution, the process of reconstructing a high-resolution image from low-resolution inputs. The emphasis of our work is on super-resolving frames from dynamic video sequences and on improving efficiency with the GPU. We have implemented two super-resolution algorithms that reconstruct the high-resolution image for different types of frame motion. Because the quality of super-resolved images depends heavily on the accuracy of image alignment between consecutive frames, we employ a macroblock optical flow method to estimate the motion between each image pair. An efficient and reliable GPU scheme is designed to improve the performance of our super-resolution algorithm. We also implement a video player to demonstrate our results. A number of complex and dynamic video sequences are tested to demonstrate the applicability and reliability of our algorithm.

Keywords: super-resolution, upsampling, image sequence, GPU, CUDA

1 Introduction

The goal of super-resolution (SR) methods is to recover a high-resolution image from one or more low-resolution input images. In classical multi-image SR, a set of low-resolution images of the same scene is taken at subpixel misalignments. Each low-resolution image imposes a set of linear constraints on the unknown high-resolution intensity values. If enough low-resolution images are available (at subpixel shifts), the set of equations becomes determined and can be solved to recover the high-resolution image.

Most proposed super-resolution algorithms are reconstruction-based algorithms, which rely on sampling theorems. However, constraints on the motion models of the input video sequences make reconstruction-based algorithms difficult to apply. Most algorithms implicitly or explicitly assume that the image pairs are related by a global parametric transformation, an assumption that may not hold in dynamic video.

It is challenging to design a super-resolution algorithm for arbitrary video sequences. In general, video frames cannot be related through a global parametric transformation, because individual pixels move arbitrarily between image pairs. Hence, local motion models such as optical flow must be used for image alignment.

In this work, we have implemented two super-resolution algorithms that reconstruct the high-resolution image for different types of frame motion. The first is Fast and General Super-Resolution (FGSR), which handles general video with good performance. Because the quality of super-resolved images depends heavily on the accuracy of image alignment between consecutive frames, we employ a macroblock optical flow method to estimate the motion between each image pair. The second algorithm, Fast and Robust Super-Resolution (FRSR), reconstructs the high-resolution image from video whose motion is global. We also design several efficient and reliable GPU schemes to improve the performance of our super-resolution algorithms.

This paper is organized as follows. In the next section we give a brief survey of existing work on this topic. In Section 3 we describe our implementation of the super-resolution algorithms, and in Section 4 we discuss the results obtained and compare them with other methods. Finally, in Section 5, we describe the drawbacks of these methods and summarize our conclusions.

2 Related work

Existing super-resolution algorithms can be roughly divided into two main categories: reconstruction-based algorithms and learning-based algorithms.

Reconstruction-based Super-Resolution. Reconstruction-based super-resolution is founded on uniform and non-uniform sampling theories. It assumes that the original high-resolution signal (image) can be well predicted from the low-resolution input samples (images). Most super-resolution algorithms fall into this category. In most cases, the enforced smoothness constraint suppresses high-frequency components, so the results are usually blurred. Regularization can be used when the scene is strongly rigid, as in the case of a binary text image. Super-resolution can also be performed simultaneously in time and in space.

Several refinements have been proposed to address the robustness of super-resolution algorithms. One approach handles moving objects through motion segmentation, which makes accurate segmentation crucial. Unfortunately, accurate segmentation is hard to obtain in the presence of aliasing and noise. More recently, a robust median estimator has been used in an iterative super-resolution algorithm.

Learning-based Super-Resolution. Algorithms of this kind create high-frequency image details by using a generative model learned from a set of training images. Several algorithms have been proposed for specific types of scene, such as faces and text. However, learning-based super-resolution algorithms have difficulty handling dynamic real-world video sequences.

3 Algorithm

The input of our algorithm consists of (1) multiple low-resolution video frames, including the target frame and its neighboring frames, and (2) the desired magnification factor. The output is a high-resolution image reconstructed at the target frame.

We implement two super-resolution algorithms that reconstruct high-resolution images from several neighboring frames. They are based on two papers, [Farsiu et al. 2004] and [Jiang et al. 2003], whose original algorithms we simplify. Parts of the code are accelerated with CUDA to reduce the execution time. Both are reconstruction-based algorithms: one is Fast and General Super-Resolution (FGSR), and the other is Fast and Robust Super-Resolution (FRSR). The following subsections describe the implementation of each algorithm in detail.
3.1 FGSR

FGSR stands for Fast and General Super-Resolution; it can generate a super-resolved image from any dynamic video sequence. Before describing the details, we define some notation:

• $x$ denotes the target low-resolution image
• $f$ denotes the desired high-resolution image
• $f_n$ is the approximation of $f$ obtained after the $n$-th iteration
• $g_k$ denotes the $k$-th low-resolution image
• $s_k$ denotes the result of optical flow from the low-resolution image $g_k$ to the target image $f$

Figure 1: The procedure of FGSR. $f_n$ is updated with the difference between $f_n$ and each aligned neighboring frame $g_k$; in this way, the detail of the high-resolution image is improved iteratively.

Figure 1 illustrates the basic procedure of the FGSR algorithm. It starts with an initial estimate $f_0$ of the high-resolution image $f$, obtained by bicubic interpolation. After all $g_{0,k}$ are up-sampled, optical flow from each $g_k$ to the target frame $f$ is computed to obtain the simulated high-resolution images $s_{0,k}$. If $g_k$ is aligned with $f$, the residual $s_k - f_n$ should improve the detail of $f_n$, so we can iteratively project $s_k - f_n$ to refine the approximation. The weight $\beta$ is defined as

$$\beta = \frac{1}{\text{temporal distance} + 1}$$

that is, the reciprocal of one plus the temporal distance between $s_k$ and the target frame $x$. $\beta$ is the weight with which $s_k - f_n$ is projected onto $f_{n+1}$. A lower $\beta$ reflects weaker alignment with the target frame and therefore less influence on the detail of $f$.

In this work, we accelerate the bicubic interpolation on the GPU. Parallelizing the interpolation speeds up this step by a factor of six to ten, depending on the image size. Figure 2 shows that the CPU time increases linearly while the GPU execution time remains constant. This step therefore has high potential for parallel processing, which significantly increases the overall speed of execution.

Figure 2: Execution time on the CPU and GPU. Depending on the input image size, the GPU accelerates the bicubic interpolation by a factor of six to ten.

3.2 FRSR

FRSR stands for Fast and Robust Super-Resolution; it is specific to video frames related by global motion. The following notation is used in FRSR:

• $x$ denotes the target low-resolution image
• $f$ denotes the estimated high-resolution image
• $f_n$ is the approximation of $f$ obtained after the $n$-th iteration
• $g_k$ denotes the $k$-th low-resolution image
• $m_k$ denotes the mapping from the low-resolution image $g_k$ to the target image $f$

Median of the neighboring frames. At the beginning of the algorithm, we estimate the initial guess of the high-resolution image with a median operator, whose procedure is illustrated in Figure 3. First, we align all neighboring frames to the target frame. As shown by Zhao and Sawhney [Zhao and Sawhney 2002], accurate alignment is the key to the success of reconstruction-based super-resolution algorithms; we employ the macroblock optical flow algorithm for this step. Second, for each pixel of $f_0$, we choose the median of all $g_{0,k}$ as the pixel value. The initial high-resolution estimate tends to be blurred, so the next step is to deblur it to enhance the detail.

Figure 3: Median of $g_{0,k}$. Each pixel value of $f$ is estimated by the median of the $g_{0,k}$.

Bilateral Non-Iterative Artifact Removal. We add a non-iterative outlier removal step, using the bilateral filter, after data fusion and before the deblurring-interpolation step. Our refinement method essentially computes the correlation of different measurements (pixels from different frames) with each other and removes the inconsistent data. The computed correlation is based on the bilateral idea, so high-frequency (edge) information is differentiated from outliers. We assign a weight to each pixel in the measurements based on its bilateral correlation with
corresponding pixels in the data-fused image. After these weights are computed, pixels with very small weights are removed from the data set. Because pixels containing high-frequency information receive higher weights than those located in low-frequency areas, it is reasonable to compute and compare the penalty weights for blocks of pixels rather than for single pixels.

Robust Regularization. Super-resolution is an ill-posed problem [Nguyen et al. 2001] [Tekalp 1995]. In underdetermined cases there are infinitely many solutions. In square and overdetermined cases the solution is not stable: a small amount of noise in the measurements can cause large perturbations in the final solution. Therefore, regularization is very useful, if not necessary, as a means of picking a stable solution. Regularization can also help the algorithm remove artifacts from the final answer and improve the rate of convergence. Of the many possible regularization terms, we want one that yields HR images with sharp edges and is easy to implement.

One of the most widely referenced regularization cost functions is the Tikhonov cost function [Elad and Feuer 1999]:

$$\gamma_T(X) = \|\Gamma X\|_2^2$$

where $\Gamma$ is usually a high-pass operator such as a derivative or Laplacian, or even the identity matrix. The intuition behind this regularization is to limit the total energy of the image (when $\Gamma$ is the identity matrix) or to enforce spatial smoothness (for derivative or Laplacian choices of $\Gamma$). Because noisy pixels and edge pixels both contain high-frequency energy, both are removed in the regularization process, and the resulting denoised image does not contain sharp edges. Certain regularization cost functions work efficiently for special types of images but are not suitable for general images; for example, maximum entropy regularization produces sharp reconstructions of point objects, such as star fields in astronomical images [Gibson and Bovik 2000]. One of the most successful regularization methods for denoising and deblurring is the total variation (TV) method [Rudin et al. 1992]. The total variation criterion penalizes the total amount of change in the image, measured by the $L_1$ norm of the gradient:

$$\gamma_{TV}(X) = \|\nabla X\|_1$$

where $\nabla$ is the gradient operator. The most useful property of the total variation criterion is that it tends to preserve edges in the reconstruction [Gibson and Bovik 2000] [Rudin et al. 1992] [Chan et al. 2001], because it does not severely penalize steep local gradients. Based on the spirit of the total variation criterion and the bilateral filter, the Bilateral-TV regularizer is computationally cheap to implement and preserves edges. It is defined as

$$\gamma_{BTV}(X) = \sum_{l=0}^{P} \sum_{m=0}^{P} \alpha^{m+l} \left\| X - S_x^l S_y^m X \right\|_1$$

where the matrices (operators) $S_x^l$ and $S_y^m$ shift $X$ by $l$ and $m$ pixels in the horizontal and vertical directions, respectively, presenting several scales of derivatives. The scalar weight $\alpha$, $0 < \alpha < 1$, gives a spatially decaying effect to the summation of the regularization terms.

4 Results

To verify our algorithms, we tested them on two video clips: Text (98 × 114, 30 fps, Figure 4(a)) and Bottle (128 × 96, 30 fps, Figure 5(a)). All experiments and timing statistics were carried out and recorded on an Intel Core i5-750 2.66 GHz CPU with 2 GB of memory and an nVidia N240 GPU with 1 GB of video memory.

We first compare our first method, FGSR (Figure 4(d)), with naive bilinear interpolation (Figure 4(c)). The Text example (Figure 4(a)) shows the target frame of a video clip with panning motion. To generate the result, four neighboring low-resolution frames plus the target frame are used. The displacement between consecutive frames is almost 10 pixels in some cases. The super-resolved image is magnified two times, i.e., to 98 × 114. Compared with our result, the result of bilinear interpolation exhibits blocky artifacts (Figure 4(c)). Next, we apply our second SR method, FRSR, to the same target frame. Figure 4(e) shows that the result generated by FRSR is slightly better than those of FGSR and bilinear interpolation.

In the Bottle example, we compare the two methods with the ground-truth image. The low-resolution input frames are simulated by down-sampling the original frames (original resolution 256 × 192) to 128 × 96. The ground truth of the target frame is shown in Figure 5(a). We blow up part of the image to highlight the differences between bilinear interpolation (Figure 5(b)), FGSR (Figure 5(c)), and FRSR (Figure 5(d)). The experimental results (Table 1) show that the image generated by FGSR outperforms bilinear interpolation by 0.1927 dB in peak signal-to-noise ratio (PSNR). The result generated by FRSR outperforms bilinear interpolation by 0.7007 dB and FGSR by 0.508 dB.

Figure 4: The Text example. (a) Target frame. In this experiment, the low-resolution input frames are simulated by down-sampling the original frames (original resolution 98 × 114) to 49 × 57. One target frame of the down-sampled version is shown in (b). We magnify the target frame two times and compare the results of bilinear interpolation (c), FGSR (d), and FRSR (e).

We also implement a video player (see Figure 6) that plays the video sequence in real time. The user can zoom in and out to change the resolution of a selected area and switch the upsampling algorithm to compare the results. However, for lack of time, we have not yet integrated our super-resolution algorithms into the player; so far, it can only switch between nearest-neighbor, bilinear, and bicubic upsampling.
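The Bottle comparison above is stated in PSNR. As a hedged sketch (the report does not give its PSNR code), the standard definition is PSNR = 10 log10(peak² / MSE); the 8-bit peak value of 255 and the function name `psnr` are assumptions. Because PSNR is logarithmic, the gains quoted in the text are plain differences of the corresponding Table 1 entries.

```python
import math
from typing import List

Image = List[List[float]]  # row-major grayscale image: img[y][x]


def psnr(reference: Image, estimate: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak**2 / MSE).

    peak=255 assumes 8-bit intensities; the report does not state the peak it used.
    """
    if len(reference) != len(estimate) or any(
        len(ra) != len(rb) for ra, rb in zip(reference, estimate)
    ):
        raise ValueError("reference and estimate must have the same size")
    n = sum(len(row) for row in reference)
    if n == 0:
        raise ValueError("images must not be empty")
    mse = sum((a - b) ** 2 for ra, rb in zip(reference, estimate) for a, b in zip(ra, rb)) / n
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ground_truth = [[0.0, 255.0], [255.0, 0.0]]
    estimate = [[0.0, 255.0], [255.0, 25.5]]  # one pixel off by 25.5
    print(round(psnr(ground_truth, estimate), 2))  # 26.02
    # Gains are differences of dB values, e.g. FRSR vs. bilinear on Bottle:
    print(round(15.1508 - 14.4501, 4))  # 0.7007
```

Here MSE = 25.5² / 4, so peak² / MSE = 400 and the PSNR is 10 log10(400) ≈ 26.02 dB.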
Our implementation can handle mild motion blur and spatially varying blur in real-world video clips, efficiently removing noise and aliasing while generating the high-resolution image. However, severe blurring requires further work.

Table 1: A comparison of PSNR between different super-resolution algorithms.

Image    Bilinear     FGSR         FRSR
Bottle   14.4501 dB   14.6428 dB   15.1508 dB
Text     13.5860 dB   13.2931 dB   13.3693 dB

Figure 5: Comparison with the ground truth (a). In this experiment, we magnify the target frame of Bottle two times in both dimensions using bilinear interpolation (b), FGSR (c), and FRSR (d). Panels: (a) Ground truth; (b) Bilinear interpolation; (c) FGSR algorithm result; (d) FRSR algorithm result.

Figure 6: The video player. The user can zoom in and out and switch the upsampling algorithm to compare the results. Panels: (a) Nearest-neighbor; (b) Bicubic interpolation.

5 Conclusion

In this work, we implement two practical super-resolution algorithms capable of reconstructing high-resolution images from complex and dynamic video sequences, which may contain mild motion blur. By integrating GPU acceleration into the iterative reconstruction process, the super-resolved images are generated in a short period of time. Two mild and dynamic video sequences are tested to demonstrate the applicability of the two algorithms. The performance of our algorithms depends on variations in the global parametric transformation.

To further improve speed, one could look for an algorithm that avoids the iterative calculation, since the dependency between iterations makes them hard to accelerate on the GPU. Another direction is to reconstruct the high-resolution image with a learning-based algorithm. Learning-based algorithms need to model correlations in image structure over extended neighborhoods. The modeling complexity can be reduced remarkably if the prior model is constructed on image patches instead of full-size images, which can easily be parallelized on the GPU.

References

CHAN, T., OSHER, S., AND SHEN, J. 2001. The digital TV filter and nonlinear denoising. Image Processing, IEEE Transactions on 10, 2 (Feb.), 231–241.
ELAD, M., AND FEUER, A. 1999. Super-resolution reconstruction of continuous image sequences. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, vol. 3, 459–463.

FARSIU, S., ROBINSON, M., ELAD, M., AND MILANFAR, P. 2004. Fast and robust multiframe super resolution. Image Processing, IEEE Transactions on 13, 10 (Oct.), 1327–1344.

GIBSON, J. D., AND BOVIK, A., Eds. 2000. Handbook of Image and Video Processing. Academic Press, Inc., Orlando, FL, USA.

JIANG, Z., WONG, T.-T., AND BAO, H. 2003. Practical super-resolution from dynamic video sequences. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 2, II-549–II-554.

NGUYEN, N., MILANFAR, P., AND GOLUB, G. 2001. A computationally efficient superresolution image reconstruction algorithm. Image Processing, IEEE Transactions on 10, 4 (Apr.), 573–583.

RUDIN, L. I., OSHER, S., AND FATEMI, E. 1992. Nonlinear total variation based noise removal algorithms. In Proceedings of the eleventh annual international conference of the Center for Nonlinear Studies on Experimental mathematics: computational issues in nonlinear science, Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 259–268.

TEKALP, A. M. 1995. Digital video processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

ZHAO, W., AND SAWHNEY, H. S. 2002. Is super-resolution with optical flow feasible? In ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part I, Springer-Verlag, London, UK, 599–613.