[Fig. 1 block diagram: Y = bicubic(X0); low-frequency band Y0; first-order regression x0 + ∇f^T(y0)(y − y0).]
Fig. 1. For each patch y of the upsampled low-frequency image Y, we find its in-place match y0 from the low-frequency image Y0, and then perform a first-order regression on x0 to estimate the desired patch x for the target image X.
the same spatial dimension as X0, but is missing the high-frequency content, and likewise for Y and X. Let x0 and x denote a × a HR image patches sampled from X0 and X, respectively, and let y0 and y denote a × a LR image patches sampled from Y0 and Y, respectively. Let (i, j) and (p, q) denote coordinates in the 2-D image plane.
A. Proposed Super-Resolution Algorithm
The LR image is denoted as X0 ∈ ℝ^{K1×K2}, from which we obtain its low-frequency image Y0 ∈ ℝ^{K1×K2} by Gaussian filtering. We upsample X0 using bicubic interpolation by a factor of r to get Y ∈ ℝ^{rK1×rK2}. Y is used to approximate the low-frequency component of the unknown HR image X ∈ ℝ^{rK1×rK2}. We aim to estimate X from the knowledge of X0, Y0, and Y.
Fig. 1 is a block-diagram description of the overall SR scheme presented. For each image patch y from the image Y at location (i, j), we find its in-place self-similar example patch y0 around its corresponding coordinates (is, js) in the image Y0, where is = ⌊i/r + 0.5⌋ and js = ⌊j/r + 0.5⌋. Similarly, we can obtain the image patch x0 from image X0, which is the HR version of y0. The image patch pair {y0, x0} constitutes an LR/HR prior example pair from which we learn a first-order regression model to estimate the HR image patch x for the LR patch y. We repeat the procedure over overlapping patches of image Y, and the final HR image X is generated by aggregating all the HR image patches x obtained. For large upscaling factors, the algorithm is run iteratively, each time with a constant scaling factor r.
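As a concrete illustration, the coordinate mapping and the aggregation of overlapping patch estimates described above can be sketched as follows (a minimal NumPy sketch; the function names and the plain averaging of overlaps are our own illustrative choices, not the authors' implementation):

```python
import numpy as np

def inplace_coords(i, j, r):
    """In-place match location in Y0 for a patch at (i, j) in Y:
    i_s = floor(i/r + 0.5), j_s = floor(j/r + 0.5)."""
    return int(np.floor(i / r + 0.5)), int(np.floor(j / r + 0.5))

def aggregate_patches(shape, patches):
    """Average overlapping HR patch estimates x into the target image X.
    `patches` is a list of (i, j, patch) with top-left corner (i, j)."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for i, j, p in patches:
        a, b = p.shape
        acc[i:i + a, j:j + b] += p
        cnt[i:i + a, j:j + b] += 1
    cnt[cnt == 0] = 1  # avoid division by zero at uncovered pixels
    return acc / cnt
```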
B. Local Regression
The patch-based single image SR problem can be viewed
as a regression problem, i.e., finding a nonlinear mapping
function f from the LR patch space to the target HR patch
space. However, due to the ill-posed nature of the inverse
problem at hand, learning this nonlinear mapping function
requires good image priors and proper regularization. From
Section II-A, the in-place self-similar example patch pair {y0, x0} serves as a good prior example pair for inferring the HR version of y. Assuming that the mapping function f
is continuously differentiable, we have the following Taylor
series expansion:
x = f(y) = f(y0 + y − y0)
  = f(y0) + ∇f(y0)(y − y0) + O(||y − y0||^2)    (1)
  ≈ x0 + ∇f(y0)(y − y0).
Equation (1) is a first-order approximation of the nonlinear mapping function f. Instead of learning the mapping function f itself, we can learn its gradient ∇f, which should be simpler. We learn the mapping gradient ∇f by building a dictionary using the prior example pairs {y0, x0}, as detailed in the next section. With the gradient learned, given any LR input patch y, we first search for its in-place self-similar example patch pair {y0, x0}, then find ∇f(y0) using the trained dictionary, and finally use the first-order approximation to compute the HR image patch x.
Due to the discrete resampling process in downsampling and upsampling, we expect to find multiple approximate in-place examples for y in the 3 × 3 neighborhood of (is, js), which contains 9 patches. To reduce the regression variance, we perform regression on each of them and combine the results by a weighted average. Given the in-place self-similar example patch pairs {y0i, x0i}, i = 1, …, 9, for y, we have

x = Σ_{i=1}^{9} (x0i + ∇f(y0i)(y − y0i)) wi,    (2)

where wi = (1/z) · exp{−||y − y0i||_2^2 / (2σ^2)}, with z the normalization factor.
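The weighted combination in (2) can be sketched as below (an illustrative NumPy version; the flattened-patch representation and the function signature are our assumptions, not the authors' code):

```python
import numpy as np

def combine_inplace_regressions(y, pairs, grads, sigma):
    """Eq. (2): weighted average of the in-place first-order regressions.
    pairs: list of (y0_i, x0_i) example patch pairs (flattened vectors);
    grads: list of gradient matrices grad_f(y0_i);
    weights w_i = (1/z) * exp(-||y - y0_i||_2^2 / (2 sigma^2))."""
    w = np.array([np.exp(-np.sum((y - y0) ** 2) / (2 * sigma ** 2))
                  for y0, _ in pairs])
    w = w / w.sum()  # z is the normalization factor
    return sum(wi * (x0 + G @ (y - y0))
               for wi, (y0, x0), G in zip(w, pairs, grads))
```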
C. Dictionary Learning
The proposed dictionary-based method to learn the mapping gradient ∇f is a modification of the work by Yang et al. [15], [16] to guarantee detail enhancement. Yang et al. [15], [16] developed a method for single image SR based on sparse modeling. This method utilizes an overcomplete dictionary Dh ∈ ℝ^{n×K} built using the HR image, i.e., an n × K matrix whose K columns represent K "atoms" of size n, where an "atom" is an elementary patch (in vectorized form) serving as a basis element. We assume that any patch x ∈ ℝ^n in the HR image X can be represented as a sparse linear combination of the atoms of Dh as follows:

x ≈ Dh α, with ||α||_0 ≪ K, α ∈ ℝ^K.    (3)
A patch y in the observed LR image can be represented using a corresponding LR dictionary Dl with the same sparse coefficient vector α. This is ensured by co-training the dictionary Dh with the HR patches and the dictionary Dl with the corresponding LR patches.
For a given input LR image patch y, we determine the sparse solution vector

α* = argmin_α ||G Dl α − G y||_2^2 + λ ||α||_1,    (4)
where G is a feature extraction operator that emphasizes high-frequency detail. We use the following set of 1-D filters:

g1 = [−1, 0, 1],  g2 = g1^T,  g3 = [−1, −2, 1],  g4 = g3^T,    (5)

and G is obtained as a concatenation of the responses from applying these 1-D filters to the image. The sparsity of the solution vector α* is controlled by λ. In order to enhance the texture details while suppressing noise and other artifacts, we need to adapt the number of non-zero coefficients in the solution vector α*: increasing the number of non-zero coefficients enhances the texture details but also amplifies the noise and artifacts. We use the standard deviation σ of a patch to indicate the local texture content, and empirically adapt λ as follows:

    λ = 0.5   if σ < 15,
    λ = 0.1   if 15 ≤ σ ≤ 25,
    λ = 0.01  otherwise.
These σ thresholds are designed for our 8-bit gray-scale images and can easily be adapted for other image types. The mapping gradient ∇f for a given y0 is obtained as ∇f(y0) = Dh α*.
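To make the sparse-coding step concrete, the following sketch pairs the empirical λ schedule above with a generic ISTA solver for the l1 problem in (4). ISTA is a stand-in we chose for illustration (the paper does not specify its solver), and the feature operator G is omitted for brevity:

```python
import numpy as np

def adapt_lambda(patch):
    """Empirical sparsity weight lambda from the patch standard deviation
    (thresholds for 8-bit gray-scale images, as given in the text)."""
    s = patch.std()
    return 0.5 if s < 15 else (0.1 if s <= 25 else 0.01)

def ista_lasso(A, b, lam, n_iter=200):
    """Generic ISTA for  min_a ||A a - b||_2^2 + lam * ||a||_1 ,
    used here as an illustrative sparse solver for Eq. (4)."""
    L = np.linalg.norm(A, 2) ** 2          # step size from the spectral norm
    a = np.zeros(A.shape[1])
    if L == 0:
        return a
    for _ in range(n_iter):
        z = a - A.T @ (A @ a - b) / L      # gradient step on the data term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return a

# Given dictionaries Dl, Dh and a patch y (omitting the feature operator G),
# the mapping gradient would be grad_f(y0) = Dh @ ista_lasso(Dl, y, adapt_lambda(y)).
```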
We make use of a bilateral filter as a degradation operator instead of a Gaussian blurring operator to obtain the image Y0 from the given LR input image X0 for dictionary training, as we are interested in enhancing the textures present while suppressing noise and other artifacts. Dictionary training starts by sampling in-place self-similar example image patch pairs
{y0i, x0i}, i = 1, …, m, from the corresponding LR and HR images. We generate the HR patch matrix Xh = [x01, x02, …, x0m], the LR patch feature matrix Yl = [y01, y02, …, y0m], and the residue patch matrix E = [x01 − y01, x02 − y02, …, x0m − y0m]. We use the residue patches E instead of the HR patches Xh for training. The residue patches are concatenated with the LR patch features, and a concatenated dictionary is defined by

Xc = [ (1/√N) Yl ; (1/√M) E ],   Dc = [ (1/√N) Dl ; (1/√M) Dh ],    (6)

where N and M are the dimensions of the LR and HR image patches in vector form, and ";" denotes vertical concatenation. Optimized dictionaries are computed by

min_{Dc, Z} ||Xc − Dc Z||_2^2 + λ ||Z||_1,  s.t. ||Dc_i||_2^2 ≤ 1,  i = 1, …, K.    (7)
The training process is performed in an iterative manner, alternating between optimizing Z and Dc using the technique in [15].
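A toy version of this alternating scheme might look as follows (not the solver of [15]; the random initialization, ISTA inner solver, and least-squares dictionary update are our illustrative choices):

```python
import numpy as np

def ista_lasso(A, b, lam, n_iter=100):
    """Inner l1 solver (plain ISTA) for the sparse-coding step."""
    L = np.linalg.norm(A, 2) ** 2
    a = np.zeros(A.shape[1])
    if L == 0:
        return a
    for _ in range(n_iter):
        z = a - A.T @ (A @ a - b) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return a

def train_dictionary(Xc, K=8, lam=0.05, n_outer=5, seed=0):
    """Alternate between coding Z (Dc fixed) and updating Dc (Z fixed),
    projecting each atom onto the unit ball so ||Dc_i||_2 <= 1, cf. Eq. (7)."""
    rng = np.random.default_rng(seed)
    Dc = rng.standard_normal((Xc.shape[0], K))
    Dc /= np.maximum(np.linalg.norm(Dc, axis=0), 1.0)
    for _ in range(n_outer):
        Z = np.column_stack([ista_lasso(Dc, x, lam) for x in Xc.T])
        Dc = Xc @ np.linalg.pinv(Z)        # least-squares dictionary update
        Dc /= np.maximum(np.linalg.norm(Dc, axis=0), 1.0)
    Z = np.column_stack([ista_lasso(Dc, x, lam) for x in Xc.T])
    return Dc, Z
```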
III. EXPERIMENTS AND RESULTS
We evaluate the proposed SR algorithm both quantitatively
and qualitatively, on a variety of example images used in the
SR literature [17]. We compare our SR algorithm with recent algorithms proposed by Glasner et al. [5], Yang et al. [18], and Freedman et al. [3]. We used open-source implementations of these three SR algorithms available online for comparison, carefully choosing the various parameters within each method for a fair comparison.
TABLE I
PREDICTION RMSE FOR ONE UPSCALING STEP (2×)

Images      Bicubic  Glasner [5]  Yang [18]  Freedman [3]  Ours
Chip          6.03      5.81        5.70        5.85        4.63
Child         7.47      6.74        7.06        6.51        5.92
Peppers       9.11      8.97        9.10        8.72        7.74
House        10.37     10.41       10.16        9.62        8.14
Cameraman    11.61     10.93       11.81       10.64        8.97
Lena         13.31     12.92       12.65       11.97       11.41
Barbara      14.93     14.24       13.92       13.23       12.22
Monarch      16.25     15.71       15.96       15.50       15.42
A. Algorithm Parameter Settings
We chose the image patch size as a = 5 and the iterative scaling factor as r = 2 in all of our experiments. Bicubic interpolation on the input LR image X0 generates the low-frequency component Y of the target HR image X. A standard deviation of 0.4 is used in the low-pass Gaussian filtering to obtain the low-frequency component Y0 of the input LR image X0. For clean images, we use the nearest-neighbor in-place example for regression, whereas in the case of noisy images, we average all 9 in-place example regressions for robust estimation, where σ is the only tuning parameter needed to compute the weight wi in (2), depending on the noise level. K = 512 atoms are used to train and build the dictionaries Dh and Dl used in the experiments.
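Under these settings, the inputs Y0 and Y of Fig. 1 can be produced as in the following sketch (assuming SciPy is available; note that scipy's `zoom` performs cubic spline interpolation, which we use here as a stand-in for bicubic interpolation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def low_freq_pair(X0, r=2, sigma=0.4):
    """Build the Fig. 1 inputs from the LR image X0:
    Y0 -- low-pass (Gaussian) filtered X0;
    Y  -- X0 upsampled by factor r (cubic spline via `zoom`,
          a stand-in for the paper's bicubic interpolation)."""
    X0 = np.asarray(X0, dtype=float)
    Y0 = gaussian_filter(X0, sigma=sigma)
    Y = zoom(X0, r, order=3)
    return Y0, Y
```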
B. Quantitative Results
In order to obtain an objective measure of performance for the SR algorithms under comparison, we validated the results of several example images taken from [10] (whose names appear in Table I) using the root mean square error (RMSE). The results of all the algorithms are shown in Table I for one upscaling step (2×). From Table I we observe that SR using simple bicubic interpolation performs the worst due to its overly smooth image prior. Yang's SR algorithm performs better than bicubic interpolation in terms of RMSE values for the different images. Glasner's and Freedman's SR methods have very similar RMSE values, since both methods are closely related, using local self-similar patches to learn the HR image patches from a single LR image. The proposed SR algorithm has the best RMSE values, as it combines the advantages of in-place example patches and their corresponding local self-similarity learned using the dictionary-based approach.
C. Qualitative Results
Real applications requiring SR rely on three main aspects: image sharpness, image naturalness (affected by visual artifacts), and the speed of the algorithm to super-resolve. We will discuss the SR algorithms compared here with respect to these aspects. Fig. 2 shows the SR results of the different approaches on "child" by 4×, "cameraman" by 3×, and on "castle" by 2×. As shown, Glasner's and Freedman's SR algorithms give rise to overly sharp images, resulting in visual artifacts, e.g., ghosting and ringing artifacts around the eyes
Fig. 2. Super-resolution results on "child" (4×), "cameraman" (3×) and "castle" (2×); from left to right: Original, Bicubic, Glasner, Yang, Freedman, Ours. Results are better viewed in zoomed mode.
in "child", and jagged artifacts along the towers in "castle". Also, the details of the camera are smudged in "cameraman" for both algorithms. The results of Yang's SR algorithm are generally a little blurry, and they contain small visible noise-like artifacts across the images upon a closer look.
In comparison, our algorithm is able to recover the local
texture details as well as sharp edges without sacrificing the
naturalness of the images.
IV. CONCLUSION
In this paper we propose a robust first-order regression
model for single-image SR based on local self-similarity
within the image. Our approach combines the advantages
of learning from in-place examples and learning from local
self-similar patches within the same image using a trained
dictionary. The in-place examples allow us to learn a local
regression function for the otherwise ill-posed mapping from
LR to HR image patches. On the other hand, by learning from local self-similar patches elsewhere within the image, the regression model can overcome the problem of an insufficient number of in-place examples. By conducting various experiments and comparing with existing algorithms, we show that our new approach is more accurate and can produce more natural-looking results with sharp details while suppressing the noisy artifacts present within the images.
REFERENCES
[1] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, and A. K. Katsaggelos, "SoftCuts: a soft edge smoothness prior for color image super-resolution," IEEE Trans. Image Process., vol. 18, no. 5, pp. 969-981, May 2009.
[2] R. Fattal, "Image upsampling via imposed edge statistics," ACM Transactions on Graphics, vol. 26, no. 3, pp. 95:1-95:8, Jul. 2007.
[3] G. Freedman and R. Fattal, "Image and video upscaling from local self-examples," ACM Transactions on Graphics, vol. 30, no. 2, pp. 12:1-12:11, Apr. 2011.
[4] W. T. Freeman, T. R. Jones, and E. C. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56-65, Mar. 2002.
[5] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proc. IEEE Int. Conf. Computer Vision, pp. 349-356, 2009.
[6] H. He and W.-C. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 449-456, 2011.
[7] K. I. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127-1133, Jun. 2010.
[8] X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521-1527, Oct. 2001.
[9] S. Mallat and G. Yu, "Super-resolution with sparse mixing estimators," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2889-2900, Nov. 2010.
[10] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Int. Conf. Computer Vision, pp. 416-423, 2001.
[11] Q. Shan, Z. Li, J. Jia, and C.-K. Tang, "Fast image/video upsampling," ACM Transactions on Graphics, vol. 27, no. 5, pp. 153:1-153:8, Dec. 2008.
[12] J. Sun, J. Sun, Z. Xu, and H.-Y. Shum, "Gradient profile prior and its applications in image super-resolution and enhancement," IEEE Trans. Image Process., vol. 20, no. 6, Jun. 2011.
[13] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proc. IEEE Int. Conf. Computer Vision, 2013.
[14] Q. Wang, X. Tang, and H. Shum, "Patch based blind image super-resolution," in Proc. IEEE Int. Conf. Computer Vision, pp. 709-716, 2005.
[15] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861-2873, Nov. 2010.
[16] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. S. Huang, "Coupled dictionary training for image super-resolution," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467-3478, Aug. 2012.
[17] J. Yang, Z. Lin, and S. Cohen, "Fast image super-resolution based on in-place example regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1059-1066, 2013.
[18] C.-Y. Yang, J.-B. Huang, and M.-H. Yang, "Exploiting self-similarities for single frame super-resolution," in Proc. Asian Conf. Computer Vision, pp. 497-510, 2010.