Single Image Super-Resolution from
Transformed Self-Exemplars
Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja
Single Image Super-Resolution
• Recovering a high-resolution image from a low-resolution one
[Figure: amplitude vs. spatial frequency. Sharpening boosts frequencies already present; super-resolution recovers missing high-frequency content.]
Multi-image vs. Single-image
Multi-image
Source: [Park et al. SPM 2003]
Single-image
Source: [Freeman et al. CG&A 2002]
External Example-based Super-Resolution
Learning to map from low-res to high-res patches
• Nearest neighbor [Freeman et al. CG&A 02]
• Neighborhood embedding [Chang et al. CVPR 04]
• Sparse representation [Yang et al. TIP 10]
• Kernel ridge regression [Kim and Kwon PAMI 10]
• Locally-linear regression [Yang and Yang ICCV 13] [Timofte et al. ACCV 14]
• Convolutional neural network [Dong et al. ECCV 14]
• Random forest [Schulter et al. CVPR 15]
External dictionary
Internal Example-based Super-Resolution
Low-res and high-res example pairs from patch recurrence within and across scales
• Non-local means with self-examples [Ebrahimi and Vrscay ICIRA 2007]
• Unified classical and example SR [Glasner et al. ICCV 2009]
• Local self-similarity [Freedman and Fattal TOG 2011]
• In-place regression [Yang et al. ICCV 2013]
• Nonparametric blind SR [Michaeli and Irani ICCV 2013]
• SR for noisy images [Singh et al. CVPR 2014]
• Sub-band self-similarity [Singh et al. ACCV 2014]
Internal dictionary
Motivation
• Internal dictionary
• More “relevant” patches
• Limited number of examples
• High-res patches are often available in the transformed domain
[Figure: transformed patch recurrence arising from symmetry, surface orientation, and perspective distortion]
Super-Resolution from Transformed Self-Exemplars
[Figure: LR input image, matching error maps, and the matched LR/HR patches. Matching under translation only vs. under a perspective transform; the transformed match predicts the ground-truth HR patch accurately.]
[Figure: the same comparison for translation vs. an affine transform.]
Super-Resolution Scheme
Multi-scale version of [Freedman and Fattal TOG 2011]
[Figure: the input low-res image is successively downscaled into an all-frequency band pyramid; upsampling each level yields its low-frequency band, and the two pyramids form LR/HR example pairs]
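As a rough sketch of this scheme (not the authors' released code; the 1.25 scale step, the level count, and the use of scipy's cubic zoom are assumptions), the LR/HR example pairs can be built from the input image alone:

    import numpy as np
    from scipy.ndimage import zoom

    def build_self_example_pairs(image, scale=1.25, n_levels=6):
        # Build LR/HR example pairs from the input image itself
        # (grayscale for simplicity).
        pairs = []
        current = np.asarray(image, dtype=np.float64)
        for _ in range(n_levels):
            # All-frequency band at the coarser scale: the HR side.
            down = zoom(current, 1.0 / scale, order=3)
            # Upsample back to the finer scale to get its low-frequency
            # band: the LR side. Use exact per-axis factors so shapes match.
            low_freq = zoom(down, np.divide(current.shape, down.shape), order=3)
            pairs.append((low_freq, current))
            current = down
        return pairs

Each pair relates a blurred patch to its sharp counterpart at the same location, which is what the matching step exploits.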
Super-Resolution as Nearest Neighbor Field Estimation
Matching cost = appearance cost + plane compatibility cost [Huang et al. SIGGRAPH 2014] + scale cost
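The slide only names the three terms, so the following is a minimal sketch with assumed weights and an assumed form for the scale term (the plane compatibility cost comes from the detected-plane machinery of [Huang et al. SIGGRAPH 2014] and is taken here as a precomputed scalar):

    import numpy as np

    def appearance_cost(target, warped_source):
        # L2 distance between RGB patches with bias correction: remove
        # the mean color offset before comparing, per the talk.
        bias = np.mean(target - warped_source, axis=(0, 1), keepdims=True)
        diff = target - (warped_source + bias)
        return float(np.sum(diff * diff))

    def matching_cost(target, warped_source, plane_cost, level,
                      w_plane=1.0, w_scale=0.1):
        # Hypothetical weights. The scale term rewards matches found at
        # deeper (coarser) pyramid levels, as described in the talk.
        return (appearance_cost(target, warped_source)
                + w_plane * plane_cost
                - w_scale * level)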
Searching over Patch Transformations
• Generalized PatchMatch [Barnes et al. ECCV 2010]
• Randomization
• Spatial propagation
• Backward compatible when planar structures are not detected (a simplified sketch follows below)
[Figure: transformation hierarchy: similarity, affine, perspective]
[Huang et al. SIGGRAPH 2014]
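A translation-only PatchMatch sketch of the randomization and spatial-propagation steps above; the paper's generalized version additionally samples affine/perspective warp parameters during the random search, which is omitted here. Patch size, iteration count, and seed are illustrative:

    import numpy as np

    def _cost(A, B, ay, ax, by, bx, p):
        d = A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]
        return float(np.sum(d * d))

    def patchmatch(A, B, p=5, n_iters=4, seed=0):
        # Estimate a nearest-neighbor field mapping each p x p patch of
        # A to a patch of B (translation only).
        rng = np.random.default_rng(seed)
        H, W = A.shape[0] - p + 1, A.shape[1] - p + 1
        Hb, Wb = B.shape[0] - p + 1, B.shape[1] - p + 1
        nnf = np.stack([rng.integers(0, Hb, (H, W)),
                        rng.integers(0, Wb, (H, W))], axis=-1)
        cost = np.array([[_cost(A, B, y, x, nnf[y, x, 0], nnf[y, x, 1], p)
                          for x in range(W)] for y in range(H)])
        for it in range(n_iters):
            # Alternate the scan direction each iteration.
            step = 1 if it % 2 == 0 else -1
            ys = range(H) if step == 1 else range(H - 1, -1, -1)
            for y in ys:
                xs = range(W) if step == 1 else range(W - 1, -1, -1)
                for x in xs:
                    # Spatial propagation: shift the match of the
                    # neighbor visited just before us.
                    for dy, dx in ((-step, 0), (0, -step)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W:
                            by = int(np.clip(nnf[ny, nx, 0] - dy, 0, Hb - 1))
                            bx = int(np.clip(nnf[ny, nx, 1] - dx, 0, Wb - 1))
                            c = _cost(A, B, y, x, by, bx, p)
                            if c < cost[y, x]:
                                nnf[y, x], cost[y, x] = (by, bx), c
                    # Randomized search around the current best match
                    # with an exponentially shrinking radius.
                    r = max(Hb, Wb)
                    while r >= 1:
                        by = int(np.clip(nnf[y, x, 0] + rng.integers(-r, r + 1), 0, Hb - 1))
                        bx = int(np.clip(nnf[y, x, 1] + rng.integers(-r, r + 1), 0, Wb - 1))
                        c = _cost(A, B, y, x, by, bx, p)
                        if c < cost[y, x]:
                            nnf[y, x], cost[y, x] = (by, bx), c
                        r //= 2
        return nnf, cost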
Results
Datasets – BSD 100 and Urban 100
Berkeley segmentation dataset (100 test images); urban image dataset collected from Flickr (100 test images)
Datasets – Set5, Set14, and Sun-Hays 80 [Sun and Hays ICCP 12]
[Qualitative comparisons at SR factor 4x on Urban 100 and BSD 100 images: Bicubic, ScSR [Yang et al. TIP 10], Glasner [Glasner et al. ICCV 2009], Sub-band [Singh et al. ACCV 2014], SRCNN [Dong et al. ECCV 14], A+ [Timofte et al. ACCV 14], our result, and the ground-truth HR image]
Quantitative Results – Urban 100 dataset
Scale Bicubic ScSR Kim and Kwon Sub-band Glasner SRCNN A+ Ours
2x - PSNR 26.66 28.26 28.74 28.34 27.85 28.65 28.87 29.38
4x - PSNR 23.14 24.02 24.20 24.19 23.58 24.14 24.34 24.82
2x - SSIM 0.8408 0.8828 0.8940 0.8820 0.8709 0.8909 0.8957 0.9032
4x - SSIM 0.6573 0.7024 0.7104 0.7115 0.6736 0.7047 0.7195 0.7386
~0.5 dB average PSNR improvement over the state-of-the-art methods
Quantitative Results – BSD 100 dataset
On par with the state-of-the-art methods
Scale Bicubic ScSR Kim and Kwon Sub-band Glasner SRCNN A+ Ours
2x - PSNR 29.55 30.77 31.11 30.73 30.28 31.11 31.22 31.18
3x - PSNR 27.20 27.72 28.17 27.88 27.06 28.20 28.30 28.30
4x - PSNR 25.96 26.61 26.71 26.60 26.17 26.70 26.82 26.85
2x - SSIM 0.8425 0.8744 0.8840 0.8774 0.8621 0.8835 0.8862 0.8855
3x - SSIM 0.7382 0.7647 0.7788 0.7714 0.7368 0.7794 0.7836 0.7843
4x - SSIM 0.6672 0.6983 0.7027 0.7021 0.6747 0.7018 0.7089 0.7108
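For reference, the PSNR reported above can be computed as follows (a standard definition, not the authors' evaluation script; papers differ in details such as Y-channel conversion and border cropping, and SSIM is available in scikit-image as skimage.metrics.structural_similarity):

    import numpy as np

    def psnr(reference, estimate, peak=255.0):
        # Peak signal-to-noise ratio in dB; higher is better.
        mse = np.mean((np.asarray(reference, dtype=np.float64)
                       - np.asarray(estimate, dtype=np.float64)) ** 2)
        return 10.0 * np.log10(peak * peak / mse)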
Ground truth HR image
Input LR image
128 x 96
Bicubic SR Factor 8x
Internet-scale scene matching [Sun and Hays ICCP 12] SR Factor 8x
#Training images
6.3 million
SRCNN [Dong et al. ECCV 14] SR Factor 8x
#Training images
395,909
from ImageNet
Our result SR Factor 8x
#Training images
1 LR input
Our result: coarse-to-fine super-resolution
Ground truth HR image
Input LR image
128 x 96
Bicubic SR Factor 8x
Sparse coding [Yang et al. TIP 10] SR Factor 8x
SRCNN [Dong et al. ECCV 14] SR Factor 8x
Our result SR Factor 8x
Our result: coarse-to-fine super-resolution
Ground truth HR image
Input LR image
128 x 96
Bicubic SR Factor 8x
Internet-scale scene matching [Sun and Hays ICCP 12] SR Factor 8x
SRCNN [Dong et al. ECCV 14] SR Factor 8x
Our result SR Factor 8x
Our result: coarse-to-fine super-resolution
Bicubic SR Factor 8x
SRCNN [Dong et al. ECCV 14] SR Factor 8x
Ours SR Factor 8x
Bicubic SR Factor 8x
SRCNN [Dong et al. ECCV 14] SR Factor 8x
Ours SR Factor 8x
Low-Res
TI-DTV [Fernandez-Granda and Candes ICCV 2013]
Ours
SR Factor 4x
Low-Res
TI-DTV [Fernandez-Granda and Candes ICCV 2013]
Ours
SR Factor 4x
Limitations – Blur Kernel Model
• Suffers from blur-kernel mismatch
• Blind SR can estimate the kernel [Michaeli and Irani ICCV 2013] [Efrat et al. ICCV 2013]
• With the ground-truth kernel, we get a significant improvement
• External example-based methods would need to retrain their models
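A sketch of the forward model behind this limitation; the Gaussian kernel below is a stand-in assumption for the unknown camera blur:

    import numpy as np
    from scipy.ndimage import convolve, zoom

    def degrade(hr, kernel, factor=2.0):
        # Assumed forward model: blur the HR image with `kernel`, then
        # downsample. Self-exemplar matching implicitly assumes such a
        # kernel; a mismatch with the true camera blur hurts the result.
        blurred = convolve(np.asarray(hr, dtype=np.float64), kernel, mode='reflect')
        return zoom(blurred, 1.0 / factor, order=3)

    def gaussian_kernel(size=7, sigma=1.2):
        # Illustrative stand-in for the unknown camera kernel.
        ax = np.arange(size) - size // 2
        g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
        k = np.outer(g, g)
        return k / k.sum()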
Limitations
• Slow computation time
• On average, 40 seconds to super-resolve a BSD 100 image at 2x on a 2.8 GHz, 12 GB RAM PC
[Figure, SR factor 4x: ground-truth HR, our result, SRCNN [Dong et al. ECCV 14], A+ [Timofte et al. ACCV 14]]
Conclusions
• Super-resolution based on transformed self-exemplars
• No training data, no feature extraction, no complicated learning algorithms
• Works particularly well on urban scenes
• On par with state-of-the-art on natural scenes
Code and data available: http://bit.ly/selfexemplarsr
See us on poster #82
Single Image Super-Resolution
from Transformed Self-Exemplars
http://bit.ly/selfexemplarsr

Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)

Editor's Notes

  • #2 Thanks for the introduction.
  • #3 Image super-resolution is a longstanding problem in computer vision which aims at recovering missing high-frequency components in images. Take this image as an example: sharpening can boost the spatial frequencies already present to make the image appear clearer. In contrast, the goal of image super-resolution is to recover the high-frequency contents that are NOT present in the original image.
  • #4 Super-resolution (SR) techniques can be broadly classified into two categories. First, classical multi-image approaches super-resolve a scene by combining images with subpixel misalignments. Second, example-based approaches achieve super-resolution by learning the mapping from low- to high-resolution image patches.
  • #5 One way to learn such a mapping is to build an external database of low-res/high-res patch pairs. Then, one can use a machine learning algorithm to learn the mapping.
  • #6 On the other hand, it has been shown that patches in a natural image tend to recur many times within and across scales. These internal statistics provide a powerful image prior for SR.
  • #7 While the internal dictionary contains more relevant patches, it holds significantly fewer examples than external dictionaries. In this work, we propose to address this problem using transformed self-exemplars. The motivation is that many high-res patches are often available ONLY in a geometrically transformed domain.
  • #8 Here we show a comparison of the external approach, the internal approach, and ours.
  • #9 Here is an image consisting of repetitive patterns. The red patch is the target patch we want to super-resolve. Using internal example-based methods, we match the low-res patch against all translated patches in the downscaled image. Here is the matching error. By selecting the patch with the lowest cost, we obtain an exemplar patch pair for predicting the missing high-frequency contents. Because the texture is perspectively distorted, the prediction is not accurate. In contrast, matching in the transformed space gives an accurate prediction.
  • #10 Similarly, in this case, matching under an affine transformation achieves a better prediction.
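    (A toy illustration of matching in the transformed space, using scikit-image's warp; the transform parameters and patch location below are made up:)

    import numpy as np
    from skimage.transform import AffineTransform, warp

    def transformed_match_error(target, source_img, tform, top_left, p=5):
        # Warp the source image by the candidate transform, then compare
        # the co-located p x p patch against the target patch (L2).
        warped = warp(source_img, tform.inverse, preserve_range=True)
        y, x = top_left
        diff = np.asarray(target, dtype=np.float64) - warped[y:y + p, x:x + p]
        return float(np.sum(diff * diff))

    # e.g. a mild shear as one candidate transform (illustrative values):
    # err = transformed_match_error(patch, image, AffineTransform(shear=0.1), (40, 60))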
  • #11 Now we describe our super-resolution scheme. Given an input low-resolution image, we construct an image pyramid by successively downscaling the image. This set of images contains all-frequency bands at their spatial resolutions. We then perform upsampling to obtain the low-frequency band version of the pyramid.
  • #12 These two image pyramids form LR/HR example pairs.
  • #13 To perform super-resolution, we first upsample the input image using bicubic interpolation. The task here is to predict the missing high-frequency contents.
  • #14 We cast SR as a patch-based optimization problem. For each overlapping patch in the low-frequency band, we search for its nearest neighbor in the transformed space; the estimated nearest-neighbor field is then used to reconstruct the missing high-frequency contents. To achieve high SR factors, we iterate this operation in a coarse-to-fine fashion.
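    (A coarse-to-fine driver might look like the sketch below, reusing the hypothetical build_self_example_pairs and patchmatch helpers sketched earlier; the 1.25 step, the single pyramid level, and the averaging-based reconstruction are assumptions, not the authors' implementation:)

    import numpy as np
    from scipy.ndimage import zoom

    def super_resolve(lr, total_factor=4.0, step=1.25, p=5):
        # Coarse-to-fine self-exemplar SR (sketch). Each round upsamples
        # by `step`, matches low-frequency patches into the image's own
        # pyramid, and adds back the matched high-frequency bands.
        img = np.asarray(lr, dtype=np.float64)
        factor = 1.0
        while factor < total_factor:
            low_freq = zoom(img, step, order=3)  # misses high frequencies
            lf_src, hf_src = build_self_example_pairs(img)[0]
            nnf, _ = patchmatch(low_freq, lf_src, p=p)
            out = np.zeros_like(low_freq)
            weight = np.zeros_like(low_freq)
            H, W = nnf.shape[:2]
            for y in range(H):
                for x in range(W):
                    by, bx = nnf[y, x]
                    # Missing high-frequency band of the matched exemplar.
                    hf = hf_src[by:by + p, bx:bx + p] - lf_src[by:by + p, bx:bx + p]
                    out[y:y + p, x:x + p] += low_freq[y:y + p, x:x + p] + hf
                    weight[y:y + p, x:x + p] += 1.0
            img = out / np.maximum(weight, 1.0)
            factor *= step
        return img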
  • #15 Our objective function for estimating the nearest-neighbor field consists of three terms. First, the appearance cost measures how similar the warped source patch is to the target patch; we use the L2 norm of RGB patches with bias correction. Second, we use the plane compatibility cost proposed in our previous work at SIGGRAPH 2014, which encourages the matched patch to lie on a compatible region. The third cost encourages the search to find source patches at deeper levels of the pyramid, in order to get better reconstructions at higher SR factors.
  • #18 We test our algorithm on two main datasets. BSD contains mostly natural scenes. To complement it, we introduce a new Urban dataset: we downloaded 100 urban images with structured scenes from Flickr using search keywords such as city and architecture.
  • #64 Thanks for your attention!