Pixel Recursive Super Resolution
Ryan Dahl∗   Mohammad Norouzi   Jonathon Shlens
Google Brain
{rld,mnorouzi,shlens}@google.com
∗Work done as a member of the Google Brain Residency program (g.co/brainresidency).
Abstract
We present a pixel recursive super resolution model that synthesizes realistic details into images while enhancing their resolution. A low resolution image may correspond to multiple plausible high resolution images, thus modeling the super resolution process with a pixel independent conditional model often results in averaging different details together, hence blurry edges. By contrast, our model is able to represent a multimodal conditional distribution by properly modeling the statistical dependencies among the high resolution image pixels, conditioned on a low resolution input. We employ a PixelCNN architecture to define a strong prior over natural images and jointly optimize this prior with a deep conditioning convolutional network. Human evaluations indicate that samples from our proposed model look more photo realistic than a strong L2 regression baseline.
1. Introduction
The problem of super resolution entails artificially enlarging a low resolution photograph to recover a plausible high resolution version of it. When the zoom factor is large, the input image does not contain all of the information necessary to accurately construct a high resolution image. Thus, the problem is underspecified, and many plausible high resolution images exist that match the low resolution input image. This problem is significant for improving the state-of-the-art in super resolution, and more generally for building better conditional generative models of images.
A super resolution model must account for the complex variations of objects, viewpoints, illumination, and occlusions, especially as the zoom factor increases. When some details do not exist in the source image, the challenge lies not only in 'deblurring' an image, but also in generating new image details that appear plausible to a human observer. Generating realistic high resolution images is not possible unless the model draws sharp edges and makes hard decisions about the type of textures, shapes, and patterns present at different parts of an image.

Figure 1: Illustration of our probabilistic pixel recursive super resolution model trained end-to-end on a dataset of celebrity faces. The left column shows 8×8 low resolution inputs from the test set. The middle and last columns show 32×32 images as predicted by our model vs. the ground truth. Our model incorporates strong face priors to synthesize realistic hair and skin details.
Imagine a low resolution image of a face, e.g., the 8×8 images depicted in the left column of Figure 1: the details of the hair and the skin are missing. Such details cannot be faithfully recovered using simple interpolation techniques such as linear or bicubic. However, by incorporating prior knowledge of faces and their typical variations, an artist is able to paint believable details. In this paper, we show how a fully probabilistic model that is trained end-to-end can play the role of such an artist by synthesizing the 32×32 face images depicted in the middle column of Figure 1. Our super resolution model comprises two components that are trained jointly: a conditioning network and a prior network. The conditioning network effectively maps a low resolution image to a distribution over corresponding high resolution images, while the prior models high resolution details to make the outputs look more realistic. Our conditioning network consists of a deep stack of ResNet [10] blocks, while our prior network comprises a PixelCNN [28] architecture.
We find that standard super resolution metrics such as peak signal-to-noise ratio (pSNR) and structural similarity (SSIM) fail to properly measure the quality of predictions for an underspecified super resolution task. These metrics prefer conservative blurry averages over more plausible photo realistic details, as new fine details often do not align exactly with the original details. Our evaluation studies demonstrate that humans easily distinguish real images from super resolution predictions when regression techniques are used, but they have a harder time telling our samples apart from real images.
2. Related work
Super resolution has a long history in computer vision [22]. Methods relying on interpolation [11] are easy to implement and widely used; however, these methods suffer from a lack of expressivity, since linear models cannot express complex dependencies between the inputs and outputs. In practice, such methods often fail to adequately predict high frequency details, leading to blurry high resolution outputs.
Enhancing linear methods with rich image priors such as sparsity [2] or Gaussian mixtures [35] has substantially improved the quality of the results; likewise, leveraging low-level image statistics such as edge gradients improves predictions [31, 26, 6, 12, 25, 17]. Much work has been done on algorithms that search a database of patches and combine them to create plausible high frequency details in zoomed images [7, 13]. Recent patch-based work has focused on improving basic interpolation methods by building a dictionary of pre-learned filters on images and selecting the appropriate patches by an efficient hashing mechanism [23]. Such dictionary methods have improved inference speed while remaining comparable to the state of the art.
Another approach for super resolution is to abandon inference speed requirements and focus on constructing high resolution images at increasingly higher magnification factors. Convolutional neural networks (CNNs) represent an approach to the problem that avoids explicit dictionary construction, instead implicitly extracting multiple layers of abstraction by learning layers of filter kernels. Dong et al. [5] employed a three layer CNN with an MSE loss. Kim et al. [16] improved accuracy by increasing the depth to 20 layers and learning only the residuals between the high resolution image and an interpolated low resolution image. Most recently, SRResNet [18] uses many ResNet blocks to achieve state-of-the-art pSNR and SSIM on standard super resolution benchmarks; we employ a similar design for our conditioning network and catchall regression baseline.
Instead of using a per-pixel loss, Johnson et al. [14] use the Euclidean distance between activations of a pre-trained CNN for the model's predictions vs. ground truth images. Using this so-called perceptual loss, they train feed-forward networks for super resolution and style transfer. Bruna et al. [3] also use a perceptual loss to train a super resolution network, but inference is done via gradient propagation to the low-res input (e.g., [9]).
Ledig et al. [18] and Yu et al. [33] use GANs to create compelling super resolution results, showing the ability of the model to predict plausible high frequency details. Sønderby et al. [15] also investigate GANs for super resolution, using a learned affine transformation that ensures the models only generate images that downscale back to the low resolution inputs. Sønderby et al. [15] also explore a masked autoregressive model like PixelCNN [27], but without the gated layers and using a mixture of Gaussians instead of a multinomial distribution. Denton et al. [4] use a multi-scale adversarial network for image synthesis, but the architecture also seems beneficial for super resolution.
PixelRNN and PixelCNN by van den Oord et al. [27, 28] are probabilistic generative models that impose an order on image pixels, representing them as a long sequence. The probability of each pixel is then conditioned on the previous ones. The gated PixelCNN obtained state-of-the-art log-likelihood scores on CIFAR-10 and MNIST, making it one of the most competitive probabilistic generative models.
Since PixelCNN uses log-likelihood for training, the model is highly penalized if negligible probability is assigned to any of the training examples. By contrast, GANs only learn enough to fool a non-stationary discriminator. One of their common failure cases is mode collapse, where samples are not diverse enough [21]. Furthermore, GANs require careful tuning of hyperparameters to ensure the discriminator and generator are equally powerful and learn at equal rates. PixelCNNs are more robust to hyperparameter changes and usually have a nicely decaying loss curve. Thus, we adopt PixelCNN for super resolution applications.
3. Probabilistic super resolution
We aim to learn a probabilistic super resolution model that discerns the statistical dependencies between a high resolution image and a corresponding low resolution image. Let $x$ and $y$ denote a low resolution and a high resolution image, where $y^*$ represents a ground-truth high resolution image. In order to learn a parametric model of $p_\theta(y \mid x)$, we exploit a large dataset of pairs of low resolution inputs and ground-truth high resolution outputs, denoted $\mathcal{D} \equiv \{(x^{(i)}, y^{*(i)})\}_{i=1}^{N}$. One can easily collect such a large dataset by starting from a set of high resolution images and lowering their resolution as much as needed. To optimize the parameters $\theta$ of the conditional distribution $p$, we maximize a conditional log-likelihood objective defined as

$$\mathcal{O}(\theta \mid \mathcal{D}) = \sum_{(x,\, y^*) \in \mathcal{D}} \log p(y^* \mid x)\,. \tag{1}$$
The key problem discussed in this paper is the exact form of $p(y \mid x)$ that enables efficient learning and inference, while generating realistic non-blurry outputs. We first discuss pixel-independent models that assume that each output pixel is generated with an independent stochastic process given the input. We elaborate why these techniques result in sub-optimal blurry super resolution results. Finally, we describe our pixel recursive super resolution model that generates output pixels one at a time to enable modeling the statistical dependencies between the output pixels using PixelCNN [27, 28], and synthesizes sharp images from very blurry input.
3.1. Pixel independent super resolution
The simplest form of a probabilistic super resolution model assumes that the output pixels are conditionally independent given the inputs. As such, the conditional distribution $p(y \mid x)$ factorizes into a product of independent pixel predictions. Suppose an RGB output $y$ has $M$ pixels, each with three color channels, i.e., $y \in \mathbb{R}^{3M}$. Then,

$$\log p(y \mid x) = \sum_{i=1}^{3M} \log p(y_i \mid x)\,. \tag{2}$$
Two general forms of pixel prediction models have been explored in the literature: Gaussian and multinomial distributions, to model continuous and discrete pixel values respectively. In the Gaussian case,

$$\log p(y_i \mid x) = -\frac{1}{2\sigma^2}\,\big\lVert y_i - C_i(x) \big\rVert_2^2 \;-\; \log \sqrt{2\pi\sigma^2}\,, \tag{3}$$

where $C_i(x)$ denotes the $i$th element of a non-linear transformation of $x$ via a convolutional neural network. $C_i(x)$ is the estimated mean for the $i$th output pixel $y_i$, and $\sigma^2$ denotes the variance.
Figure 2: Top: a cartoon of how the input and output pairs for the toy dataset were created (each digit is placed in one of two corners, 50%/50%). Bottom: example predictions for various algorithms trained on this dataset (L2 Regression, cross-entropy, PixelCNN). The pixel independent L2 regression and cross-entropy models do not exhibit multimodal predictions. The PixelCNN output is stochastic, and multiple samples will place a digit in either corner 50% of the time.
Often the variance is not learned, in which case maximizing the conditional log-likelihood of (1) reduces to minimizing the mean squared error (MSE) between $y_i$ and $C_i(x)$ across the pixels and channels throughout the dataset. Super resolution models based on MSE regression (e.g., [5, 16, 18]) fall within this family of pixel independent models, where the outputs of a neural network parameterize a set of Gaussians with fixed bandwidth.
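To make this reduction explicit: with $\sigma$ fixed, the negative log-likelihood under (3), summed over all sub-pixels, is, up to an additive constant,

$$-\log p(y \mid x) \;=\; \frac{1}{2\sigma^2} \sum_{i=1}^{3M} \big(y_i - C_i(x)\big)^2 \;+\; \mathrm{const}\,,$$

so maximizing the objective (1) coincides with minimizing the per-pixel squared error.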
Alternatively, one could use a flexible multinomial distribution as the pixel prediction model, in which case the output dimensions are discretized into $K$ possible values (e.g., $K = 256$), where $y_i \in \{1, \ldots, K\}$. The pixel prediction model based on a multinomial softmax operator is represented as

$$\log p(y_i = k \mid x) = w_{jk}^\top C_i(x) - \log \sum_{v=1}^{K} \exp\{ w_{jv}^\top C_i(x) \}\,, \tag{4}$$

where $\{w_{jk}\}_{j=1,k=1}^{3,K}$ denote the softmax weights for different color channels and different discrete values.
3.2. Synthetic multimodal task
To demonstrate how the above pixel independent models can fail at conditional image modeling, we created a synthetic dataset that is explicitly multimodal. For many generative tasks like super resolution, colorization, and depth estimation, models that are able to predict a mode without averaging effects are desirable. For example, in colorization, selecting a strong red or blue for a car is better than selecting a sepia toned average of all of the colors of cars that the model has been exposed to. In this synthetic task, the input is an MNIST digit (1st row of Figure 2), and the output is the same input digit but scaled and translated either into the upper left corner or the upper right corner (2nd and 3rd rows of Figure 2). The dataset has an equal ratio of upper left and upper right outputs, which we call the MNIST corners dataset.
A convolutional network using a per-pixel squared error loss (Figure 2, L2 Regression) produces two blurry figures. Replacing the continuous loss with a per-pixel cross-entropy produces crisper images but also fails to capture the stochastic bimodality, because both digits are shown in both corners (Figure 2, cross-entropy). In contrast, a model that explicitly deals with multi-modality, PixelCNN, stochastically predicts a digit in the upper-left or upper-right corner but never predicts both digits simultaneously (Figure 2, PixelCNN).
See Figure 5 for examples of our super resolution model predicting different modes on a realistic dataset. Any good generative model should be able to make sharp, single mode predictions, and a dataset like this is a good starting point for evaluating new models.
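For concreteness, here is a minimal sketch of how one such input/output pair could be generated; the 28×28 canvas, the strided 2× downscale, and the helper name `make_corners_example` are our own assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def make_corners_example(digit, rng):
    """Build one MNIST-corners pair (illustrative sketch; sizes assumed).

    `digit` is a 28x28 grayscale array. The target places a downscaled
    copy of the digit in the upper-left or upper-right corner with
    probability 1/2 each, mirroring the 50%/50% split described above.
    """
    target = np.zeros_like(digit)
    small = digit[::2, ::2]            # crude 2x downscale to 14x14
    if rng.random() < 0.5:
        target[:14, :14] = small       # upper-left corner
    else:
        target[:14, 14:] = small      # upper-right corner
    return digit, target

# usage: rng = np.random.default_rng(0); x, y = make_corners_example(img, rng)
```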
4. Pixel recursive super resolution
The main issue with the previous probabilistic models (Equations (3) and (4)) for super resolution is the lack of conditional dependency between super resolution pixels. There are two general methods to model statistical correlations between output pixels. One approach is to define the conditional distribution of the output pixels jointly, by either a multivariate Gaussian mixture [36] or an undirected graphical model such as conditional random fields [8]. With these approaches one has to commit to a particular form of statistical dependency between the output pixels, for which inference can be computationally expensive. The second approach, which we follow in this work, is to factorize the conditional distribution using the chain rule as

$$\log p(y \mid x) = \sum_{i=1}^{M} \log p(y_i \mid x,\, y_{<i})\,, \tag{5}$$
where the generation of each output dimension is conditioned on the input, previous output pixels, and the previous channels of the same output pixel. For simplicity of exposition, we ignore different output channels in our derivations, and use $y_{<i}$ to represent $\{y_1, \ldots, y_{i-1}\}$. The benefit of this approach is that the exact form of the conditional dependencies is flexible and inference is straightforward. Inspired by the PixelCNN model, we use a multinomial distribution to model discrete pixel values in Eq. (5). Alternatively, one could use an autoregressive prediction model with Gaussian or logistic (mixture) conditionals, as proposed in [24].

Figure 3: The proposed super resolution network comprises a conditioning network and a prior network. The conditioning network is a CNN that receives a low resolution image as input and outputs logits predicting the conditional log-probability of each high resolution (HR) image pixel. The prior network, a PixelCNN [28], makes predictions based on previous stochastic predictions (indicated by a dashed line). The model's probability distribution is computed as a softmax operator on top of the sum of the two sets of logits from the prior and conditioning networks.
Our model, outlined in Figure 3, comprises two major components that are fused together at a late stage and trained jointly: (1) a conditioning network, and (2) a prior network. The conditioning network is a pixel independent prediction model that maps a low resolution image to a probabilistic skeleton of a high resolution image, while the prior network is supposed to add natural high resolution details to make the outputs look more realistic.
Given an input $x \in \mathbb{R}^L$, let $A_i(x) : \mathbb{R}^L \to \mathbb{R}^K$ denote a conditioning network predicting a vector of logit values corresponding to the $K$ possible values that the $i$th output pixel can take. Similarly, let $B_i(y_{<i}) : \mathbb{R}^{i-1} \to \mathbb{R}^K$ denote a prior network predicting a vector of logit values for the $i$th output pixel. Our probabilistic model predicts a distribution over the $i$th output pixel by simply adding the two sets of logits and applying a softmax operator on them,

$$p(y_i \mid x,\, y_{<i}) = \mathrm{softmax}\big(A_i(x) + B_i(y_{<i})\big)\,. \tag{6}$$
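In code, Eq. (6) is just an addition of logit vectors followed by a numerically stabilized softmax; this sketch assumes the two K-dimensional logit vectors have already been computed by the two networks:

```python
import numpy as np

def pixel_distribution(cond_logits, prior_logits):
    """p(y_i | x, y_<i) = softmax(A_i(x) + B_i(y_<i)), as in Eq. (6).

    cond_logits:  K-vector of logits A_i(x) from the conditioning network.
    prior_logits: K-vector of logits B_i(y_<i) from the PixelCNN prior.
    """
    logits = cond_logits + prior_logits
    logits = logits - logits.max()     # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```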
To optimize the parameters of $A$ and $B$ jointly, we perform stochastic gradient ascent to maximize the conditional log-likelihood in (1). That is, we optimize a cross-entropy loss between the model's predictions in (6) and discrete ground truth labels $y_i^* \in \{1, \ldots, K\}$,

$$\mathcal{O}_1 = \sum_{(x,\, y^*) \in \mathcal{D}} \sum_{i=1}^{M} \Big( \mathbb{1}[y_i^*]^\top \big(A_i(x) + B_i(y^*_{<i})\big) - \mathrm{lse}\big(A_i(x) + B_i(y^*_{<i})\big) \Big)\,, \tag{7}$$

where $\mathrm{lse}(\cdot)$ is the log-sum-exp operator corresponding to the log of the denominator of a softmax, and $\mathbb{1}[k]$ denotes a $K$-dimensional one-hot indicator vector with its $k$th dimension set to 1.
Our preliminary experiments indicate that models trained with (7) tend to ignore the conditioning network, as the statistical correlation between a pixel and previous high resolution pixels is stronger than its correlation with the low resolution inputs. To mitigate this issue, we include an additional loss in our objective to enforce that the conditioning network is optimized. This additional loss measures the cross-entropy between the conditioning network's predictions via $\mathrm{softmax}(A_i(x))$ and the ground truth labels. The total loss that is optimized in our experiments is a sum of two cross-entropy losses, formulated as

$$\mathcal{O}_2 = \sum_{(x,\, y^*) \in \mathcal{D}} \sum_{i=1}^{M} \Big( \mathbb{1}[y_i^*]^\top \big(2\,A_i(x) + B_i(y^*_{<i})\big) - \mathrm{lse}\big(A_i(x) + B_i(y^*_{<i})\big) - \mathrm{lse}\big(A_i(x)\big) \Big)\,. \tag{8}$$
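Here is a per-pixel sketch of the summand in (8), written with an explicit log-sum-exp; the variable names are ours, and a real implementation would batch this over all pixels and examples:

```python
import numpy as np

def lse(v):
    """log-sum-exp: the log of the softmax denominator."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def o2_pixel_term(a, b, k):
    """One pixel's contribution to the objective O_2 in Eq. (8).

    a: K-vector of conditioning logits A_i(x)
    b: K-vector of prior logits B_i(y*_<i)
    k: ground-truth label y*_i in {0, ..., K-1}
    """
    # The one-hot dot product picks out entry k; the two lse terms are the
    # normalizers of the joint softmax and the conditioning-only softmax.
    return (2.0 * a[k] + b[k]) - lse(a + b) - lse(a)
```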
Once the network is trained, sampling from the model is straightforward. Using (6), starting at $i = 1$, we first sample a high resolution pixel. Then, we proceed pixel by pixel, feeding the previously sampled pixel values back into the network, and draw new high resolution pixels. The three channels of each pixel are generated sequentially in turn.
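The sampling procedure amounts to the loop below; the `model` callable, assumed here to return the summed logits $A_i(x) + B_i(y_{<i})$ for the next sub-pixel, is a stand-in for the trained network:

```python
import numpy as np

def sample_image(model, x, num_subpixels, K, rng):
    """Autoregressively draw one high resolution sample (a sketch).

    model(x, y_prefix) is assumed to return the K logits for the next
    sub-pixel given the input and all previously sampled sub-pixels.
    """
    y = np.zeros(num_subpixels, dtype=np.int64)
    for i in range(num_subpixels):
        logits = model(x, y[:i])
        p = np.exp(logits - logits.max())
        p = p / p.sum()
        y[i] = rng.choice(K, p=p)      # feed the sample back in next step
    return y
```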
We additionally consider greedy decoding, where one always selects the pixel value with the largest probability, and sampling from a tempered softmax, where the concentration of a distribution $p$ is adjusted by using a temperature parameter $\tau > 0$,

$$p_\tau = \frac{p^\tau}{\lVert p^\tau \rVert_1}\,.$$

To control the concentration of our sampling distribution $p(y_i \mid x, y_{<i})$, it suffices to multiply the logits from $A$ and $B$ by a parameter $\tau$. Note that as $\tau$ goes towards $\infty$, the distribution converges to the mode¹, and sampling converges to greedy decoding.

¹We use a non-standard notion of temperature that represents $1/\tau$ in the standard notation.
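Under the paper's convention, temperature is applied by scaling the summed logits before the softmax; a minimal sketch:

```python
import numpy as np

def tempered_sample(logits, tau, rng):
    """Sample one sub-pixel at temperature tau (paper's convention).

    Multiplying the logits by tau sharpens the softmax; as tau -> infinity
    the distribution collapses onto its mode, recovering greedy decoding.
    """
    z = tau * logits
    p = np.exp(z - z.max())
    p = p / p.sum()
    return rng.choice(len(p), p=p)
```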
4.1. Implementation details
The conditioning network is a feed-forward convolutional neural network that takes an 8×8 RGB image through a series of ResNet [10] blocks and transposed convolution layers while maintaining 32 channels throughout. The last layer uses a 1×1 convolution to increase the channels to 256×3 and uses the resulting activations to predict a multinomial distribution over 256 possible sub-pixel values via a softmax operator.
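A sketch of this architecture in tf.keras, following the layer counts in Table 2; the initial 3×3 convolution that lifts the RGB input to 32 channels and the exact residual-block form are assumptions, since the paper does not spell them out:

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet_block(x, ch=32):
    """A plain two-convolution residual block (details assumed)."""
    h = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
    h = layers.Conv2D(ch, 3, padding="same")(h)
    return layers.ReLU()(layers.add([x, h]))

def conditioning_network(B=6):
    """8x8x3 input -> 32x32x(3*256) logits, mirroring Table 2."""
    inp = layers.Input(shape=(8, 8, 3))
    h = layers.Conv2D(32, 3, padding="same")(inp)   # lift RGB to 32 channels
    for _ in range(2):                              # two 2x upsamplings: 8->16->32
        for _ in range(B):
            h = resnet_block(h)
        h = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(h)
    for _ in range(B):
        h = resnet_block(h)
    logits = layers.Conv2D(3 * 256, 1)(h)           # 256-way logits per channel
    return tf.keras.Model(inp, logits)
```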
This network provides the ability to absorb the global structure of the image in the marginal probability distribution of the pixels. Due to the softmax layer it can capture the rich intricacies of the high resolution distribution, but we have no way to coherently sample from it: sampling sub-pixels independently will mix the assortment of distributions.
The prior network provides a way to tie together the sub-pixel distributions and allows us to take samples that depend on each other. We use 20 gated PixelCNN layers with 32 channels at each layer. We leave conditioning until the late stages of the network, where we add the pre-softmax activations from the conditioning network and the prior network before computing the final joint softmax distribution. Our model is built using TensorFlow [1] and trained across 8 GPUs with synchronous SGD updates. See Appendix A for further details.
5. Experiments
We assess the effectiveness of the proposed pixel recursive super resolution method on two datasets containing small faces and bedroom images. The first dataset is a version of the CelebA dataset [19] composed of a set of celebrity faces, which are cropped around the face. In the second dataset, LSUN Bedrooms [32], images are center cropped. In both datasets we resize the images to 32×32 with bicubic interpolation and again to 8×8, constituting the output and input pairs for training and evaluation. We present representative super resolution examples on held out test sets and report human evaluations of our predictions in Table 1.
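The pair construction is a pure preprocessing step; a minimal sketch with PIL (the face or center crop itself is dataset-specific and omitted here):

```python
from PIL import Image

def make_pair(path):
    """Create one (8x8 input, 32x32 target) pair by bicubic resizing."""
    img = Image.open(path).convert("RGB")     # assumed already cropped
    hr = img.resize((32, 32), Image.BICUBIC)  # ground-truth output
    lr = hr.resize((8, 8), Image.BICUBIC)     # low resolution input
    return lr, hr
```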
We compare our results with two baselines: a pixel independent L2 regression ("Regression") and a nearest neighbors search ("NN"). The architecture used for the regression baseline is identical to the conditioning network used in our model, consisting of several ResNet blocks and upsampling convolutional layers, except that the baseline regression model outputs three channels and has a final tanh(·) non-linearity instead of a ReLU. The regression architecture is similar in design to SRResNet [18], which reports state-of-the-art scores in image similarity metrics. Furthermore, we train the regression network to predict super resolution residuals instead of the actual pixel values. The residuals are computed based on a bicubic interpolation of the input, and are known to provide superior predictions [16]. The nearest neighbors baseline searches the downsampled training set for the nearest example (using Euclidean distance) and returns its high resolution counterpart.

Figure 4: Samples from the model trained on LSUN Bedrooms at 32×32. Columns: Input, Regression, Ours, Ground Truth, NN.
5.1. Sampling
Sampling from the model multiple times results in different high resolution images for a given low resolution image (Figure 5). A given model will identify many plausible high resolution images that correspond to a given lower resolution image. Each one of these samples may contain distinct qualitative features, and each of these modes is contained within the PixelCNN. Note that the differences between samples for the faces dataset are far less drastic than seen in our synthetic dataset, where failure to cleanly predict modes meant complete failure.
The quality of samples is sensitive to the softmax temperature. When the mode is sampled (τ = ∞) at each sub-pixel, the samples are of poor quality: they look smooth, with horizontal and vertical line artifacts. Samples at τ = 1.0, the exact probability given by the model, tend to be more jittery with high frequency content. It seems in this case there are multiple less certain trajectories, and the samples jump back and forth between them; perhaps this is alleviated with more capacity and training time. Manually tuning the softmax temperature was necessary to find good looking samples; usually a value between 1.1 and 1.3 worked.

Figure 5: Left: low-res input. Right: diversity of super resolution samples at τ = 1.2.
Figure 6 shows various test predictions with their negative log-probability scores listed below each image. Smaller scores mean the model has assigned that image a larger probability mass. The greedy, bicubic, and regression faces are preferred by the model despite being of worse quality. This is probably because their smooth, face-like structure does not contradict the learned distributions. Yet sampling with the proper softmax temperature nevertheless finds realistic looking images.

Figure 6: Our model does not produce calibrated log-probabilities for the samples. Negative log-probabilities are reported below each image. Note that the best log-probabilities arise from bicubic interpolation and greedy sampling even though the images are poor quality.

Ground Truth  NN    Bicubic  Regression  Greedy  τ = 1.0  τ = 1.1  τ = 1.2
2.85          2.74  1.76     2.34        1.82    2.94     2.79     2.69
2.96          2.71  1.82     2.17        1.77    3.18     3.09     2.95
2.76          2.63  1.80     2.35        1.64    2.99     2.90     2.64
5.2. Image similarity
Many methods exist for quantifying image similarity that attempt to measure human perceptual judgements of similarity [29, 30, 20]. We quantified the prediction accuracy of our model compared to ground truth using pSNR and MS-SSIM (Table 1). We found that our own visual assessment of the predicted image quality did not correspond to these image similarity metrics. For instance, bicubic interpolation achieved relatively high metrics even though its samples appeared quite poor. This result matches recent observations suggesting that pSNR and SSIM provide poor judgements of super resolution quality [18, 14] when new details are synthesized.
To ensure that samples do indeed correspond to the low-resolution input, we measured how consistent the high resolution output image is with the low resolution input image (Table 1, 'Consistency'). Specifically, we measured the L2 distance between the low-resolution input image and a bicubic downsampled version of the high resolution estimate. Lower L2 distances correspond to high resolution estimates that are more similar to the original low resolution image. Note that the nearest neighbor high resolution images are less consistent, even though we used a database of 3 million training images to search for neighbors in the case of LSUN Bedrooms. In contrast, the bicubic resampling and the PixelCNN upsampling methods were consistently more faithful to the low resolution image. This indicates that our samples do indeed correspond to the low-resolution input.
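A sketch of the consistency measurement as described above and in the Table 1 caption (MSE on a [0, 1] scale between the 8×8 input and the bicubic-downsampled 32×32 sample), assuming PIL images:

```python
import numpy as np
from PIL import Image

def consistency(lr_input, hr_sample):
    """MSE between the 8x8 input and the downsampled 32x32 sample."""
    down = hr_sample.resize((8, 8), Image.BICUBIC)
    a = np.asarray(lr_input, dtype=np.float64) / 255.0
    b = np.asarray(down, dtype=np.float64) / 255.0
    return float(np.mean((a - b) ** 2))
```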
5.3. Human study
We presented crowd sourced workers with two images: a true image and the corresponding prediction from one of our various models. Workers were asked "Which image, would you guess, is from a camera?" Following the setup in Zhang et al. [34], we present each image for one second at a time before allowing them to answer. Workers start with 10 practice pairs, during which they get feedback on whether they chose correctly; the practice pairs are not counted in the results. After the practice pairs, they are shown 45 additional pairs, 5 of which are golden questions designed to test if the person is paying attention. Each golden question pits a bicubicly upsampled image (very blurry) against the ground truth. Excluding the golden and practice questions, we count forty answers per session. Sessions in which a worker missed any golden question are thrown out. Workers were only allowed to participate in any of our studies once. We continued running sessions until forty different workers had been tested on each of the four algorithms.

We report in Table 1 the percentage of the time users chose an algorithm's output over the ground truth counterpart. Note that 50% would indicate that an algorithm perfectly confused the subjects.
Algorithm    pSNR   SSIM  MS-SSIM  Consistency  % Fooled

CelebA:
Bicubic      28.92  0.84  0.76     0.006        –
NN           28.18  0.73  0.66     0.024        –
Regression   29.16  0.90  0.90     0.004        4.0 ± 0.2
τ = 1.0      29.09  0.84  0.86     0.008        11.0 ± 0.1
τ = 1.1      29.08  0.84  0.85     0.008        10.4 ± 0.2
τ = 1.2      29.08  0.84  0.86     0.008        10.2 ± 0.1

LSUN Bedrooms:
Bicubic      28.94  0.70  0.70     0.002        –
NN           28.15  0.49  0.45     0.040        –
Regression   28.87  0.74  0.75     0.003        2.1 ± 0.1
τ = 1.0      28.92  0.58  0.60     0.016        17.7 ± 0.4
τ = 1.1      28.92  0.59  0.59     0.017        22.4 ± 0.3
τ = 1.2      28.93  0.59  0.58     0.018        27.9 ± 0.3

Table 1: Top: results on the cropped CelebA test dataset at 32×32, magnified from 8×8. Bottom: LSUN Bedrooms. pSNR, SSIM, and MS-SSIM measure image similarity between samples and the ground truth. Consistency lists the MSE between the input low-res image and downsampled samples on a [0, 1] scale. % Fooled reports how often an algorithm's samples fooled a human in a crowd sourced study; 50% would be perfectly confused.
6. Conclusion
As in many image transformation tasks, the central problem of super resolution is hallucinating sharp details by choosing a mode of the output distribution. We explored this underspecified problem using small images, demonstrating that even the smallest 8×8 images can be enlarged to sharp 32×32 images. We introduced a toy dataset with a small number of explicit modes to demonstrate the failure cases of two common pixel independent likelihood models. In the presented model, the conditioning network gets us most of the way towards predicting a high-resolution image, but the outputs are blurry where the model is uncertain. Combining the conditioning network with a PixelCNN model provides a strong prior over the output pixels, allowing the model to generate crisp predictions. Our human evaluations indicate that samples from our model on average look more photo realistic than those of a strong regression based conditioning network alone.
References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. Trans. Sig. Proc., 54(11):4311–4322, Nov. 2006.
[3] J. Bruna, P. Sprechmann, and Y. LeCun. Super-resolution with deep convolutional sufficient statistics. CoRR, abs/1511.05666, 2015.
[4] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. NIPS, 2015.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. CoRR, abs/1501.00092, 2015.
[6] R. Fattal. Image upsampling via imposed edge statistics. ACM Trans. Graph., 26(3), July 2007.
[7] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 2002.
[8] W. T. Freeman and E. C. Pasztor. Markov networks for super-resolution. In CISS, 2000.
[9] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CVPR, 2015.
[11] H. Hou and H. Andrews. Cubic splines for image interpolation and digital filtering. Acoustics, Speech and Signal Processing, IEEE Transactions on, 26(6):508–517, Jan. 2003.
[12] J. Huang and D. Mumford. Statistics of natural images and models. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., volume 1. IEEE, 1999.
[13] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[14] J. Johnson, A. Alahi, and F. Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
Figure 7: The best and worst rated images in the human study ('Ours' shown alongside ground truth). The fractions below the images denote how many times a person chose that image over the ground truth: best rated, 23/40 = 57%, 34/40 = 85%, 17/40 = 42%, 30/40 = 75%, 16/40 = 40%, 26/40 = 65%; worst rated, 1/40 = 2%, 3/40 = 7%, 1/40 = 2%, 3/40 = 7%, 1/40 = 2%, 4/40 = 10%. See the supplementary material for more images used in the study.
[15] C. Kaae Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár. Amortised MAP inference for image super-resolution. ArXiv e-prints, Oct. 2016.
[16] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. CoRR, abs/1511.04587, 2015.
[17] K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1127–1133, 2010.
[18] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. arXiv:1609.04802, 2016.
[19] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015.
[20] K. Ma, Q. Wu, Z. Wang, Z. Duanmu, H. Yong, H. Li, and L. Zhang. Group MAD competition - a new methodology to compare objective image quality models. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[21] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein. Unrolled generative adversarial networks. CoRR, abs/1611.02163, 2016.
[22] K. Nasrollahi and T. B. Moeslund. Super-resolution: A comprehensive survey. Mach. Vision Appl., 25(6):1423–1468, Aug. 2014.
[23] Y. Romano, J. Isidoro, and P. Milanfar. RAISR: Rapid and accurate image super resolution. CoRR, abs/1606.01299, 2016.
[24] T. Salimans, A. Karpathy, X. Chen, D. P. Kingma, and Y. Bulatov. PixelCNN++: A PixelCNN implementation with discretized logistic mixture likelihood and other modifications. Under review at ICLR 2017.
[25] Q. Shan, Z. Li, J. Jia, and C.-K. Tang. Fast image/video upsampling. ACM Transactions on Graphics (TOG), 27(5):153, 2008.
[26] J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
[27] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. ICML, 2016.
[28] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu. Conditional image generation with PixelCNN decoders. NIPS, 2016.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[30] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multiscale structural similarity for image quality assessment. In Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on, volume 2, pages 1398–1402. IEEE, 2004.
[31] C. Y. Yang, S. Liu, and M. H. Yang. Structured face hallucination. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 1099–1106, June 2013.
[32] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
[33] X. Yu and F. Porikli. Ultra-resolving face images by discriminative generative networks, pages 318–333. Springer International Publishing, Cham, 2016.
[34] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. ECCV, 2016.
[35] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In Proceedings of the 2011 International Conference on Computer Vision, ICCV '11, pages 479–486, Washington, DC, USA, 2011. IEEE Computer Society.
[36] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In CVPR, 2011.
A. Hyperparameters

Operation                        Kernel  Strides  Feature maps
Conditioning network – 8 × 8 × 3 input
B × ResNet block                 3 × 3   1        32
Transposed Convolution           3 × 3   2        32
B × ResNet block                 3 × 3   1        32
Transposed Convolution           3 × 3   2        32
B × ResNet block                 3 × 3   1        32
Convolution                      1 × 1   1        3 · 256
PixelCNN network – 32 × 32 × 3 input
Masked Convolution               7 × 7   1        64
20 × Gated Convolution Layers    5 × 5   1        64
Masked Convolution               1 × 1   1        1024
Masked Convolution               1 × 1   1        3 · 256

Optimizer: RMSProp (decay=0.95, momentum=0.9, epsilon=1e-8)
Batch size: 32
Iterations: 2,000,000 for Bedrooms; 200,000 for faces
Learning rate: 0.0004, divided by 2 every 500,000 steps
Weight, bias initialization: truncated normal (stddev=0.1), constant(0)

Table 2: Hyperparameters used for both datasets. For LSUN Bedrooms B = 10 and for the cropped CelebA faces B = 6.
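The optimizer rows of Table 2 map directly onto the TensorFlow 1.x API of the paper's era; a sketch (the staircase decay choice is an assumption):

```python
import tensorflow as tf

# RMSProp and learning-rate schedule from Table 2 (TF 1.x-style APIs).
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    0.0004, global_step,
    decay_steps=500000, decay_rate=0.5, staircase=True)  # halve every 500k steps
optimizer = tf.train.RMSPropOptimizer(
    learning_rate, decay=0.95, momentum=0.9, epsilon=1e-8)
```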
B. LSUN Bedrooms samples

Sample grids; columns: Input, Bicubic, Regression, τ = 1.0, τ = 1.1, τ = 1.2, Truth, NN.
C. Cropped CelebA faces
Sample grids; columns: Input, Bicubic, Regression, τ = 1.0, τ = 1.1, τ = 1.2, Truth, NN.
 
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...eraser Juan José Calderón
 
"Experiencias booktuber: Más allá del libro y de la pantalla"
"Experiencias booktuber: Más allá del libro y de la pantalla""Experiencias booktuber: Más allá del libro y de la pantalla"
"Experiencias booktuber: Más allá del libro y de la pantalla"eraser Juan José Calderón
 
The impact of digital influencers on adolescent identity building.
The impact of digital influencers on adolescent identity building.The impact of digital influencers on adolescent identity building.
The impact of digital influencers on adolescent identity building.eraser Juan José Calderón
 
Open educational resources (OER) in the Spanish universities
Open educational resources (OER) in the Spanish universitiesOpen educational resources (OER) in the Spanish universities
Open educational resources (OER) in the Spanish universitieseraser Juan José Calderón
 
El modelo flipped classroom: un reto para una enseñanza centrada en el alumno
El modelo flipped classroom: un reto para una enseñanza centrada en el alumnoEl modelo flipped classroom: un reto para una enseñanza centrada en el alumno
El modelo flipped classroom: un reto para una enseñanza centrada en el alumnoeraser Juan José Calderón
 
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...Pensamiento propio e integración transdisciplinaria en la epistémica social. ...
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...eraser Juan José Calderón
 
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.Escuela de Robótica de Misiones. Un modelo de educación disruptiva.
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.eraser Juan José Calderón
 
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...eraser Juan José Calderón
 
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...eraser Juan José Calderón
 

More from eraser Juan José Calderón (20)

Evaluación de t-MOOC universitario sobre competencias digitales docentes medi...
Evaluación de t-MOOC universitario sobre competencias digitales docentes medi...Evaluación de t-MOOC universitario sobre competencias digitales docentes medi...
Evaluación de t-MOOC universitario sobre competencias digitales docentes medi...
 
Call for paper 71. Revista Comunicar
Call for paper 71. Revista ComunicarCall for paper 71. Revista Comunicar
Call for paper 71. Revista Comunicar
 
Editorial of the JBBA Vol 4, Issue 1, May 2021. Naseem Naqvi,
Editorial of the JBBA Vol 4, Issue 1, May 2021. Naseem Naqvi, Editorial of the JBBA Vol 4, Issue 1, May 2021. Naseem Naqvi,
Editorial of the JBBA Vol 4, Issue 1, May 2021. Naseem Naqvi,
 
REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONIS...
REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONIS...REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONIS...
REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL LAYING DOWN HARMONIS...
 
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...
 
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
 
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
Innovar con blockchain en las ciudades: Ideas para lograrlo, casos de uso y a...
 
Ética y Revolución Digital . revista Diecisiete nº 4. 2021
Ética y Revolución Digital . revista Diecisiete nº 4. 2021Ética y Revolución Digital . revista Diecisiete nº 4. 2021
Ética y Revolución Digital . revista Diecisiete nº 4. 2021
 
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...
 
PACTO POR LA CIENCIA Y LA INNOVACIÓN 8 de febrero de 2021
PACTO POR LA CIENCIA Y LA INNOVACIÓN 8 de febrero de 2021PACTO POR LA CIENCIA Y LA INNOVACIÓN 8 de febrero de 2021
PACTO POR LA CIENCIA Y LA INNOVACIÓN 8 de febrero de 2021
 
Expert Panel of the European Blockchain Observatory and Forum
Expert Panel of the European Blockchain Observatory and ForumExpert Panel of the European Blockchain Observatory and Forum
Expert Panel of the European Blockchain Observatory and Forum
 
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...
Desigualdades educativas derivadas del COVID-19 desde una perspectiva feminis...
 
"Experiencias booktuber: Más allá del libro y de la pantalla"
"Experiencias booktuber: Más allá del libro y de la pantalla""Experiencias booktuber: Más allá del libro y de la pantalla"
"Experiencias booktuber: Más allá del libro y de la pantalla"
 
The impact of digital influencers on adolescent identity building.
The impact of digital influencers on adolescent identity building.The impact of digital influencers on adolescent identity building.
The impact of digital influencers on adolescent identity building.
 
Open educational resources (OER) in the Spanish universities
Open educational resources (OER) in the Spanish universitiesOpen educational resources (OER) in the Spanish universities
Open educational resources (OER) in the Spanish universities
 
El modelo flipped classroom: un reto para una enseñanza centrada en el alumno
El modelo flipped classroom: un reto para una enseñanza centrada en el alumnoEl modelo flipped classroom: un reto para una enseñanza centrada en el alumno
El modelo flipped classroom: un reto para una enseñanza centrada en el alumno
 
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...Pensamiento propio e integración transdisciplinaria en la epistémica social. ...
Pensamiento propio e integración transdisciplinaria en la epistémica social. ...
 
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.Escuela de Robótica de Misiones. Un modelo de educación disruptiva.
Escuela de Robótica de Misiones. Un modelo de educación disruptiva.
 
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...
La Universidad española Frente a la pandemia. Actuaciones de Crue Universidad...
 
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...
Covid-19 and IoT: Some Perspectives on the Use of IoT Technologies in Prevent...
 

Recently uploaded

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 

Pixel Recursive Super Resolution. Google Brain

The conditioning network maps a low resolution image to a distribution over corresponding high resolution images, while the prior network models high resolution details to make the outputs look more realistic. Our conditioning network consists of a deep stack of ResNet [10] blocks, while our prior network comprises a PixelCNN [28] architecture.

We find that standard super resolution metrics such as peak signal-to-noise ratio (pSNR) and structural similarity (SSIM) fail to properly measure the quality of predictions for an underspecified super resolution task. These metrics prefer conservative blurry averages over more plausible photo-realistic details, because newly synthesized fine details often do not align exactly with the original details. Our evaluation studies demonstrate that humans easily distinguish real images from super resolution predictions when regression techniques are used, but have a harder time telling our samples apart from real images.

2. Related work

Super resolution has a long history in computer vision [22]. Methods relying on interpolation [11] are easy to implement and widely used; however, they suffer from a lack of expressivity, since linear models cannot express complex dependencies between the inputs and outputs. In practice, such methods often fail to adequately predict high frequency details, leading to blurry high resolution outputs.

Enhancing linear methods with rich image priors such as sparsity [2] or Gaussian mixtures [35] has substantially improved their quality; likewise, leveraging low-level image statistics such as edge gradients improves predictions [31, 26, 6, 12, 25, 17]. Much work has been done on algorithms that search a database of patches and combine them to create plausible high frequency details in zoomed images [7, 13]. Recent patch-based work has focused on improving basic interpolation methods by building a dictionary of pre-learned filters on images and selecting the appropriate patches by an efficient hashing mechanism [23]. Such dictionary methods have improved inference speed while remaining comparable to the state of the art.

Another approach to super resolution is to relax inference speed requirements and focus on constructing high resolution images at increasingly higher magnification factors. Convolutional neural networks (CNNs) avoid explicit dictionary construction, instead implicitly extracting multiple layers of abstraction by learning layers of filter kernels. Dong et al. [5] employed a three layer CNN with an MSE loss. Kim et al. [16] improved accuracy by increasing the depth to 20 layers and learning only the residuals between the high resolution image and an interpolated low resolution image. Most recently, SRResNet [18] uses many ResNet blocks to achieve state of the art pSNR and SSIM on standard super resolution benchmarks; we employ a similar design for our conditioning network and catch-all regression baseline.

Instead of using a per-pixel loss, Johnson et al. [14] use the Euclidean distance between activations of a pre-trained CNN on the model's predictions vs. ground truth images. Using this so-called perceptual loss, they train feed-forward networks for super resolution and style transfer. Bruna et al. [3] also use a perceptual loss to train a super resolution network, but inference is done via gradient propagation to the low resolution input (e.g., [9]).

Ledig et al. [18] and Yu et al. [33] use GANs to create compelling super resolution results, showing the model's ability to predict plausible high frequency details. Sønderby et al. [15] also investigate GANs for super resolution, using a learned affine transformation that ensures the model only generates images that downscale back to the low resolution inputs. Sønderby et al. [15] additionally explore a masked autoregressive model like PixelCNN [27], but without the gated layers and using a mixture of Gaussians instead of a multinomial distribution. Denton et al. [4] use a multi-scale adversarial network for image synthesis, an architecture that also seems beneficial for super resolution.

PixelRNN and PixelCNN by van den Oord et al. [27, 28] are probabilistic generative models that impose an order on image pixels, representing an image as a long sequence. The probability of each pixel is then conditioned on the previous ones. The gated PixelCNN obtained state of the art log-likelihood scores on CIFAR-10 and MNIST, making it one of the most competitive probabilistic generative models.

Since PixelCNN is trained by log-likelihood, the model is heavily penalized if negligible probability is assigned to any of the training examples. By contrast, GANs only learn enough to fool a non-stationary discriminator, and one of their common failure cases is mode collapse, where samples are not diverse enough [21]. Furthermore, GANs require careful tuning of hyperparameters to ensure the discriminator and generator are equally powerful and learn at equal rates. PixelCNNs are more robust to hyperparameter changes and usually have a smoothly decaying loss curve. Thus, we adopt PixelCNN for super resolution applications.
3. Probabilistic super resolution

We aim to learn a probabilistic super resolution model that discerns the statistical dependencies between a high resolution image and a corresponding low resolution image. Let $x$ and $y$ denote a low resolution and a high resolution image, and let $y^*$ represent a ground-truth high resolution image. In order to learn a parametric model of $p_\theta(y \mid x)$, we exploit a large dataset of pairs of low resolution inputs and ground-truth high resolution outputs, denoted $\mathcal{D} \equiv \{(x^{(i)}, y^{*(i)})\}_{i=1}^{N}$. One can easily collect such a dataset by starting from a set of high resolution images and lowering their resolution as much as needed. To optimize the parameters $\theta$ of the conditional distribution $p$, we maximize a conditional log-likelihood objective defined as

$$\mathcal{O}(\theta \mid \mathcal{D}) = \sum_{(x, y^*) \in \mathcal{D}} \log p(y^* \mid x) . \quad (1)$$

The key problem discussed in this paper is the exact form of $p(y \mid x)$ that enables efficient learning and inference while generating realistic, non-blurry outputs. We first discuss pixel independent models, which assume that each output pixel is generated by an independent stochastic process given the input, and elaborate why these techniques result in sub-optimal, blurry super resolution results. We then describe our pixel recursive super resolution model, which generates output pixels one at a time so as to model the statistical dependencies between output pixels using PixelCNN [27, 28], and synthesizes sharp images from very blurry input.

3.1. Pixel independent super resolution

The simplest form of a probabilistic super resolution model assumes that the output pixels are conditionally independent given the inputs. As such, the conditional distribution $p(y \mid x)$ factorizes into a product of independent pixel predictions. Suppose an RGB output $y$ has $M$ pixels, each with three color channels, i.e., $y \in \mathbb{R}^{3M}$. Then

$$\log p(y \mid x) = \sum_{i=1}^{3M} \log p(y_i \mid x) . \quad (2)$$

Two general forms of pixel prediction models have been explored in the literature: Gaussian and multinomial distributions, modeling continuous and discrete pixel values respectively. In the Gaussian case,

$$\log p(y_i \mid x) = -\frac{1}{2\sigma^2} \big\| y_i - C_i(x) \big\|_2^2 - \log \sqrt{2\pi\sigma^2} , \quad (3)$$

where $C_i(x)$ denotes the $i$th element of a non-linear transformation of $x$ via a convolutional neural network. Here $C_i(x)$ is the estimated mean for the $i$th output pixel $y_i$, and $\sigma^2$ denotes the variance. Often the variance is not learned, in which case maximizing the conditional log-likelihood of (1) reduces to minimizing the mean squared error (MSE) between $y_i$ and $C_i(x)$ across the pixels and channels throughout the dataset. Super resolution models based on MSE regression (e.g., [5, 16, 18]) fall within this family of pixel independent models, where the outputs of a neural network parameterize a set of Gaussians with fixed bandwidth.

Alternatively, one could use a flexible multinomial distribution as the pixel prediction model, in which case the output dimensions are discretized into $K$ possible values (e.g., $K = 256$), where $y_i \in \{1, \ldots, K\}$. The pixel prediction model based on a multinomial softmax operator is then

$$\log p(y_i = k \mid x) = w_{jk}^{\top} C_i(x) - \log \sum_{v=1}^{K} \exp\{w_{jv}^{\top} C_i(x)\} , \quad (4)$$

where $\{w_{jk}\}_{j=1,k=1}^{3,K}$ denote the softmax weights for the different color channels and discrete pixel values.
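To make the two likelihoods concrete, here is a minimal NumPy sketch of the per-pixel objectives in Eqs. (3) and (4); the array shapes and helper names are ours for illustration, not from the paper's code.

```python
import numpy as np

def gaussian_nll(y, c, sigma=1.0):
    """Per-pixel Gaussian negative log-likelihood, Eq. (3).

    With a fixed sigma this is the MSE up to an affine constant,
    so maximizing the likelihood reduces to L2 regression.
    """
    return (0.5 / sigma**2) * (y - c) ** 2 + np.log(np.sqrt(2 * np.pi * sigma**2))

def multinomial_nll(y_discrete, logits):
    """Per-pixel softmax cross-entropy, Eq. (4).

    y_discrete: integer pixel values in {0, ..., K-1}, shape [3M].
    logits:     unnormalized scores w^T C(x), shape [3M, K].
    """
    # Numerically stable log-sum-exp over the K values per pixel.
    m = logits.max(axis=1, keepdims=True)
    lse = np.log(np.exp(logits - m).sum(axis=1)) + m[:, 0]
    return lse - logits[np.arange(len(y_discrete)), y_discrete]
```

With $\sigma$ fixed, minimizing the Gaussian NLL is exactly L2 regression, which is what drives the averaging behavior discussed next.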
Figure 2: Top: A cartoon of how the input and output pairs for the toy dataset were created (each output mode occurs 50% of the time). Bottom: Example model outputs for the algorithms trained on this dataset (L2 regression, cross-entropy, PixelCNN). The pixel independent L2 regression and cross-entropy models do not exhibit multimodal predictions. The PixelCNN output is stochastic, and multiple samples place a digit in either corner 50% of the time.

3.2. Synthetic multimodal task

To demonstrate how the above pixel independent models can fail at conditional image modeling, we created a synthetic dataset that is explicitly multimodal. For many generative tasks like super resolution, colorization, and depth estimation, models that are able to predict a single mode without averaging effects are desirable. For example, in colorization, selecting a strong red or blue for a car is better than selecting a sepia-toned average of all of the colors of cars that the model has been exposed to.
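To see the averaging failure in one line, consider a single output pixel that is bright (value 255) in half the training targets and dark (value 0) in the other half; the numbers here are illustrative, not from the paper. The L2-optimal prediction is the conditional mean,

$$\hat{y}_i = \arg\min_{c}\ \mathbb{E}\big[(y_i - c)^2\big] = \mathbb{E}[y_i] = \tfrac{1}{2}\cdot 255 + \tfrac{1}{2}\cdot 0 = 127.5 ,$$

a gray value belonging to neither mode. A per-pixel cross-entropy instead learns the correct bimodal marginal, but sampling each pixel independently from it still mixes the two modes across pixels.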
In this synthetic task, the input is an MNIST digit (1st row of Figure 2), and the output is the same digit scaled and translated either into the upper left corner or the upper right corner (2nd and 3rd rows of Figure 2). The dataset has an equal ratio of upper left and upper right outputs; we call it the MNIST corners dataset.

A convolutional network using a per-pixel squared error loss (Figure 2, L2 Regression) produces two blurry figures. Replacing the continuous loss with a per-pixel cross-entropy produces crisper images but also fails to capture the stochastic bimodality, because both digits are shown in both corners (Figure 2, cross-entropy). In contrast, a model that explicitly handles multimodality, the PixelCNN, stochastically predicts a digit in the upper-left or upper-right corner but never predicts both digits simultaneously (Figure 2, PixelCNN). See Figure 5 for examples of our super resolution model predicting different modes on a realistic dataset. Any good generative model should be able to make sharp single-mode predictions, and a dataset like this is a good starting point for evaluating new models.
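A sketch of how such input/output pairs could be generated follows; the exact scaling and placement used for the paper's dataset are not specified, so the details below (a crude 2x decimation to 14x14 and flush corner placement) are assumptions.

```python
import numpy as np

def mnist_corners_pair(digit28, rng, out_size=32):
    """Make one (input, target) pair for an MNIST-corners style task.

    digit28: [28, 28] grayscale digit. The target places a shrunken copy
    of the digit in the upper-left or upper-right corner with equal
    probability, so the target distribution is explicitly bimodal.
    """
    small = digit28[::2, ::2]                 # crude 2x downscale to 14x14 (assumed)
    target = np.zeros((out_size, out_size), dtype=digit28.dtype)
    if rng.random() < 0.5:
        target[:14, :14] = small              # upper-left mode
    else:
        target[:14, -14:] = small             # upper-right mode
    return digit28, target

# Usage: rng = np.random.default_rng(0); x, y = mnist_corners_pair(digit, rng)
```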
4. Pixel recursive super resolution

The main issue with the previous probabilistic models (Equations (3) and (4)) for super resolution is the lack of conditional dependency between the predicted pixels. There are two general methods to model statistical correlations between output pixels. One approach is to define the conditional distribution of the output pixels jointly, by either a multivariate Gaussian mixture [36] or an undirected graphical model such as a conditional random field [8]. With these approaches one has to commit to a particular form of statistical dependency between the output pixels, for which inference can be computationally expensive. The second approach, which we follow in this work, is to factorize the conditional distribution using the chain rule as

$$\log p(y \mid x) = \sum_{i=1}^{M} \log p(y_i \mid x, y_{<i}) , \quad (5)$$

where the generation of each output dimension is conditioned on the input, the previous output pixels, and the previous channels of the same output pixel. For simplicity of exposition we ignore the different output channels in our derivations, and use $y_{<i}$ to represent $\{y_1, \ldots, y_{i-1}\}$. The benefit of this approach is that the exact form of the conditional dependencies is flexible and inference is straightforward. Inspired by the PixelCNN model, we use a multinomial distribution to model discrete pixel values in Eq. (5). Alternatively, one could use an autoregressive prediction model with Gaussian or logistic (mixture) conditionals as proposed in [24].

Figure 3: The proposed super resolution network comprises a conditioning network and a prior network. The conditioning network is a CNN that receives a low resolution image as input and outputs logits predicting the conditional log-probability of each high resolution (HR) image pixel. The prior network, a PixelCNN [28], makes predictions based on previous stochastic predictions. The model's probability distribution is computed with a softmax operator on top of the sum of the two sets of logits from the prior and conditioning networks.

Our model, outlined in Figure 3, comprises two major components that are fused together at a late stage and trained jointly: (1) a conditioning network and (2) a prior network. The conditioning network is a pixel independent prediction model that maps a low resolution image to a probabilistic skeleton of a high resolution image, while the prior network adds natural high resolution details to make the outputs look more realistic.

Given an input $x \in \mathbb{R}^L$, let $A_i(x): \mathbb{R}^L \to \mathbb{R}^K$ denote a conditioning network predicting a vector of logit values corresponding to the $K$ possible values that the $i$th output pixel can take. Similarly, let $B_i(y_{<i}): \mathbb{R}^{i-1} \to \mathbb{R}^K$ denote a prior network predicting a vector of logit values for the $i$th output pixel. Our probabilistic model predicts a distribution over the $i$th output pixel by simply adding the two sets of logits and applying a softmax operator,

$$p(y_i \mid x, y_{<i}) = \operatorname{softmax}\big(A_i(x) + B_i(y_{<i})\big) . \quad (6)$$
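As a schematic of Eq. (6), the fusion is just an addition of logits followed by a softmax; `A_i` and `B_i` here are placeholder arrays standing in for the two networks' outputs, not the paper's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def pixel_distribution(A_i, B_i):
    """Eq. (6): distribution over the K values of output pixel i.

    A_i: [K] logits from the conditioning network (a function of x only).
    B_i: [K] logits from the prior network (a function of y_{<i}).
    """
    return softmax(A_i + B_i)
```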
To optimize the parameters of $A$ and $B$ jointly, we perform stochastic gradient ascent to maximize the conditional log-likelihood in (1). That is, we optimize a cross-entropy loss between the model's predictions in (6) and the discrete ground truth labels $y_i^* \in \{1, \ldots, K\}$,

$$\mathcal{O}_1 = \sum_{(x, y^*) \in \mathcal{D}} \sum_{i=1}^{M} \Big( \mathbb{1}[y_i^*]^{\top} \big(A_i(x) + B_i(y_{<i}^*)\big) - \operatorname{lse}\big(A_i(x) + B_i(y_{<i}^*)\big) \Big) , \quad (7)$$

where $\operatorname{lse}(\cdot)$ is the log-sum-exp operator, corresponding to the log of the denominator of a softmax, and $\mathbb{1}[k]$ denotes a $K$-dimensional one-hot indicator vector with its $k$th dimension set to 1.

Our preliminary experiments indicate that models trained with (7) tend to ignore the conditioning network, as the statistical correlation between a pixel and the previous high resolution pixels is stronger than its correlation with the low resolution input. To mitigate this issue, we include an additional loss in our objective that forces the conditioning network to be optimized. This additional loss measures the cross-entropy between the conditioning network's predictions via $\operatorname{softmax}(A_i(x))$ and the ground truth labels. The total loss optimized in our experiments is a sum of the two cross-entropy losses,

$$\mathcal{O}_2 = \sum_{(x, y^*) \in \mathcal{D}} \sum_{i=1}^{M} \Big( \mathbb{1}[y_i^*]^{\top} \big(2 A_i(x) + B_i(y_{<i}^*)\big) - \operatorname{lse}\big(A_i(x) + B_i(y_{<i}^*)\big) - \operatorname{lse}\big(A_i(x)\big) \Big) . \quad (8)$$

Once the network is trained, sampling from the model is straightforward. Using (6), starting at $i = 1$, we first sample a high resolution pixel; we then proceed pixel by pixel, feeding previously sampled pixel values back into the network and drawing new high resolution pixels. The three channels of each pixel are generated sequentially in turn. We additionally consider greedy decoding, where one always selects the pixel value with the largest probability, and sampling from a tempered softmax, where the concentration of a distribution $p$ is adjusted by a temperature parameter $\tau > 0$:

$$p_\tau = \frac{p^\tau}{\|p^\tau\|_1} .$$

To control the concentration of our sampling distribution $p(y_i \mid x, y_{<i})$, it suffices to multiply the logits from $A$ and $B$ by $\tau$. Note that as $\tau$ goes towards $\infty$, the distribution converges to the mode and sampling converges to greedy decoding. (We use a non-standard notion of temperature, corresponding to $1/\tau$ in the standard notation.)

4.1. Implementation details

The conditioning network is a feed-forward convolutional neural network that takes an 8×8 RGB image through a series of ResNet [10] blocks and transposed convolution layers while maintaining 32 channels throughout. The last layer uses a 1×1 convolution to increase the channels to 256×3 and uses the resulting activations to predict a multinomial distribution over 256 possible sub-pixel values via a softmax operator.

This network can absorb the global structure of the image into the marginal probability distribution of the pixels. Due to the softmax layer it can capture the rich intricacies of the high resolution distribution, but we have no way to coherently sample from it: sampling sub-pixels independently would mix the assortment of distributions.

The prior network ties together the sub-pixel distributions and allows us to take samples that depend on each other. We use 20 gated PixelCNN layers with 32 channels at each layer. We leave conditioning until the late stages of the network, where we add the pre-softmax activations from the conditioning network and the prior network before computing the final joint softmax distribution.

Our model is built using TensorFlow [1] and trained across 8 GPUs with synchronous SGD updates. See Appendix A for further details.
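A minimal sketch of the sampling procedure and temperature control described above; the `prior_logits` callable stands in for the gated PixelCNN and is assumed to be re-evaluated on the partial sample, whereas a practical implementation would cache activations rather than recompute them.

```python
import numpy as np

def sample_high_res(cond_logits, prior_logits, tau=1.2, greedy=False, rng=None):
    """Pixel-by-pixel sampling sketch for the trained model.

    cond_logits:  [M, K] logits A_i(x), computed once from the low-res input.
    prior_logits: callable mapping previously sampled values y_{<i}
                  to [K] logits B_i(y_{<i}); assumed placeholder.
    tau:          temperature; the logits are multiplied by tau, so
                  tau -> infinity approaches greedy decoding (the paper's
                  non-standard convention).
    """
    rng = rng or np.random.default_rng()
    M, K = cond_logits.shape
    y = np.zeros(M, dtype=np.int64)
    for i in range(M):
        logits = tau * (cond_logits[i] + prior_logits(y[:i]))
        if greedy:
            y[i] = int(np.argmax(logits))    # always take the mode
        else:
            z = logits - logits.max()        # stable softmax
            p = np.exp(z) / np.exp(z).sum()
            y[i] = rng.choice(K, p=p)        # draw from the tempered softmax
    return y
```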
5. Experiments

We assess the effectiveness of the proposed pixel recursive super resolution method on two datasets containing small faces and bedroom images. The first dataset is a version of the CelebA dataset [19], composed of a set of celebrity faces cropped around the face. In the second dataset, LSUN Bedrooms [32], images are center cropped. In both datasets we resize the images to 32×32 with bicubic interpolation and again to 8×8, constituting the output and input pairs for training and evaluation. We present representative super resolution examples on held out test sets and report human evaluations of our predictions in Table 1.

We compare our results with two baselines: a pixel independent L2 regression ("Regression") and a nearest neighbors search ("NN"). The architecture used for the regression baseline is identical to the conditioning network used in our model, consisting of several ResNet blocks and upsampling convolutional layers, except that the baseline regression model outputs three channels and has a final tanh(·) non-linearity instead of a ReLU. The regression architecture is similar in design to SRResNet [18], which reports state of the art scores on image similarity metrics. Furthermore, we train the regression network to predict super resolution residuals instead of the actual pixel values. The residuals are computed with respect to a bicubic interpolation of the input and are known to yield superior predictions [16]. The nearest neighbors baseline searches the downsampled training set for the nearest example (using Euclidean distance) and returns its high resolution counterpart.
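The nearest neighbors baseline is simple enough to state in a few lines; a sketch under the assumption that all images are pre-flattened into vectors (a real search over 3 million LSUN images would need batching or an approximate index).

```python
import numpy as np

def nearest_neighbor_baseline(x_low, train_low, train_high):
    """Find the training example whose low-res version is closest to the
    input in Euclidean distance and return its high-res counterpart.

    x_low:      [h*w*3] flattened 8x8 query image.
    train_low:  [N, h*w*3] flattened 8x8 training inputs.
    train_high: [N, H*W*3] corresponding flattened 32x32 outputs.
    """
    d = np.sum((train_low - x_low) ** 2, axis=1)   # squared L2 per example
    return train_high[np.argmin(d)]
```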
Figure 4: Samples from the model trained on LSUN Bedrooms at 32×32.

5.1. Sampling

Sampling from the model multiple times results in different high resolution images for a given low resolution input (Figure 5). A given model will identify many plausible high resolution images that correspond to a given lower resolution image. Each of these samples may contain distinct qualitative features, and each of these modes is contained within the PixelCNN. Note that the differences between samples for the faces dataset are far less drastic than in our synthetic dataset, where failure to cleanly predict modes meant complete failure.

Figure 5: Left: low-res input. Right: Diversity of super resolution samples at τ = 1.2.

The quality of samples is sensitive to the softmax temperature. When the mode is sampled (τ = ∞) at each sub-pixel, the samples are of poor quality: they look overly smooth, with horizontal and vertical line artifacts. Samples at τ = 1.0, the exact probability given by the model, tend to be jittery, with high frequency content; it seems that in this case there are multiple less certain trajectories and the samples jump back and forth between them, which is perhaps alleviated with more capacity and training time. Manually tuning the softmax temperature was necessary to find good looking samples; usually a value between 1.1 and 1.3 worked.

Figure 6 shows various test predictions with their negative log-probability scores listed below each image. Smaller scores mean the model has assigned that image a larger probability mass. The greedy, bicubic, and regression faces are preferred by the model despite being of worse quality, probably because their smooth, face-like structure does not contradict the learned distributions. Yet sampling with the proper softmax temperature nevertheless finds realistic looking images.
Figure 6: Our model does not produce calibrated log-probabilities for the samples. Negative log-probabilities for three test faces, reproduced here per column:

Column:  Ground Truth  NN    Bicubic  Regression  Greedy  τ = 1.0  τ = 1.1  τ = 1.2
Face 1:  2.85          2.74  1.76     2.34        1.82    2.94     2.79     2.69
Face 2:  2.96          2.71  1.82     2.17        1.77    3.18     3.09     2.95
Face 3:  2.76          2.63  1.80     2.35        1.64    2.99     2.90     2.64

Note that the best log-probabilities arise from bicubic interpolation and greedy sampling, even though those images are of poor quality.

5.2. Image similarity

Many methods exist for quantifying image similarity that attempt to measure human perceptual judgements [29, 30, 20]. We quantified the prediction accuracy of our model compared to ground truth using pSNR and MS-SSIM (Table 1). We found that our own visual assessment of predicted image quality did not correspond to these image similarity metrics. For instance, bicubic interpolation achieved relatively high metrics even though its samples appear quite poor. This matches recent observations that pSNR and SSIM provide poor judgements of super resolution quality [18, 14] when new details are synthesized.

To ensure that samples do indeed correspond to the low resolution input, we measured how consistent the high resolution output image is with the low resolution input image (Table 1, 'Consistency'). Specifically, we measured the L2 distance between the low resolution input image and a bicubic-downsampled version of the high resolution estimate. Lower L2 distances correspond to high resolution estimates that are more consistent with the original low resolution image. Note that the nearest neighbor high resolution images are less consistent even though, in the case of LSUN Bedrooms, we used a database of 3 million training images to search for neighbors. In contrast, bicubic resampling and our PixelCNN-based upsampling are consistently more faithful to the low resolution image, indicating that our samples do indeed correspond to the low resolution input.
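A sketch of the consistency measurement as described, using Pillow for the bicubic downsampling; the [0, 1] scaling convention follows Table 1's caption, but the helper itself and the choice of resampling library are ours.

```python
import numpy as np
from PIL import Image

def consistency(x_low, y_high):
    """Consistency (Table 1): MSE between the low-res input and a
    bicubic-downsampled version of the high-res estimate, on a [0, 1] scale.

    x_low:  [8, 8, 3] float array in [0, 1].
    y_high: [32, 32, 3] float array in [0, 1].
    """
    img = Image.fromarray((y_high * 255).astype(np.uint8))
    down = np.asarray(img.resize((8, 8), Image.BICUBIC), dtype=np.float64) / 255.0
    return float(np.mean((down - x_low) ** 2))
```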
5.3. Human study

We presented crowd-sourced workers with two images: a true image and the corresponding prediction from one of our models. Workers were asked, "Which image, would you guess, is from a camera?" Following the setup of Zhang et al. [34], we present each image for one second before allowing an answer. Workers start with 10 practice pairs, during which they receive feedback on whether they chose correctly; the practice pairs are not counted in the results. After the practice pairs, they are shown 45 additional pairs, 5 of which are golden questions designed to test whether the person is paying attention. A golden question pits a bicubically upsampled image (very blurry) against the ground truth. Excluding the golden and practice questions, we count forty answers per session. Sessions in which a worker missed any golden question are thrown out, and workers were only allowed to participate in any of our studies once. We continued running sessions until forty different workers had been tested on each of the four algorithms.

We report in Table 1 the percentage of the time users chose an algorithm's output over the ground truth counterpart. Note that 50% would mean that an algorithm perfectly confused the subjects.

Figure 7: The best and worst rated images in the human study. The fractions below the images denote how many times a person chose that image over the ground truth. See the supplementary material for more images used in the study.

Table 1: Top: results on the cropped CelebA test dataset at 32×32, magnified from 8×8. Bottom: LSUN Bedrooms. pSNR, SSIM, and MS-SSIM measure image similarity between samples and the ground truth. Consistency lists the MSE between the input low-res image and downsampled samples on a [0, 1] scale. % Fooled reports how often an algorithm's samples fooled a human in the crowd-sourced study; 50% would be perfectly confused.

CelebA:
Algorithm    pSNR   SSIM  MS-SSIM  Consistency  % Fooled
Bicubic      28.92  0.84  0.76     0.006        –
NN           28.18  0.73  0.66     0.024        –
Regression   29.16  0.90  0.90     0.004        4.0 ± 0.2
τ = 1.0      29.09  0.84  0.86     0.008        11.0 ± 0.1
τ = 1.1      29.08  0.84  0.85     0.008        10.4 ± 0.2
τ = 1.2      29.08  0.84  0.86     0.008        10.2 ± 0.1

LSUN Bedrooms:
Algorithm    pSNR   SSIM  MS-SSIM  Consistency  % Fooled
Bicubic      28.94  0.70  0.70     0.002        –
NN           28.15  0.49  0.45     0.040        –
Regression   28.87  0.74  0.75     0.003        2.1 ± 0.1
τ = 1.0      28.92  0.58  0.60     0.016        17.7 ± 0.4
τ = 1.1      28.92  0.59  0.59     0.017        22.4 ± 0.3
τ = 1.2      28.93  0.59  0.58     0.018        27.9 ± 0.3

6. Conclusion

As in many image transformation tasks, the central problem of super resolution is hallucinating sharp details by choosing a mode of the output distribution. We explored this underspecified problem using small images, demonstrating that even the smallest 8×8 images can be enlarged to sharp 32×32 images. We introduced a toy dataset with a small number of explicit modes to demonstrate the failure cases of two common pixel independent likelihood models. In the presented model, the conditioning network gets us most of the way towards predicting a high resolution image, but its outputs are blurry where the model is uncertain. Combining the conditioning network with a PixelCNN provides a strong prior over the output pixels, allowing the model to generate crisp predictions. Our human evaluations indicate that samples from our model on average look more photo-realistic than those of a strong regression-based conditioning network alone.

References

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, Nov. 2006.
[3] J. Bruna, P. Sprechmann, and Y. LeCun. Super-resolution with deep convolutional sufficient statistics. CoRR, abs/1511.05666, 2015.
[4] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. NIPS, 2015.
[5] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. CoRR, abs/1501.00092, 2015.
[6] R. Fattal. Image upsampling via imposed edge statistics. ACM Transactions on Graphics, 26(3), July 2007.
[7] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 2002.
[8] W. T. Freeman and E. C. Pasztor. Markov networks for super-resolution. In CISS, 2000.
[9] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CVPR, 2015.
[11] H. Hou and H. Andrews. Cubic splines for image interpolation and digital filtering. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(6):508–517, 1978.
[12] J. Huang and D. Mumford. Statistics of natural images and models. In CVPR, 1999.
[13] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
[14] J. Johnson, A. Alahi, and F. Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
[15] C. Kaae Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár. Amortised MAP inference for image super-resolution. ArXiv e-prints, Oct. 2016.
[16] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. CoRR, abs/1511.04587, 2015.
[17] K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1127–1133, 2010.
[18] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. arXiv:1609.04802, 2016.
[19] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In ICCV, 2015.
[20] K. Ma, Q. Wu, Z. Wang, Z. Duanmu, H. Yong, H. Li, and L. Zhang. Group MAD competition: a new methodology to compare objective image quality models. In CVPR, June 2016.
[21] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein. Unrolled generative adversarial networks. CoRR, abs/1611.02163, 2016.
[22] K. Nasrollahi and T. B. Moeslund. Super-resolution: A comprehensive survey. Machine Vision and Applications, 25(6):1423–1468, Aug. 2014.
[23] Y. Romano, J. Isidoro, and P. Milanfar. RAISR: Rapid and accurate image super resolution. CoRR, abs/1606.01299, 2016.
[24] T. Salimans, A. Karpathy, X. Chen, D. P. Kingma, and Y. Bulatov. PixelCNN++: A PixelCNN implementation with discretized logistic mixture likelihood and other modifications. Under review at ICLR 2017.
[25] Q. Shan, Z. Li, J. Jia, and C.-K. Tang. Fast image/video upsampling. ACM Transactions on Graphics, 27(5):153, 2008.
[26] J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In CVPR, pages 1–8, 2008.
[27] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. ICML, 2016.
[28] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu. Conditional image generation with PixelCNN decoders. NIPS, 2016.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[30] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multiscale structural similarity for image quality assessment. In Asilomar Conference on Signals, Systems and Computers, volume 2, pages 1398–1402, 2004.
[31] C. Y. Yang, S. Liu, and M. H. Yang. Structured face hallucination. In CVPR, pages 1099–1106, June 2013.
[32] F. Yu, Y. Zhang, S. Song, A. Seff, and J. Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
[33] X. Yu and F. Porikli. Ultra-resolving face images by discriminative generative networks. In ECCV, pages 318–333. Springer, 2016.
[34] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. ECCV, 2016.
[35] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, pages 479–486, 2011.
[36] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In CVPR, 2011.
Appendix A

Conditioning network (input: 8 × 8 × 3)
Operation                Kernel  Strides  Feature maps
B × ResNet block         3 × 3   1        32
Transposed convolution   3 × 3   2        32
B × ResNet block         3 × 3   1        32
Transposed convolution   3 × 3   2        32
B × ResNet block         3 × 3   1        32
Convolution              1 × 1   1        3 × 256

PixelCNN network (input: 32 × 32 × 3)
Operation                       Kernel  Strides  Feature maps
Masked convolution              7 × 7   1        64
20 × gated convolution layers   5 × 5   1        64
Masked convolution              1 × 1   1        1024
Masked convolution              1 × 1   1        3 × 256

Optimizer:                    RMSProp (decay = 0.95, momentum = 0.9, epsilon = 1e-8)
Batch size:                   32
Iterations:                   2,000,000 for Bedrooms; 200,000 for faces
Learning rate:                0.0004, divided by 2 every 500,000 steps
Weight, bias initialization:  truncated normal (stddev = 0.1), constant(0)

Table 2: Hyperparameters used for both datasets. For LSUN Bedrooms B = 10 and for the cropped CelebA faces B = 6.
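Read as a network, the conditioning half of Table 2 could be rendered roughly as follows in Keras; the paper used TensorFlow [1], but this layout is our reconstruction, and the initial 3-to-32-channel convolution and the internal structure of each ResNet block are assumptions, since Table 2 does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet_block(x, channels=32):
    """A plain two-convolution residual block; the paper's exact block
    layout is not specified beyond 'ResNet block', so this is a guess."""
    h = layers.Conv2D(channels, 3, padding='same', activation='relu')(x)
    h = layers.Conv2D(channels, 3, padding='same')(h)
    return layers.ReLU()(layers.Add()([x, h]))

def conditioning_network(B=6, K=256):
    """Sketch of Table 2's conditioning network: 8x8x3 input, three groups
    of B ResNet blocks with two stride-2 transposed convolutions between
    them, then a 1x1 convolution to 3*K logits per output pixel."""
    inp = layers.Input(shape=(8, 8, 3))
    x = layers.Conv2D(32, 3, padding='same')(inp)                    # lift to 32 channels (assumed)
    for _ in range(B):
        x = resnet_block(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same')(x)  # 8x8 -> 16x16
    for _ in range(B):
        x = resnet_block(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same')(x)  # 16x16 -> 32x32
    for _ in range(B):
        x = resnet_block(x)
    logits = layers.Conv2D(3 * K, 1)(x)                              # per-sub-pixel logits
    return tf.keras.Model(inp, logits)
```

The PixelCNN half would follow the masked and gated convolution stack in the lower part of the table.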
B. LSUN Bedrooms samples

[Pages of sample grids; columns, left to right: Input, Bicubic, Regression, τ = 1.0, τ = 1.1, τ = 1.2, Truth, NN.]

C. Cropped CelebA faces

[Pages of sample grids with the same column layout: Input, Bicubic, Regression, τ = 1.0, τ = 1.1, τ = 1.2, Truth, NN.]