Implementing Neural Style Transfer
Authors: Tasmiah Tahsin Mayeesha, Ahraf Sharif, Hashmir Rahsan Toron
Electrical and Computer Engineering Department,
North South University
Abstract— This technical report implements the
neural style transfer method introduced by Gatys
et al. in the paper “A Neural Algorithm of Artistic
Style” and compares different optimization
techniques and a variety of input data.
Keywords—machine learning, deep learning,
convolutional neural networks, neural style transfer,
computer vision
I. INTRODUCTION
“Imitation is the sincerest form of
flattery,” said Charles Caleb Colton, a 19th-century
English cleric and writer. Artists have always built
on the works of earlier artists to push the frontier
of human imagination forward. Art has been used
for depicting religious figures, spreading political
propaganda, inspiring protests, communicating
humor through cartoons, and preserving history
through portraits. But until now, art has always
been created by humans for consumption by other
humans. A significant difference between humans
and machines is that only humans can imagine and
create new art in the form of paintings, books, and
songs.
However, machine learning has now advanced
to the point where machines can create art
themselves. Deep neural networks can separate the
style and the content of images and impose the
style of one image on the content of another,
producing a completely new image that draws on
both inputs. This is the key idea of neural style
transfer by Gatys et al., which uses convolutional
neural networks for the task. This paper describes
our implementation of the style transfer technique
within the computational constraints we faced.
II. RELATED WORK
Transferring the style of one image to
another is an interesting yet difficult problem.
There have been many efforts to develop efficient
methods for automatic style transfer [Hertzmann
et al., 2001; Efros and Freeman, 2001]. Recently,
Gatys et al. proposed a seminal work [Gatys et al.,
2016]: it captures the style of artistic images and
transfers it to other images using Convolutional
Neural Networks (CNNs). This work formulated
the problem as finding an image that matches both
the content and style statistics based on the neural
activations of each layer in the CNN. Further
improvements have since been proposed in papers
such as “Perceptual Losses for Real-Time Style
Transfer and Super-Resolution” by Johnson et al.
(2016).
III. METHODOLOGY
Mathematically, given a content image c
and a style image s, we want to generate an output
image x that takes its texture and color from s and
its content from c. Following Gatys et al., we can
pose this as an optimization problem:

x* = argmin_x ( α·L_content(c, x) + β·L_style(s, x) )

Here α is the weight of the content loss and β is
the weight of the style loss. We want to find the
output image x that minimizes the loss, i.e. differs
as little as possible from c in content and from s in
style.
A. Algorithms and Techniques
To generate the output image with the
neural style transfer technique, another method
called transfer learning is used. Transfer learning
refers to using the weights of a network pretrained
on one task (here, the ImageNet dataset) for some
other task the network was not originally trained
for. For example, the ImageNet challenge is a
1000-class classification problem, but it is possible
to take the weights of a network trained on it and
reuse them for a binary classification problem by
replacing the final softmax layer.
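For illustration, a minimal transfer-learning sketch is shown below, assuming a Keras-style API; the added dense layers and the binary head are hypothetical and only demonstrate the idea of replacing the final classifier on top of frozen pretrained features.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Reuse ImageNet-pretrained convolutional features and freeze them.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Replace the original 1000-way softmax with a small binary-classification head.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```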
When a convolutional neural network is
trained for a classification task, its convolutional
layers learn feature representations of the input
images. As we go deeper into the network, the
higher convolutional layers capture the high-level
content and arrangement of the input image
without preserving exact pixel values, while the
lower layers capture local features such as textures
and colors. Thus, in a sense, the style and the
content of an image are separable.
In this work we use a CNN called VGG16,
released by Oxford’s Visual Geometry Group in
2014. We use this network to obtain the feature
representations of the images and use them to
define the loss score and gradients, which in turn
update a randomly generated image to minimize
the loss. The architecture of the network is shown
in the following diagram:
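As a sketch of how these feature representations can be read off, the snippet below builds a small Keras model that returns the activations of a set of VGG16 layers; the specific layer names are typical choices for style transfer and are assumptions rather than the exact configuration of our implementation.

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# Layers commonly used for style and content in neural style transfer (assumed choices).
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
CONTENT_LAYER = "block4_conv2"

vgg = VGG16(weights="imagenet", include_top=False)
vgg.trainable = False

# A model mapping an input image to the activations of the selected layers.
outputs = [vgg.get_layer(name).output for name in STYLE_LAYERS + [CONTENT_LAYER]]
feature_model = tf.keras.Model(inputs=vgg.input, outputs=outputs)
# feature_model(image) returns the feature maps used to build the loss terms.
```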
B. Data Preprocessing
Before using the VGG-16 network on our
images to extract feature representations, we need
to preprocess them as in the original paper.
For this, the following transforms were
applied:
1. Subtraction of the mean RGB value
(computed on the ImageNet training set) from each
pixel.
2. Flipping the ordering of the multi-
dimensional array from RGB to BGR (the ordering
used in the paper).
Due to memory constraints we also resized
the images to 224 x 224, since larger images mean
more parameters to tune. A 224 x 224 image with 3
channels (R, G, B) already gives 224 x 224 x 3 =
150,528 parameters to optimize in the combined
image.
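A minimal preprocessing sketch is shown below, assuming NumPy and Pillow; the mean values follow the original VGG release, but the function itself is illustrative rather than our verbatim implementation.

```python
import numpy as np
from PIL import Image

# Mean RGB of the ImageNet training set, as published with the original VGG models.
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def preprocess(path, size=(224, 224)):
    """Resize, subtract the ImageNet mean, and flip RGB -> BGR."""
    img = Image.open(path).convert("RGB").resize(size)
    arr = np.asarray(img, dtype=np.float32)
    arr -= IMAGENET_MEAN_RGB          # 1. mean subtraction per channel
    arr = arr[:, :, ::-1]             # 2. RGB -> BGR, the ordering VGG expects
    return arr[np.newaxis, ...]       # add a batch dimension: (1, 224, 224, 3)
```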
C. Loss Function
A loss (or cost) function in machine learning
scores an algorithm by comparing its generated
output with the expected output. For neural style
transfer the output is an image that should contain
both the style of the style image and the content of
the content image as much as possible.
The loss function outputs a score indicating
how close the generated image is to the style image
in style and to the content image in content. Unlike
image classification, where the loss is used to
update the weights of the network after comparing
predictions with the true classes, the loss score in
neural style transfer is used to update the pixels of
the generated image with stochastic gradient
descent or another optimizer.
Since the loss function has to measure both
the style loss and the content loss, we can write it
in the following way:

Loss = α·L_content(c, x) + β·L_style(s, x)
The content loss is the mean squared error
between the feature representations of the content
image and the combination image. The style loss is
the scaled, squared Frobenius norm of the difference
between the Gram matrices of the style and
combination images. A Gram matrix is the matrix
formed by multiplying a matrix with its own
transpose.
In order to denoise the result images we also
add a total variation loss, which reduces noise and
was introduced in the paper “Understanding Deep
Image Representations by Inverting Them” by
Mahendran and Vedaldi (2014). Thus the loss
function is the summation of these three terms.
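The sketch below shows one way the three loss terms can be written with TensorFlow tensors; the scaling constants follow Gatys et al. and the Keras style-transfer example, and the exact normalization in our implementation may differ slightly.

```python
import tensorflow as tf

def content_loss(content_feat, combo_feat):
    # Mean squared error between the content and combination feature maps.
    return tf.reduce_mean(tf.square(combo_feat - content_feat))

def gram_matrix(feat):
    # Flatten the spatial dimensions, then multiply the feature matrix by its transpose.
    channels = int(feat.shape[-1])
    f = tf.reshape(feat, (-1, channels))
    return tf.matmul(f, f, transpose_a=True)

def style_loss(style_feat, combo_feat, height=224, width=224):
    # Scaled, squared Frobenius norm of the difference between Gram matrices.
    channels = int(style_feat.shape[-1])
    s, c = gram_matrix(style_feat), gram_matrix(combo_feat)
    return tf.reduce_sum(tf.square(s - c)) / (4.0 * (channels ** 2) * ((height * width) ** 2))

def total_variation_loss(img):
    # Penalize differences between neighboring pixels to encourage smooth outputs.
    a = tf.square(img[:, :-1, :-1, :] - img[:, 1:, :-1, :])
    b = tf.square(img[:, :-1, :-1, :] - img[:, :-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))

# Total loss = alpha * content loss + beta * style loss + gamma * total variation loss.
```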
IV. MODEL TRAINING AND
EVALUATION
A. Model Training
The combination image is initialized as a
random collection of pixels. We then use the
L-BFGS algorithm (a quasi-Newton method that
converges significantly faster than standard
gradient descent) to iteratively improve it.
To train the model we pass the content
image, style image and combination image through
the VGG-16 network, extract features, and evaluate
the loss functions. In each iteration we measure the
loss and update the combination image accordingly.
Each iteration took around 5 minutes on a machine
with 4 GB of RAM, but we expect a significant
speed-up using GPUs.
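The optimization loop can be sketched as below with SciPy’s L-BFGS routine; `evaluate_loss_and_gradient` is a hypothetical helper standing in for the step that runs the image through the VGG-16 feature model and returns the total loss and its gradient with respect to the pixels.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def loss_and_grads(x_flat):
    # Reshape the flat pixel vector, evaluate loss and gradient, and flatten back.
    x = x_flat.reshape((1, 224, 224, 3)).astype(np.float32)
    loss_value, grad_value = evaluate_loss_and_gradient(x)   # assumed helper
    return float(loss_value), grad_value.flatten().astype(np.float64)

# Start from random noise and refine it for a few epochs.
x = (np.random.uniform(0, 255, (1, 224, 224, 3)) - 128.0).flatten()
for epoch in range(5):
    x, min_loss, _ = fmin_l_bfgs_b(loss_and_grads, x, maxfun=20)
    print(f"Epoch {epoch + 1}: loss = {min_loss:.4e}")
```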
B. Model Evaluation
The training loss for the generated image was
measured with the loss function described above,
using the L-BFGS optimizer.
Training loss per epoch:

Epoch    Loss (x 1e10)    Time (s)
1        5.849            284
2        2.633            312
3        2.032            275
4        1.801            279
5        1.610            272

Graph of the loss function (figure omitted).
As we were able to obtain good-quality
images after only 5 epochs, we did not increase the
number of training iterations. Using a content
image of the Jatio Songsod Bhaban of Bangladesh
and a style image of the impressionist painting
“Forest” by the artist Leonid Afremov, we were
able to generate the output shown below:
We can also experiment with different
content, style and total variation loss weights to see
how the outputs differ. This image of Savar uses
the following parameters: α = 0.025, β = 5, γ = 1.
Here the style weight is much larger than the
content weight, so the output image looks a lot like
Van Gogh’s Starry Night. But if we change the
parameters to α = 4, β = 2, γ = 1, the output
changes considerably: since the content weight is
now larger than the style weight, the output looks
much more like the original image, with only a
light stylistic filter applied.
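As a hypothetical usage example, re-running the same pipeline with different weights might look like the following; `run_style_transfer` is a stand-in name for the optimization loop sketched earlier, not a function from our code or any library.

```python
# Style-dominated result (style weight much larger than content weight).
stylized_a = run_style_transfer(content_img, style_img, alpha=0.025, beta=5.0, gamma=1.0)

# Content-dominated result (content weight larger than style weight).
stylized_b = run_style_transfer(content_img, style_img, alpha=4.0, beta=2.0, gamma=1.0)
```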
V. IMPROVEMENTS AND DEPLOYMENT
A. Improvements
1. Variation of Parameters
We will try different variations of the
parameters by changing the input images, their
sizes, the weights of the different loss functions,
and the features used to construct them, and
compare the results given our computational
constraints. As our laptop has only 4 GB of
memory, we have so far been unable to run the
algorithm on images larger than 224 x 224. Deep
learning algorithms are best run on a GPU, which
we do not yet have.
2. Speed Optimization
As the current process is very slow, we plan
to replace our implementation with an image
transformation CNN and implement the fast style
transfer method described in the perceptual-loss
paper by Johnson et al. (2016). This is reported to
give roughly a 1000x speed-up over the present
optimization-based implementation, making it
suitable for a web app.
B. Deployment
The preferred outcome of this project
would have been a deployed application
implemented in Python that lets users create new
images in real time, styled with traditional
Bangladeshi paintings, via neural style transfer.
However, because of the complexity of the
technique, so far we have implemented only the
backend, using the basic neural style transfer
technique described above.
The front-end design for such an app has
also been developed. Prototype front-end designs
are attached below.
REFERENCES
1. Gatys, L. A., Ecker, A. S., and Bethge, M. “A
Neural Algorithm of Artistic Style.”
https://arxiv.org/pdf/1508.06576.pdf (first neural
style transfer paper)
2. Johnson, J., Alahi, A., and Fei-Fei, L. “Perceptual
Losses for Real-Time Style Transfer and Super-
Resolution.” ECCV 2016.
https://arxiv.org/pdf/1603.08155.pdf
3. Application: Prisma
4. Course.fast.ai
5. Mahendran, A., and Vedaldi, A. “Understanding
Deep Image Representations by Inverting Them.”
https://arxiv.org/abs/1412.0035