Transfer Style Convolutional Neural Network
E4040.2016Fall.CJMD.report
Carlos Espino ce2330, Juan Borgnino jb3852, Jose Ramirez jdr2162
Columbia University
Abstract
In this paper we implement the style transfer
algorithm designed by Gatys et al., which
generates new synthesised images by combining
the style and the content of two images. We explain
the methodology developed by the authors and
implement the solution using Theano. We find that
the results are highly dependent on the values chosen
for the hyperparameters and that training is quite time
intensive due to the large number of parameters
required. In addition, we successfully generate new
images combining several pictures (including one of
us) with the style of several well known painters.
1. Introduction
Our paper starts in section 2 with a summary of the
work and methodology developed by Gatys et al.,
including images showing their results. In section 3
we provide an intuitive explanation of how the
algorithm works, including the loss functions used.
We continue with section 4, where we describe the
architecture used and our implementation of the
algorithm in Theano. Next, in section 5 we present
our final results in the form of synthesised images
combining the content and style of two different
images. Finally, section 6 presents our conclusions
and advice for individuals who wish to obtain the
same results.
2. Summary of the Original Paper
The paper introduces a deep learning network which
can create artistic images by combining the style of
one image with the content of another.
In order to do so, the authors explain that the
representations of content and style in the CNN are
separable. Hence, one can take both separately to
generate a new image based on the content of one
image and the style of another. To demonstrate their
findings, the authors match the content of a
photograph from Tubingen, Germany with the style of
several well known artworks from different periods.
See Figure 1- A.
The final results show that the synthesised image
maintains the global arrangement of the original
image (the image from Tubingen), while the colours
and local structures that compose the global scenery
belong to the chosen artwork. However, before
generating the new image one can specify how much
one desires to preserve from the style and how much
from the content.
2.1 Methodology of the Original Paper
Gatys et al. design a method to synthesize an image
that mixes the content and the style of two different
images using convolutional neural networks.
Specifically, they use the VGG network with 19
layers (VGG-19), a convolutional neural network
whose performance is comparable to that of humans
in basic image recognition tasks
(Russakovsky et al. 2015). They minimize a loss
function that takes into account:
1. The neural representation that captures the
content of one image.
2. The style representation of another image,
which is computed from the correlations
between different types of neurons.
The following diagram shows the methodology they
use to generate the desired image. More details
about the representations are given in section 3.2.
Figure 1 Image credit Gatys et al. (2015)
2.2 Key Results of the Original Paper
The key result of the paper is that the representations
of style and content in the CNN are separable. As a
result, one can take the content of one image and
combine it with the style of another image.
The following image shows the results when
combining a photograph from Tubingen, Germany
with the style of three well known artists:
Figure 2 Image credit Gatys et al. (2015)
Different results can be achieved by varying the
values of the parameters $\alpha$ and $\beta$, which
correspond to the weights of the content and the
style of the two images, respectively.
In particular, they try different values for the
ratio $\alpha/\beta$, combining them with increasing
style complexity obtained by including more layers
of the network. More details about $\alpha$, $\beta$
and style complexity are given in section 3.2.
Different values for $\alpha/\beta$ and for style
complexity yield very different results, as we can
observe from the images below, which combine the
photograph from Tubingen, Germany with the style
of the painting Composition VII by
Wassily Kandinsky.
Figure 3 Image credit Gatys et al. (2015)
These results show how the content of one image
can be combined with the style of another one to
successfully generate a new image which can be
quite appealing.
3. Methodology
3.1. Objectives and Technical Challenges
The objective is to generate an image that combines
the content of one image with the style of another
one. To do so, we start with a white noise image and
solve a minimization problem, discussed in the
following section, to find another image that matches
the desired content and style. One of the main
technical challenges is the large number of
parameters to learn in the minimization problem,
which is 3 x width x height of the image. For instance,
if we wish to generate a color image of size 500 x 500,
the number of parameters to learn is 750,000.
Another important challenge is the selection of the
hyperparameters which balance the style and the
structure in the output image. The final results are
quite sensitive to the weights assigned to the content
and style loss functions. Given the complexity of the
minimization problem, the choice of the minimization
algorithm is important and plays a key role in the
quality of the results.
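To make the arithmetic above concrete, here is a minimal Python sketch of the parameter count. The function name `num_parameters` is an illustrative assumption and is not taken from the project code.

```python
def num_parameters(height, width, channels=3):
    """Number of pixel values optimised when the image itself is the variable."""
    return channels * height * width


for side in (32, 500, 600):
    # 32 -> 3072, 500 -> 750000, 600 -> 1080000
    print(side, num_parameters(side, side))
```

Doubling the side length of a square image multiplies the number of variables by four.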
3.2. Problem Formulation and Design
We follow the same methodology as Gatys et al.
(2015) to formulate the minimization problem
mentioned in section 2.1.
Starting from the VGG-19, Gatys et al. (2015) remove
the fully connected layers, keeping only the 5 pooling
layers and the 16 convolutional layers (see section
4.1). The trained weights of this network are publicly
available in [4].
This network is used to encode an image at each
convolutional layer using its filter responses. In this
way, a layer $l$ that has $N_l$ filters will have $N_l$
feature maps, each of size $M_l$, which corresponds to
height x width of the feature map. We can store all the
responses of layer $l$ in a matrix
$F^l \in \mathbb{R}^{N_l \times M_l}$, so $F^l_{ij}$
corresponds to the activation of the $i$-th filter at
position $j$ in layer $l$.
Let $p$ and $x$ be the original image and the generated
image, and $P^l$ and $F^l$ their respective feature
representations at layer $l$. In order to generate the
content of the original image, we need to minimize:
$$\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$
where the gradient with respect to $x$ can be
computed using back propagation.
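The content loss can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that the feature maps of a layer are already available as matrices of shape (filters, positions). The function names are hypothetical, and the Theano implementation obtains the gradient symbolically instead of by hand.

```python
import numpy as np


def content_loss(F, P):
    """0.5 * sum of squared differences between two feature matrices."""
    diff = F - P
    return 0.5 * np.sum(diff ** 2)


def content_loss_grad(F, P):
    """Derivative of the content loss with respect to the activations F."""
    return F - P


# Toy feature matrices: 2 filters, 2 spatial positions.
F = np.array([[1.0, 2.0], [3.0, 4.0]])  # generated image
P = np.array([[1.0, 0.0], [0.0, 4.0]])  # content image
print(content_loss(F, P))  # 6.5
```

Back propagation then carries this derivative from the layer activations down to the pixels of the generated image.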
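Having defined how to generate the content of an
image, a way to generate a style representation is
needed. To do this, the correlation between different
filter responses is computed, taking the expectation
over the spatial extent of the input image. This is
given by the Gram matrix
$G^l \in \mathbb{R}^{N_l \times N_l}$, where $N_l$ is the
number of filters in layer $l$ and $G^l_{ij}$ is
the inner product between the feature maps $i$ and $j$,
in vector form, in layer $l$:
$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$
with $F^l_{ik}$ the activation of filter $i$ at position $k$.
Having defined this, let $a$ and $x$ be the original
image and the generated image, and $A^l$ and $G^l$ the
respective style representations at layer $l$. In order to
generate the style of the original image, we need to
minimize:
$$\mathcal{L}_{style}(a, x) = \sum_l w_l E_l$$
where $w_l$ is the weighting factor of the contribution of
each layer and $E_l$ is the contribution of layer $l$,
defined as:
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$
with $M_l$ the size (height x width) of each feature map
in layer $l$. Here the gradient with respect to $x$ can be
computed using back propagation as well.
The original paper chooses $w_l$ equal to one divided
by the number of active layers for the
convolutional layers we decide to use and $w_l = 0$ for
the layers we don’t want to use.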
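The Gram matrix and the per-layer style loss can be sketched in NumPy as follows. This is a minimal illustration, assuming each layer's feature maps are stored as a matrix of shape (filters, positions). The function names are hypothetical and do not come from the project code.

```python
import numpy as np


def gram_matrix(F):
    """G[i, j] = inner product of feature maps i and j; F has shape (N_l, M_l)."""
    return F @ F.T


def layer_style_loss(F, A):
    """E_l for one layer: F are generated features, A is the style Gram matrix."""
    n_filters, n_positions = F.shape
    G = gram_matrix(F)
    return np.sum((G - A) ** 2) / (4.0 * n_filters ** 2 * n_positions ** 2)


def style_loss(features, style_grams, weights):
    """Weighted sum of the per-layer contributions E_l."""
    return sum(w * layer_style_loss(F, A)
               for F, A, w in zip(features, style_grams, weights))


F = np.array([[1.0, 0.0], [0.0, 2.0]])  # 2 filters, 2 positions
A = np.zeros((2, 2))                    # toy style Gram matrix
print(gram_matrix(F))                   # [[1. 0.] [0. 4.]]
print(layer_style_loss(F, A))           # 0.265625
```

Because the Gram matrix sums over positions, it discards the spatial arrangement of the image and keeps only which filters tend to fire together.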
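Having defined the loss functions to generate the
content and the style, if we wish to generate an
image $x$ with content from image $p$ and style from
image $a$, we need to minimize the following loss
function:
$$\mathcal{L}_{total}(p, a, x) = \alpha \mathcal{L}_{content}(p, x) + \beta \mathcal{L}_{style}(a, x)$$
where $\alpha$ and $\beta$ are the weights of the content
and the style, respectively.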
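As a hedged sketch of how the two terms combine, the following self-contained NumPy function evaluates the weighted sum for precomputed feature matrices. The name `total_loss`, its argument layout and the equal layer weights are assumptions made for illustration only.

```python
import numpy as np


def total_loss(F_content, P, style_feats, style_grams, alpha, beta):
    """alpha * content loss + beta * style loss for precomputed features.

    F_content, P: (N_l, M_l) features of the generated and content images
    at the content layer. style_feats: list of (N_l, M_l) features of the
    generated image. style_grams: matching Gram matrices of the style image.
    """
    content = 0.5 * np.sum((F_content - P) ** 2)
    w = 1.0 / len(style_feats)  # equal weight for every active style layer
    style = 0.0
    for F, A in zip(style_feats, style_grams):
        n, m = F.shape
        G = F @ F.T
        style += w * np.sum((G - A) ** 2) / (4.0 * n ** 2 * m ** 2)
    return alpha * content + beta * style


F_c = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, 0.0], [0.0, 4.0]])
F_s = [np.array([[1.0, 0.0], [0.0, 2.0]])]
A_s = [np.zeros((2, 2))]
print(total_loss(F_c, P, F_s, A_s, alpha=0.001, beta=1000.0))
```

Only the ratio of `alpha` to `beta` changes the minimizer; scaling both by the same factor rescales the loss without moving its minimum.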
Now that we have the loss function, we need to
choose a minimization algorithm. We compare
Adam, Adadelta and L-BFGS (limited memory BFGS,
Liu, D. C., & Nocedal, J. (1989)).
The limited memory implementation of BFGS is
important because quasi-Newton algorithms need to
compute or estimate the Hessian of the loss function.
Storing it can lead to a huge memory problem given
the dimensionality of the variables. Hence, a limited
memory approach is needed for this kind of
minimization problem.
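A back-of-the-envelope comparison illustrates the memory argument. The sketch below assumes 4-byte floats and a history of 10 correction pairs for L-BFGS. Both values and the function names are illustrative assumptions.

```python
def hessian_bytes(n_params, bytes_per_value=4):
    """Memory needed to store a dense n x n Hessian (or its approximation)."""
    return n_params * n_params * bytes_per_value


def lbfgs_bytes(n_params, history=10, bytes_per_value=4):
    """Memory for the L-BFGS history: `history` pairs of length-n vectors."""
    return 2 * history * n_params * bytes_per_value


n = 3 * 500 * 500  # a 500 x 500 color image
print(hessian_bytes(n) / 1e12, "TB")  # 2.25 TB
print(lbfgs_bytes(n) / 1e6, "MB")     # 60.0 MB
```

A dense Hessian grows with the square of the number of pixels, while the limited memory history grows only linearly.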
4. Implementation
In the following section, we describe the deep
learning architecture, then the overall
design of our implementation, and the
challenges and considerations involved. Our project
requires a huge number of parameters to be optimized,
therefore we run experiments with
multiple gradient descent algorithms.
4.1. Deep Learning Network
As mentioned before, our algorithm uses the VGG-19
network, which was created by the Visual Geometry
Group (VGG) of Oxford University. The VGG-19
contains five main blocks. Each main block is a set
of connected convolutional layers followed by a
pooling layer: the first two blocks have two
convolutions each and the last three have four
convolutions each.
Figure 4 Architectural block diagram VGG-19 [7].
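The block structure described above can be written down as an ordered mapping of layer names. This is a minimal sketch that uses the same `convB_I` naming as the layer names quoted in this report. The helper name and the `poolB` names are assumptions.

```python
from collections import OrderedDict

# Number of convolutional layers in each of the five main blocks.
CONVS_PER_BLOCK = (2, 2, 4, 4, 4)


def vgg19_feature_layers():
    """Ordered layer names of VGG-19 without the fully connected layers."""
    layers = OrderedDict()
    for block, n_convs in enumerate(CONVS_PER_BLOCK, start=1):
        for i in range(1, n_convs + 1):
            layers['conv%d_%d' % (block, i)] = 'conv'
        layers['pool%d' % block] = 'pool'
    return layers


layers = vgg19_feature_layers()
print(sum(kind == 'conv' for kind in layers.values()))  # 16
print(sum(kind == 'pool' for kind in layers.values()))  # 5
```

The counts match the 16 convolutional and 5 pooling layers kept after removing the fully connected layers.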
Replicating our results takes at least 12 hours and
requires a variant of the gradient descent algorithm
with adaptive learning rates.
Some of the most important hyperparameters in our
model are $\alpha$ and $\beta$. They weight the
structure (content) and the style in the output image,
respectively. These parameters are very sensitive, and
they should change depending on the images involved
in the style transfer. We run multiple combinations of
$\alpha/\beta$: first we fix alpha at 0.001 and test
three different values of beta (0.1e3, 0.1e4, 0.1e5). In
the following table we can observe the variation in our
desired output. The beta of 0.1e3 has little blue, the
second one (0.1e4) starts to include some yellow colors
and the last one includes more style than structure. Our
final configuration was a beta of 0.1e5, because it
maintains a better balance between style and
structure.
We use ‘conv1_1’, ‘conv2_1’, ‘conv3_1’, ‘conv4_1’
and ‘conv5_1’ layers for the style and ‘conv4_2’ layer
for the content.
Figure 5 Setting hyperparameters alpha/beta
To compare our results to the ones from the paper, we
use the style from The Starry Night by Van Gogh and
the content from the Tubingen image.
Then we generate some other examples using the
following images:
1. The style from Circus by Joan Miro and the
content from an image of NYC.
2. The style from the painting Diego Rivera And
Frida Kahlo Dia De Los Muertos by Pristine
Cartera Turkus and the content from an
image of the team members in San
Francisco.
The results and images are shown in section 5.
4.2. Software Design
Our architecture has four main components:
functions to manipulate our two input
images, the neural net architecture, different kinds of
gradient algorithms, and the components to train our
optimization problem and return our final result. All
our code was written in Theano.
Figure 6 General architecture, components
Images manipulation: This component
loads the two input images. It is responsible for
cropping the original images and rescaling them to the
desired resolution.
VGG Model: This is the most important component. It
contains the neural net architecture described in
Section 4.1, as well as the evaluation function.
Gradient Algorithms: This component contains
multiple algorithms which are used to optimize our
loss function. It is a critical component because our
running time is long: for instance, with an image of
600 pixels it takes 12 hours with the scipy l_bfgs_b
minimizer.
The Adam and Adadelta algorithms start converging
fast, but they get stuck at a certain point where they
barely decrease the value of the objective function at
each iteration. This prevents the full expression of the
style. In contrast, L-BFGS finds better local minima
because it approximates the Hessian matrix of the
objective function.
Training and results: This component contains the
functions to instantiate our VGG-19 and the gradient
algorithm. It also runs multiple iterations and gives us
the result after minimizing the loss function.
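As an illustration of the image manipulation component listed above, the following NumPy sketch crops an image to a centered square and rescales it with nearest-neighbor sampling. The function names and the choice of nearest-neighbor sampling are assumptions. The report does not state which resampling the project code uses.

```python
import numpy as np


def center_crop_square(img):
    """Crop an (H, W) or (H, W, C) array to a centered square."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]


def resize_nearest(img, size):
    """Rescale to (size, size) by picking the nearest source pixel."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


photo = np.zeros((4, 6, 3))          # toy 4 x 6 color image
square = center_crop_square(photo)   # shape (4, 4, 3)
small = resize_nearest(square, 2)    # shape (2, 2, 3)
print(square.shape, small.shape)
```

Cropping both inputs to the same square resolution keeps the feature maps of the content, style and generated images the same size at every layer.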
Figure 7 Left Class diagram, Right Call graph
In order to follow the principles of OOP (Object
Oriented Programming), we encapsulate the
functionality of VGG-19 in a class. It contains a
public ordered dictionary with all the convolutional
and pooling layers. It also calculates the cost function,
which is the sum of alpha times the content loss and
beta times the style loss (see section 3.2).
In addition, Figure 7 shows the call graph.
Our main method calls the test_method. It first
prepares the images and creates a new instance of
VGG-19 whose inputs are the art and content images.
Then our test creates a dictionary and saves the
outputs of each convolutional layer of our first
instance of VGG-19 for these images.
The next step is to instantiate a second VGG-19 object,
this time with a random image as input. Finally, our test
uses a Train function that requires one optimizer; in
this example we use Adadelta, which is slower than
the L-BFGS minimizer.
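The way an image can be handed to the scipy L-BFGS minimizer mentioned above can be sketched as follows. This is a toy example under stated assumptions: the quadratic objective is a stand-in for the style transfer loss, and the helper name `make_objective` is hypothetical. The call itself uses `scipy.optimize.fmin_l_bfgs_b`, which expects a flat float64 vector and, when no separate gradient function is given, an objective that returns both the loss and its gradient.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def make_objective(target):
    """Toy stand-in for the total loss: 0.5 * ||x - target||^2."""
    def objective(x_flat):
        diff = x_flat - target
        loss = 0.5 * np.sum(diff ** 2)
        grad = diff  # gradient of the loss with respect to x_flat
        return loss, grad
    return objective


rng = np.random.RandomState(0)
target = rng.rand(3 * 8 * 8)   # flattened 3 x 8 x 8 "image"
x0 = rng.randn(3 * 8 * 8)      # white noise starting point
x_opt, final_loss, info = fmin_l_bfgs_b(make_objective(target), x0, maxfun=50)
image = x_opt.reshape(3, 8, 8)  # back to image layout
print(final_loss, info['funcalls'])
```

In a real run the objective would evaluate the compiled loss and gradient of the network for the reshaped image, and `maxfun` would bound the number of such evaluations.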
The pseudocode of our algorithm is the following:
Algorithm: Generate_Image
Inputs: p: content image
        a: style image
        alpha: weight of the content loss
        beta: weight of the style loss
Output: x, the generated image with content from image
        p and style from image a.
vgg = VGG19()  # create a VGG19 network with pretrained weights
style_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
content_layer = 'conv4_2'
p_vgg = vgg(p)  # evaluate the convolution layers on p
a_vgg = vgg(a)  # evaluate the convolution layers on a
p_feat = p_vgg.output(content_layer)
a_feat = [a_vgg.output(layer) for layer in style_layers]
x = random_image
x_vgg = vgg(x)  # evaluate the convolution layers on x
x_content = x_vgg.output(content_layer)
x_style = [x_vgg.output(layer) for layer in style_layers]
loss = alpha * content_loss(p_feat, x_content) + beta * style_loss(a_feat, x_style)
# minimize loss with respect to x using the desired minimization algorithm
return x
The content_loss and style_loss functions are defined
in Section 3.2.
5. Results
5.1. Project Results
First we replicated the main example from the paper,
which takes the style from The Starry
Night by Van Gogh.
Figure 8 Setting hyperparameters alpha/beta
For this experiment we chose values of $\alpha$ and
$\beta$ of 0.01 and 1,000 respectively. The result shows
the desired output and is similar to the one in the
paper. To get more accurate results we
would need to give more iterations to the
optimization algorithm. At some point the Adam
optimization algorithm gets stuck and decreases the
value of the objective function only in small amounts.
We also wanted to try other combinations of images to
see what the algorithm is capable of. Our next
attempt combined the style of The Circus by
Miro with an image of NYC. The results are the
following:
Figure 9 Result NYC-Circus by Miro
Finally, we tried an image of the three of us
(Carlos on the left, Jose in the middle and Juan on
the right) combined with the painting Diego Rivera
And Frida Kahlo Dia De Los Muertos by
Pristine Cartera Turkus. This case is interesting
because the painting contains a blue face and a white
face. However, we see that only the white face seems
to be transferred to the synthesised image, because our
faces look white. A likely explanation is that the
content loss is smaller if our faces are white, because
this color is closer to our skin color than the blue of
the other face, so the total loss is lower than it would
be with blue faces.
Figure 10 San Francisco and Dia de muertos
5.2. Comparison of Results
Our first result tries to reproduce the original image
from Gatys et al.
Figure 11 Comparison with Gatys et al.
Our image successfully extracts the style from the
painting and applies it to the photo. We can see that
it captures some more small details from the
brush strokes but does not capture the stars in the sky.
As we have commented before, this may be caused by
different factors such as the choices of $\alpha$ and
$\beta$ and the optimization algorithm.
We also wanted to compare our result with the
Deepart.io commercial tool. Deepart.io uses the
original algorithm by Gatys et al. (2015).
Figure 12 Deepart.io vs our result comparison
Both images look similar in
the cartoon-like colors and arrangements; however,
Deepart.io generates more polished styling. We
believe this happens because we used Adam to
generate that image and it got stuck before
generating more details. Also, as we mentioned
before, the results are very sensitive to the choice of
$\alpha$ and $\beta$.
5.3. Discussion of Insights Gained
We learned some important lessons while working
on this project, especially about the importance of the
optimization problem. It’s important to note that the
number of parameters (3 x width x height) grows
quadratically with the side length of the image. If we
wish to get a very high resolution image, we will have
to minimize a loss function over millions of
parameters. Hence, it’s
important to choose an algorithm that converges fast
and finds a good local minimum. This issue makes us
think about using quasi-Newton methods, but we also
have to be careful with the size of the Hessian, which
can significantly affect the performance of our
algorithm. That’s why limited memory algorithms like
L-BFGS are the right choice.
We also noticed that the choice of $\alpha$ and $\beta$
can significantly change the result of the algorithm, and
it’s fun to play with them on different images to
create unique pieces of art. However, each image
takes some hours to generate, making it difficult to
try many different values.
6. Conclusion
We successfully implemented the style
transfer algorithm developed by Gatys et al. which
can combine the content and style of two images.
The main finding of the research of the authors is that
the representations of style and content in the CNN
are separable. Therefore, one can take the content of
one image and combine it with the style of another
image.
To implement the algorithm we used the
convolutional and pooling layers of the VGG-19
network. We then formulated and solved the
minimization problem presented in the paper using
Theano. Finally, we replicated the results of the paper
and experimented with new images and artists. We
managed to create appealing images with an artistic
style.
For individuals interested in replicating the
same results, we recommend first working with small
images (32x32) to quickly assess whether the algorithm
is working or not. Given the large number of parameters
which need to be updated, the algorithm takes a long
time to run. We also advise trying different values for
$\alpha$ and $\beta$ to find the desired balance between
style and content. Finally, we recommend using the
L-BFGS algorithm, a quasi-Newton method, for
solving the minimization problem. We tried Adam and
regular steepest descent algorithms, but obtained
much faster and better convergence with the L-BFGS
algorithm.
7. Acknowledgement
We would like to acknowledge the work done in [6],
which was a very useful guide that helped us to figure
out some implementation details.
8. References
[1] Bitbucket repo:
https://bitbucket.org/e_4040_ta/e4040_project_cjmd
[2] Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A
neural algorithm of artistic style. arXiv preprint
arXiv:1508.06576.
[3] Russakovsky, O., Deng, J., Su, H., Krause, J.,
Satheesh, S., Ma, S., ... & Berg, A. C. (2015).
Imagenet large scale visual recognition challenge.
International Journal of Computer Vision, 115(3),
211-252.
[4] https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19_normalized.pkl
[5] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016).
Image style transfer using convolutional neural
networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp.
2414-2423).
[6] https://github.com/Lasagne/Recipes/blob/master/examples/styletransfer/Art%20Style%20Transfer.ipynb
[7] http://www.slideshare.net/ckmarkohchang/applied-deep-learning-1103-convolutional-neural-networks
[8] Liu, D. C., & Nocedal, J. (1989). On the limited
memory BFGS method for large scale optimization.
Mathematical Programming, 45(1-3), 503-528.
9. Appendix
9.1 Individual student contributions
UNI                                     | ce2330 | jb3852 | jdr2162
Last Name                               | Espino | Borgnino | Ramirez
Fraction of (useful) total contribution | 1/3 | 1/3 | 1/3
What I did 1                            | Implement the structure of the network. | Implement the loss functions and the theano training model | Implement the classes and fixed issues on the code.
What I did 2                            | Methodology and formulation | Introduction, results and conclusions | Implementations and Software design
What I did 3                            | Run third example | Run first example | Run second example

More Related Content

What's hot

IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET Journal
 
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...IJSRD
 
Development of stereo matching algorithm based on sum of absolute RGB color d...
Development of stereo matching algorithm based on sum of absolute RGB color d...Development of stereo matching algorithm based on sum of absolute RGB color d...
Development of stereo matching algorithm based on sum of absolute RGB color d...IJECEIAES
 
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...IRJET Journal
 
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...IOSR Journals
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classificationBill Kromydas
 
A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014SondosFadl
 
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...CSCJournals
 
FAN search for image copy-move forgery-amalta 2014
 FAN search for image copy-move forgery-amalta 2014 FAN search for image copy-move forgery-amalta 2014
FAN search for image copy-move forgery-amalta 2014SondosFadl
 
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIX
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIXBAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIX
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIXijistjournal
 
Multiexposure Image Fusion
Multiexposure Image FusionMultiexposure Image Fusion
Multiexposure Image FusionIJMER
 
6. 7772 8117-1-pb
6. 7772 8117-1-pb6. 7772 8117-1-pb
6. 7772 8117-1-pbIAESIJEECS
 
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...CSCJournals
 
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...Fractal Image Compression of Satellite Color Imageries Using Variable Size of...
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...CSCJournals
 
Probabilistic model based image segmentation
Probabilistic model based image segmentationProbabilistic model based image segmentation
Probabilistic model based image segmentationijma
 
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...cscpconf
 
Human Head Counting and Detection using Convnets
Human Head Counting and Detection using ConvnetsHuman Head Counting and Detection using Convnets
Human Head Counting and Detection using Convnetsrahulmonikasharma
 

What's hot (20)

IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature Descriptor
 
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
Secret-Fragment-Visible Mosaic Image-Creation and Recovery via Colour Transfo...
 
Development of stereo matching algorithm based on sum of absolute RGB color d...
Development of stereo matching algorithm based on sum of absolute RGB color d...Development of stereo matching algorithm based on sum of absolute RGB color d...
Development of stereo matching algorithm based on sum of absolute RGB color d...
 
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...
IRJET- An Approach to FPGA based Implementation of Image Mosaicing using Neur...
 
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
Medial Axis Transformation based Skeletonzation of Image Patterns using Image...
 
Medial axis transformation based skeletonzation of image patterns using image...
Medial axis transformation based skeletonzation of image patterns using image...Medial axis transformation based skeletonzation of image patterns using image...
Medial axis transformation based skeletonzation of image patterns using image...
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classification
 
A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014A proposed accelerated image copy-move forgery detection-vcip2014
A proposed accelerated image copy-move forgery detection-vcip2014
 
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...
Segmentation by Fusion of Self-Adaptive SFCM Cluster in Multi-Color Space Com...
 
ijecct
ijecctijecct
ijecct
 
FAN search for image copy-move forgery-amalta 2014
 FAN search for image copy-move forgery-amalta 2014 FAN search for image copy-move forgery-amalta 2014
FAN search for image copy-move forgery-amalta 2014
 
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIX
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIXBAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIX
BAYESIAN CLASSIFICATION OF FABRICS USING BINARY CO-OCCURRENCE MATRIX
 
Multiexposure Image Fusion
Multiexposure Image FusionMultiexposure Image Fusion
Multiexposure Image Fusion
 
6. 7772 8117-1-pb
6. 7772 8117-1-pb6. 7772 8117-1-pb
6. 7772 8117-1-pb
 
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...
ANOVA and Fisher Criterion based Feature Selection for Lower Dimensional Univ...
 
FULL PAPER.PDF
FULL PAPER.PDFFULL PAPER.PDF
FULL PAPER.PDF
 
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...Fractal Image Compression of Satellite Color Imageries Using Variable Size of...
Fractal Image Compression of Satellite Color Imageries Using Variable Size of...
 
Probabilistic model based image segmentation
Probabilistic model based image segmentationProbabilistic model based image segmentation
Probabilistic model based image segmentation
 
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...
A CONCERT EVALUATION OF EXEMPLAR BASED IMAGE INPAINTING ALGORITHMS FOR NATURA...
 
Human Head Counting and Detection using Convnets
Human Head Counting and Detection using ConvnetsHuman Head Counting and Detection using Convnets
Human Head Counting and Detection using Convnets
 

Similar to E4040.2016 fall.cjmd.report.ce2330.jb3852.jdr2162

Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer Tahsin Mayeesha
 
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev...
E4040.2016 fall.cjmd.report.ce2330.jb3852.jdr2162

  • 1. Transfer Style Convolutional Neural Network
E4040.2016Fall.CJMD.report
Carlos Espino ce2330, Juan Borgnino jb3852, Jose Ramirez jdr2162
Columbia University

Abstract
In the following paper we implement the style transfer algorithm designed by Gatys et al., which can generate new synthesised images by combining the style and the content of two images. We explain the methodology developed by the authors and implement the solution using Theano. We find that results are highly dependent on the values specified for the hyperparameters, and that training is quite time-intensive due to the large number of parameters required. In addition, we successfully generate new images combining several pictures (including one of us) with the style of several well-known painters.

1. Introduction
Our paper starts with an explanation of the work and methodology developed by Gatys et al. in section 2. We provide an intuitive explanation of how the algorithm works, including the loss functions used, and provide images showing the results in section 3. We continue with section 4, where we describe our implementation of the algorithm in Theano and the architecture used. Next, in section 5 we present our final results in the form of synthesised images combining the content and style of two different images. Finally, section 6 presents our conclusions and advice for individuals who wish to obtain the same results.

2. Summary of the Original Paper
The paper introduces a Deep Learning Network which can create artistic images by combining the style of one image with the content of another. In order to do so, the authors explain that the representations of content and style in the CNN are separable. Hence, one can take both separately to generate a new image based on the content of one image and the style of another. To demonstrate their findings, the authors match the content of a photograph from Tubingen, Germany with the style of several well-known artworks from different periods.
See Figure 1-A. The final results show that the synthesised image maintains the global arrangement of the original image (the image from Tubingen), while the colours and local structures that compose the global scenery belong to the chosen artwork. Before generating the new image, one can specify how much of the style and how much of the content to preserve.

2.1 Methodology of the Original Paper
Gatys et al. design a method to synthesize an image that mixes the content and the style of different images using convolutional neural networks. Specifically, they use the VGG network with 19 layers (VGG-19), a Convolutional Neural Network whose performance on basic image recognition tasks is similar to that of humans (Russakovsky et al. 2015). They minimize a loss function that takes into account:
1. The neural representation that captures the content of an image.
2. The style representation of another image, which is computed from the correlations between different types of neurons.
The following diagram shows the methodology they use to generate the desired image. More details about the representations are given in section 3.2.
  • 2. Figure 1. Image credit: Gatys et al. (2015)

2.2 Key Results of the Original Paper
The key result of the paper is that the representations of style and content in the CNN are separable. As a result, one can take the content of one image and combine it with the style of another image. The following image shows the results when combining a photograph from Tubingen, Germany with the style of three well-known artists:

Figure 2. Image credit: Gatys et al. (2015)

Different results can be achieved by varying the values of the parameters $\alpha$ and $\beta$, which correspond to the weights of the content and the style of the two images, respectively. In particular, the authors try different values of the ratio $\alpha/\beta$ and combine them with different levels of style complexity, obtained by including more layers of the network. More details about $\alpha$, $\beta$ and style complexity are given in section 3.2. Different values of $\alpha/\beta$ and of style complexity yield very different results, as we can observe in the images below, which combine the photograph from Tubingen, Germany with the style of the painting Composition VII by Wassily Kandinsky.

Figure 3. Image credit: Gatys et al. (2015)

These results show how the content of one image can be combined with the style of another to generate a new image which can be quite appealing.

3. Methodology
3.1. Objectives and Technical Challenges
  • 3. The objective is to generate an image that contains the content of one image with the style of another one. To do so, we start with a white noise image and solve a minimization problem, discussed in the following section, to find another image that matches the desired content and style.

One of the main technical challenges is therefore the large number of parameters to learn in the minimization problem, which is 3 x width x height of the image. For instance, if we wish to generate a color image of size 500 x 500, the number of parameters to learn is 750,000. Another important challenge is the selection of the hyperparameters which balance the style and the structure in the output image. The final results are quite sensitive to the weights assigned to the content and style loss functions. Given the complexity of the minimization problem, the choice of the minimization algorithm is important and plays a key role in the quality of the results.

3.2. Problem Formulation and Design
We follow the same methodology as Gatys et al. (2015) to formulate the minimization problem mentioned in section 2.1. Starting from the VGG-19, Gatys et al. (2015) remove the fully connected layers, keeping only the 5 pooling layers and the 16 convolutional layers (see section 4.1). The trained weights of this network are publicly available in [4].

This network is used to encode an image at each convolutional layer using its filter responses. A layer $l$ that has $N_l$ filters will have $N_l$ feature maps, each of size $M_l$, which corresponds to height x width of the feature map. We can store all the responses of layer $l$ in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, so that $F^l_{ij}$ corresponds to the activation of the $i$-th filter at position $j$ in layer $l$.
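The size of the optimization problem and the layout of the feature matrix described above can be illustrated with a minimal NumPy sketch. The function names `num_variables` and `to_feature_matrix` are hypothetical helpers introduced here for illustration, and the sketch assumes activations are stored as (filters, height, width); it is not taken from the project's Theano code.

```python
import numpy as np

def num_variables(height, width, channels=3):
    """Number of pixel values optimized for a generated color image."""
    return channels * height * width

def to_feature_matrix(activations):
    """Flatten one layer's activations, assumed to have shape
    (n_filters, height, width), into a matrix of shape (N_l, M_l)
    with M_l = height * width."""
    n_filters, height, width = activations.shape
    return activations.reshape(n_filters, height * width)

# A 500 x 500 color image has 750,000 values to optimize.
print(num_variables(500, 500))

# Toy layer with 2 filters and 3 x 4 feature maps.
acts = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
F = to_feature_matrix(acts)
print(F.shape)  # (N_l, M_l)
```

With this layout, position j in a row of the matrix corresponds to the pixel at row j // width and column j % width of the feature map.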
Let $p$ and $x$ be the original image and the generated image, and $P^l$ and $F^l$ their respective feature representations at layer $l$. To reproduce the content of the original image, we need to minimize:

$$\mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

where the gradient with respect to $x$ can be computed using back propagation.

Having defined how to reproduce the content of an image, we need a way to generate a style representation. To do this, the correlation between different filter responses is computed, taking the expectation over the spatial extent of the input image. This is given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the feature maps $i$ and $j$, in vector form, in layer $l$:

$$G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}$$

Having defined this, let $a$ and $x$ be the original image and the generated image, and $A^l$ and $G^l$ their respective style representations at layer $l$. To reproduce the style of the original image, we need to minimize:

$$\mathcal{L}_{style}(a, x) = \sum_{l} w_l E_l$$

where $w_l$ is the weighting factor of the contribution of each layer and $E_l$ is the contribution of layer $l$, defined as:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

Here the gradient with respect to $x$ can be computed using back propagation as well. The original paper chooses $w_l$ equal to one divided by the number of active layers for the convolutional layers we decide to use, and 0 for the layers we don't want to use.

Having defined the loss functions for the content and the style, if we wish to generate an image with content from image $p$ and style from image $a$, we need to minimize the following loss function:

$$\mathcal{L}_{total}(p, a, x) = \alpha \, \mathcal{L}_{content}(p, x) + \beta \, \mathcal{L}_{style}(a, x)$$

  • 4. Now that we have the loss function, we need to choose a minimization algorithm. We compare Adam, Adadelta and L-BFGS (limited memory BFGS, by Liu, D. C., & Nocedal, J. (1989)). The limited memory implementation of BFGS is important because quasi-Newton algorithms need to compute or estimate the Hessian of the loss function. Given the dimensionality of the variables, this can lead to a huge memory problem. Hence, a limited memory approach is needed for this kind of minimization problem.

4. Implementation
In the following section, we describe the deep learning architecture, then the overall design of our implementation, and the challenges and considerations involved. Our project requires optimizing a huge number of parameters, so we experiment with several gradient-based optimization algorithms.

4.1. Deep Learning Network
As mentioned before, our algorithm uses the VGG-19 network, which was created by the Visual Geometry Group (VGG) of Oxford University. The VGG-19 contains five main layers. Each main layer consists of a set of connected convolutional layers: the first two main layers have two convolutions each and the last three have four convolutions each.

Figure 4. Architectural block diagram of VGG-19 [7].

Replicating our results takes at least 12 hours and requires a variant of the gradient descent algorithm with adaptive learning rates. Two of the most important hyperparameters in our model are $\alpha$ and $\beta$. They represent the amount of structure and style, respectively, in the output image. The results are very sensitive to these parameters, and they should be changed depending on the images involved in the style transfer. We ran multiple combinations of $\alpha/\beta$: first we fixed alpha at 0.001 and tested three different values of beta (0.1e3, 0.1e4, 0.1e5). In the following table we can observe the variation in our desired output.
With a beta of 0.1e3 the output shows little blue, with the second value (0.1e4) some yellow colors start to appear, and with the last value the output includes more style than structure. Our final configuration was a beta of 0.1e5, because it maintains a better balance between style and structure. We use the 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' layers for the style and the 'conv4_2' layer for the content.

Figure 5. Setting hyperparameters alpha/beta
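The content, style and total losses of section 3.2, together with the layer choice above, can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names are hypothetical, features are assumed to be already reshaped into (N_l, M_l) matrices and stored in dictionaries keyed by layer name, the style layers are weighted equally, and NumPy stands in for the Theano expressions used in the project.

```python
import numpy as np

def content_loss(F, P):
    """0.5 * sum_ij (F_ij - P_ij)^2 for one layer."""
    return 0.5 * np.sum((F - P) ** 2)

def gram_matrix(F):
    """G_ij = sum_k F_ik F_jk for a feature matrix of shape (N_l, M_l)."""
    return F @ F.T

def style_layer_loss(F, A_feat):
    """E_l = sum_ij (G_ij - A_ij)^2 / (4 N_l^2 M_l^2), where G and A are
    the Gram matrices of the generated and the style image features."""
    n_filters, n_positions = F.shape
    G = gram_matrix(F)
    A = gram_matrix(A_feat)
    return np.sum((G - A) ** 2) / (4.0 * n_filters ** 2 * n_positions ** 2)

def total_loss(x_feats, p_feats, a_feats, content_layer, style_layers,
               alpha, beta):
    """alpha * L_content + beta * L_style, with equal weights w_l
    (assumption: one divided by the number of style layers)."""
    w = 1.0 / len(style_layers)
    l_content = content_loss(x_feats[content_layer], p_feats[content_layer])
    l_style = sum(w * style_layer_loss(x_feats[name], a_feats[name])
                  for name in style_layers)
    return alpha * l_content + beta * l_style

# Toy usage with random stand-ins for the VGG-19 feature matrices.
rng = np.random.default_rng(0)
style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
layers = style_layers + ["conv4_2"]
x_feats = {name: rng.normal(size=(4, 9)) for name in layers}
p_feats = {name: rng.normal(size=(4, 9)) for name in layers}
a_feats = {name: rng.normal(size=(4, 9)) for name in layers}
print(total_loss(x_feats, p_feats, a_feats, "conv4_2", style_layers,
                 alpha=0.001, beta=0.1e5))
```

Increasing `beta` relative to `alpha` gives the style term more influence on the minimizer, which matches the trend described for Figure 5.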
  • 5. To compare our results to the ones from the paper, we use the style from Starry Night by Van Gogh and the content from the Tubingen image. Then we generate some other examples using the following images:
1. The style from Circus by Joan Miro and the content from an image of NYC.
2. The style from the Diego Rivera And Frida Kahlo Dia De Los Muertos painting by Pristine Cartera Turkus and the content from an image of the team members in San Francisco.
The results and images are shown in section 5.

4.2. Software Design
Our architecture has four main components. It requires functions to manipulate our two input images, the neural net architecture, different kinds of gradient algorithms, and the components to solve our optimization problem and return our final result. All our code was written in Theano.

Figure 6. General architecture, components

Images manipulation: This component loads the two input images. It is responsible for cropping the original images and rescaling them to the desired resolution.

VGG Model: This is the most important component. It contains the neural net architecture described in section 4.1, as well as the evaluation function.

Gradient Algorithms: This component contains multiple algorithms which are used to optimize our loss function. It is a critical component because our running time is long: for instance, with an image of 600 pixels, it takes 12 hours with the scipy l_bfgs_b minimizer. The Adam and Adadelta algorithms start converging fast, but they get stuck at a certain point where they barely decrease the value of the objective function at each iteration. This prevents the full expression of the style. In contrast, L-BFGS finds better local minima because it approximates the Hessian matrix of the objective function.

Training and results: This component contains the functions to instantiate our VGG-19 and the gradient algorithm. It also runs multiple iterations and gives us the result after minimizing the loss function.
Figure 7. Left: class diagram. Right: call graph.

In order to follow the principles of OOP (Object Oriented Programming), we encapsulate the functionality of VGG-19 in a class. It contains a public ordered dictionary with all the convolutional and pooling layers. It also calculates the cost function, which is the sum of alpha times the content loss and beta times the style loss (see section 3.2).

Figure 7 also shows the call graph. Our main method calls the test_method. It first prepares the images and creates a new instance of VGG-19 whose inputs are the art and content images. Then our test creates a dictionary and saves the representation of the images at each convolutional layer of this first instance of VGG-19. The next step is to instantiate a second VGG-19 object, this time with a random image as input. Finally, our test uses a Train function that requires one optimizer; in this example we use Adadelta, which is slower than the L-BFGS minimizer.
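As a rough illustration of how a limited memory quasi-Newton optimizer can be driven from Python, the sketch below minimizes a toy quadratic objective with SciPy's `minimize` and `method="L-BFGS-B"`. The quadratic objective and the helper `make_objective` are assumptions made for this example only; in the project the objective would be the style transfer loss and its gradient, evaluated through the compiled Theano functions.

```python
import numpy as np
from scipy.optimize import minimize

def make_objective(target):
    """Return a function mapping a flat float64 vector to (loss, gradient).
    The toy loss 0.5 * ||x - target||^2 stands in for the real objective."""
    shape = target.shape

    def loss_and_grad(x_flat):
        x = x_flat.reshape(shape)            # back to image layout
        diff = x - target
        loss = 0.5 * np.sum(diff ** 2)
        grad = diff.ravel().astype(np.float64)  # optimizer expects a flat vector
        return float(loss), grad

    return loss_and_grad

rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=(3, 8, 8))    # toy "image": 3 x 8 x 8
x0 = rng.normal(0.0, 1.0, size=target.size)       # white noise initialization

result = minimize(
    make_objective(target),
    x0,
    jac=True,                 # the objective returns (loss, gradient)
    method="L-BFGS-B",
    options={"maxiter": 50, "maxcor": 10},  # maxcor: number of stored corrections
)
generated = result.x.reshape(target.shape)
print(result.fun, result.nit)
```

The optimizer only ever sees a flat vector, so the image is flattened on the way in and reshaped inside the objective. The `maxcor` option bounds how many correction pairs are kept, which is what keeps the memory cost linear in the number of pixels instead of quadratic, as a full Hessian estimate would require.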
  • 6. The pseudocode of our algorithm is the following:

Algorithm: Generate_Image
Inputs:
    p: content image
    a: style image
    alpha: weight of the content image
    beta: weight of the style image
Output:
    x: the generated image with content from image p and style from image a

    vgg = VGG19()  # create a VGG19 network with pretrained weights
    a_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']  # style layers
    p_layer = 'conv4_2'  # content layer
    p_vgg = vgg(p)  # evaluate the convolution layers on p
    a_vgg = vgg(a)  # evaluate the convolution layers on a
    p_feat = p_vgg.output(p_layer)
    a_feat = [a_vgg.output(layer) for layer in a_layers]
    x = random_image  # white noise initialization
    x_vgg = vgg(x)  # evaluate the convolution layers on x
    loss = alpha * content_loss(p_feat, x_vgg.output(p_layer))
           + beta * style_loss(a_feat, [x_vgg.output(layer) for layer in a_layers])
    # minimize the loss function with respect to x using the desired minimization algorithm
    return x

The content_loss and style_loss functions are defined in section 3.2.

5. Results
5.1. Project Results
First we replicated the main example from the paper. Recall that it takes the style from The Starry Night by Van Gogh.

Figure 8. Setting hyperparameters alpha/beta

For this experiment we chose values of $\alpha$ and $\beta$ of 0.01 and 1,000, respectively. The result shows the desired output and is similar to the one in the paper. To get more accurate results we would need to give more iterations to the optimization algorithm. At some point the Adam optimization algorithm gets stuck and decreases the value of the objective function only by small amounts.

We also wanted to try other combinations of images to see what the algorithm is capable of. Our next attempt combined the style of The Circus by Miro with an image of NY. The results are the following:

Figure 9. Result NYC-Circus by Miro

Finally, we tried an image of the three of us (Carlos on the left, Jose in the middle and Juan on the right) combined with the Diego Rivera And Frida Kahlo Dia De Los Muertos painting by Pristine Cartera Turkus.

  • 7. This case is interesting because the painting contains a blue face and a white face. However, only the white face seems to be transferred to the synthesised image, because our faces look white. Making our faces white probably gives a lower loss than making them blue: the content loss is probably smaller when our faces are white because this color is closer to our skin color than blue is.

Figure 10. San Francisco and Dia de muertos

5.2. Comparison of Results
Our first result tries to reproduce the original image from Gatys et al.

Figure 11. Comparison with Gatys et al.

Our image successfully extracts the style from the painting and applies it to the photo. We can see that it captures more of the small brush details but does not capture the stars in the sky. As we have commented before, this may be caused by different factors such as the choices of $\alpha$ and $\beta$ and the optimization algorithm.

We also wanted to compare our result with the Deepart.io commercial tool. Deepart.io uses the original algorithm by Gatys et al. (2015).

Figure 12. Deepart.io vs our result comparison

Both images look similar in their cartoon-like colors and arrangements. However, Deepart.io generates more polished styling. We believe this happens because we used Adam to generate that image and it got stuck before generating more details. Also, as we mentioned before, the results are very sensitive to the choice of $\alpha$ and $\beta$.

5.3. Discussion of Insights Gained
We learned some important lessons while working on this project, especially about the importance of the optimization problem. It's important to note that the number of parameters grows quadratically with the side length of the desired image.
If we wish to get a very high resolution image, we will have to minimize a loss function over millions of parameters. Hence, it's important to choose an algorithm that converges fast and finds a good local minimum. This issue makes us think about using quasi-Newton methods, but we also have to be careful with the size of the Hessian, which can significantly affect the performance of our algorithm. That's why limited memory algorithms like L-BFGS are the right choice.

We also noticed that the choice of $\alpha$ and $\beta$ can significantly change the result of the algorithm, and it's fun to play with them on different images to create unique pieces of art.

  • 8. However, each image takes some hours to generate, making it difficult to try many different values.

6. Conclusion
We successfully implemented the style transfer algorithm developed by Gatys et al., which can combine the content and style of two images. The main finding of the authors' research is that the representations of style and content in the CNN are separable. Therefore, one can take the content of one image and combine it with the style of another image. To implement the algorithm we used the convolutional and pooling layers of the VGG-19 network. We then formulated and solved the minimization problem presented in the paper using Theano. Finally, we replicated the results of the paper and experimented with new images and artists. We managed to create appealing images with an artistic style.

For individuals interested in replicating the same results, we recommend first working with small images (32x32) to quickly assess whether the algorithm is working. Given the large number of weights which need to be updated, the algorithm takes a long time to run. We also advise trying different values of $\alpha$ and $\beta$ to find the desired balance between style and content. Finally, we recommend using the L-BFGS algorithm, a quasi-Newton method, for solving the minimization problem. We tried Adam and regular steepest descent algorithms, but obtained much faster and better convergence with the L-BFGS algorithm.

7. Acknowledgement
We would like to acknowledge the work done in [6]. It was a very useful guide that helped us figure out some implementation details.

8. References
[1] Bitbucket repo: https://bitbucket.org/e_4040_ta/e4040_project_cjmd
[2] Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
[3] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252.
[4] https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19_normalized.pkl
[5] Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414-2423).
[6] https://github.com/Lasagne/Recipes/blob/master/examples/styletransfer/Art%20Style%20Transfer.ipynb
[7] http://www.slideshare.net/ckmarkohchang/applied-deep-learning-1103-convolutional-neural-networks
[8] Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3), 503-528.

9. Appendix
9.1 Individual student contributions

                                         | ce2330                                  | jb3852                                                     | jdr2162
Last Name                                | Espino                                  | Borgnino                                                   | Ramirez
Fraction of (useful) total contribution  | 1/3                                     | 1/3                                                        | 1/3
What I did 1                             | Implement the structure of the network. | Implement the loss functions and the Theano training model | Implement the classes and fixed issues in the code.
What I did 2                             | Methodology and formulation             | Introduction, results and conclusions                      | Implementations and software design
What I did 3                             | Run third example                       | Run first example                                          | Run second example