Multiple Style-Transfer in Real-Time
Michael Maring
University of Michigan
Ann Arbor, MI
mykl@umich.edu
Kaustav Chakraborty
University of Michigan
Ann Arbor, MI
vatsuak@umich.edu
Abstract
Style transfer aims to combine the content of one im-
age with the artistic style of another. It was discovered that
lower levels of convolutional networks captured style information, while higher levels captured content information.
The original style transfer formulation used a weighted
combination of VGG-16 layer activations to achieve this
goal. Later, this was accomplished in real-time using a
feed-forward network to learn the optimal combination of
style and content features from the respective images. The
first aim of our project was to introduce a framework for
capturing the style from several images at once. We propose
a method that extends the original real-time style transfer
formulation by combining the features of several style im-
ages. This method successfully captures color information
from the separate style images. The other aim of our project
was to improve the temporal style continuity from frame to
frame. Accordingly, we have experimented with the tem-
poral stability of the output images and discussed the var-
ious available techniques that could be employed as alter-
natives.
1. Introduction
1.1. Style Transfer Background
Style transfer was introduced by Gatys et al [1] as a gen-
erative method to combine the loss from different layers of
a convolutional neural network trained on image recogni-
tion. The authors noticed that the filters contained in the
lower layers of the VGG-16 network captured information
that closely represented the style of an image. Similarly, the
filters contained in higher layers captured more abstract information that could be interpreted as image content. Their
original idea was to create a new image that minimized the
combination of style and content loss that they obtained
from the VGG-16 network.
The process in [1] was iterative and so could not be ac-
complished in real-time. Johnson et al [2] accomplished
real-time style transfer by introducing a feed-forward convolutional neural network that learned the optimal combi-
nation of style and content. Once training was complete,
the feed-forward network could be used to directly styl-
ize a new image. This method was improved by Ulyanov
et al [3], who noticed that replacing the transformer net-
work’s batch normalization layers with instance normaliza-
tion layers improved the visual quality of the style transfer.
The intuition of why instance normalization improved upon
batch normalization was explained well in Huang et al [4].
They determined that, ”Each single sample [content image],
however, may still have different styles.” Each content im-
age within the training batch has its own inherent style, and
batch normalization of several content images would create
some incoherent mixture of these during training.
1.2. Multiple Style Transfer Motivation
The real-time style transfer implementation introduced
in Johnson et al [2] can only be trained for one style at a
time. This was our motivation behind trying to combine
the information from multiple style images. Concretely, the
first aim of this paper was to simultaneously minimize the
loss function of the content image and several style images.
1.3. Style Continuity Background
Running the transformer net as a standalone feed-forward network (with no gradient computation) after training provides fast inference [2]. However, this process produced temporal inconsistencies between frames. A similar problem arises in image segmentation, where the output must be penalized for incorrectly labelling pixels with respect to their neighbours. [5]
was one of the first papers to incorporate the concept of a
conditional-random field (CRF) that could be trained end-
to-end in the form of a recurrent neural network (RNN).
It yielded better results than even post-processing the image labels through a CRF using the predictions made by the neural network as a prior. This was seen as a possible solution
to reduce the temporal inconsistencies in the images. Yet
another method to reduce computational complexity is to reduce the number of trainable parameters, either by compressing a bulky network into a more portable form that is better suited for deployment [6], or by designing networks that perform convolution with fewer parameters or fewer tunable weights [7, 8]. These networks perform similarly to their original, less compact counterparts, but the computing cost is greatly reduced; as a result, the training time can be substantially reduced and, we speculate, evaluation becomes faster as well.
1.4. Style Continuity Motivation
As with most CNN algorithms, the community constantly strives to make the process computationally inexpensive, or even real-time. This sometimes takes a toll on the network's accuracy or visual quality. We wanted the computation of the results to be fast while also remaining stable, with less jittery outputs. This called for including a form of constraint between the stylizations of two consecutive images in the sequence.
1.5. Related Work
1.5.1 Arbitrary Style Transfer
The most significant limitation of the real-time style transfer implementation in Johnson et al [2] was that a new transformer network had to be trained for each style. There have been
several real-time style transfer methods that aim to create
a high-dimensional, learned representation of style, which
would allow them to incorporate this information into a
transformer network to output a content image in any de-
sired (”arbitrary”) style [9, 4, 10]. Dumoulin et al [9] dis-
covered that styles share many artistic qualities, and that this
similarity can be represented in a high dimensional embed-
ding space. They successfully created a high-dimensional
embedding space learned from 32 distinct artistic styles and
demonstrated the possibility of combining artistic styles by
combining their embedding features in an intelligent way.
Huang et al [4] discovered that they were able to render an image in an arbitrary style by aligning the mean and vari-
ance of the content and style images using what they called
Adaptive Instance Normalization (AdaIN) layers. Lastly,
Ghiasi et al [10] trained a network to create an embed-
ding space for describing style images using approximately
80,000 paintings. They were able to show that their embedding space grouped together similar styles and allowed them to render content images in unique styles that were not used
during training.
1.5.2 Style Continuity
One of the notable methods that has been used to preserve
the consistency of the images after their transformation is
the use of CRFs. Conditional random fields apply a layer of pairwise loss in addition to the per-pixel loss. The main objective function that CRFs try to minimize is written in the form of a per-pixel Gibbs energy of the assignment,
E(x) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)
which is comprised of the unary component ψu(xi), accounting for the cost of assigning a style (in our case) xi to pixel i, while the pairwise energy is computed by the second term. [5] took this process one step further and
was able to model the CRF as an RNN, allowing the parameters to be learned during training. This was accomplished using the mean-field iteration technique, which is essentially a way of minimizing the KL divergence in a more tractable manner. Another inspiring model was that of the
U-Net [11]. This is another example of how techniques from the image segmentation domain can cross over to our application. The image is downsampled to extract its essential representative features and then upsampled/reconstructed back to full resolution. This is similar to the approach taken in the transformer network, albeit with a number of modifications. An interesting feature of the U-Net is the ”crop and copy” functionality shown in Fig. 1, where the outputs from earlier layers are concatenated with the outputs of subsequent layers. This struck us as a feature that could allow us to extend the concept of temporal pixel relationships a bit further. Another such method that has been popular
Figure 1: Original U-Net formulation. The arrows going across the network show the crop and copy concept.
in the context of accounting for temporal changes is Long Short-Term Memory (LSTM) [12]. This technique was invented to handle time-series data, especially in the presence of the vanishing or exploding gradients that were a common issue when training RNNs. Finally, reduction of computation can be traced to a reduction in the number of parameters that need to be trained. Iandola et al [7] proposed SqueezeNet, which uses Fire modules to reduce the number of trainable weights. The authors achieve a network as accurate as the famous AlexNet while using roughly 50 times fewer parameters. Fire modules replace many of the 3x3 convolutions with 1x1 convolutions. In our work we have tried to borrow features from the Fire module and the U-Net architecture to reduce the training time of our network.
2. Methods
2.1. Combining Style and Content
The original formulation of style transfer proposed by
Gatys et al [1] described the loss function as a weighted
combination of content loss and style loss. That is,
Ltotal = δcontent · Lcontent + δstyle · Lstyle (1)
The VGG-16 network that was used to define the percep-
tual losses is composed of five convolutional blocks sepa-
rated by maxpool layers. The activations used to define the
content and style losses were taken from the ReLU layer im-
mediately preceding the maxpool layers because they rep-
resented the activation at each spatial scale in the VGG net-
work. The content loss is given by the following,
L_c = \|\phi_l(\hat{y}) - \phi_l(y_c)\|_2^2 \qquad (2)
where φl() is the activation of the ReLU layer from the lth
convolutional block of the VGG-16 network (used to de-
fine perceptual losses), ŷ is the output image of our trans-
former network, and yc is the content image. This loss is
unchanged in our implementation. We used the output of
the second convolutional block (i.e. l = 2) to define our
content loss as done in [2].
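To make Eq. 2 concrete, the following Python/PyTorch sketch shows one way the perceptual content loss could be computed. It is an illustration rather than our exact training code: the vgg_activations helper and the hard-coded layer indices (the ReLU layers that close the first four convolutional blocks in torchvision's VGG-16 layout) are assumptions made for this sketch.

import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 used only as a fixed feature extractor for the perceptual losses.
vgg = vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Indices of the ReLU layers immediately preceding each maxpool
# (relu1_2, relu2_2, relu3_3, relu4_3 in torchvision's layer ordering).
BLOCK_ENDS = (3, 8, 15, 22)

def vgg_activations(x):
    """Return the activation at the end of each of the first four conv blocks."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in BLOCK_ENDS:
            feats.append(x)
        if i == BLOCK_ENDS[-1]:
            break
    return feats

def content_loss(y_hat, y_c, block=2):
    """Eq. 2: squared error between the block-`block` activations (1-indexed).
    mse_loss averages over elements; the constant factor is absorbed into the
    content weight."""
    return F.mse_loss(vgg_activations(y_hat)[block - 1],
                      vgg_activations(y_c)[block - 1])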
The style loss is given by the following,
L_s = \sum_{l} \|G_l(\hat{y}) - G_l(y_s)\|_F^2 \qquad (3)
where Gl() is the gram matrix in [1, 2] and ys is the style
image. The gram matrix is the inner product of the filters
for a particular style layer.
G_{l,(i,j)} = F_{l,i} \cdot F_{l,j} \qquad (4)
The (i, j) position of the gram matrix for layer l is given by the dot product between the ith filter, F_{l,i}, and the jth filter, F_{l,j}. If there are several style layers, the overall style
loss is the summation of each style layer loss. We used
the activations from convolutional blocks one through four,
l = {1, 2, 3, 4}, to minimize the style loss across several
spatial scales as done in [2].
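A sketch of the gram matrix (Eq. 4) and the multi-layer style loss (Eq. 3) follows, reusing the vgg_activations helper from the content-loss sketch above. The per-layer normalization by channel and spatial size is an assumption we add so that layers of different resolution contribute comparably.

import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Eq. 4: pairwise inner products between the filter responses of one layer.

    feat: activations of shape (batch, channels, height, width).
    Returns a (batch, channels, channels) gram matrix, divided by the number
    of elements so layers at different spatial scales are comparable.
    """
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(y_hat, y_s):
    """Eq. 3: gram-matrix discrepancy summed over convolutional blocks 1-4."""
    loss = 0.0
    for f_hat, f_style in zip(vgg_activations(y_hat), vgg_activations(y_s)):
        loss = loss + F.mse_loss(gram_matrix(f_hat), gram_matrix(f_style))
    return loss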
2.2. Combining Multiple Styles
In order to combine several styles, the natural extension
of the original method was to incorporate the weighted sum
of losses from multiple style images. This is given by the
following,
L_s = \frac{1}{n}\left(L_{s_1} + L_{s_2} + \dots + L_{s_n}\right) \qquad (5)
where L_{s_1} is the loss described above in Eq. 3 for style
image one. This was the new objective function that we
used to combine the features from several style images.
Further, it is possible to include a weighted sum of each
style image. For example, if there were two style images,
L_s = \frac{1}{n}\left(\alpha \cdot L_{s_1} + (1 - \alpha) \cdot L_{s_2}\right) \qquad (6)
This would allow us to vary the contribution of each style
image to the overall transfer; effectively interpolating be-
tween styles.
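The sketch below extends the style loss above to Eqs. 5 and 6. The function name and the weights argument are ours: with no weights it reduces to the uniform average of Eq. 5, and passing [α, 1 − α] for two styles gives the interpolation of Eq. 6 (the overall 1/n factor can be folded into the global style weight).

def multi_style_loss(y_hat, style_batches, weights=None):
    """Eq. 5 / Eq. 6: weighted combination of per-style losses.

    style_batches: list of style image tensors, one entry per style.
    weights: optional per-style weights; defaults to a uniform 1/n average.
    """
    n = len(style_batches)
    if weights is None:
        weights = [1.0 / n] * n
    loss = 0.0
    for w, y_s in zip(weights, style_batches):
        loss = loss + w * style_loss(y_hat, y_s)
    return loss

# Example: interpolate between The Scream and Starry Night with alpha = 0.7.
# L_s = multi_style_loss(y_hat, [scream_batch, starry_batch], weights=[0.7, 0.3])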
The hyperparameters used for training are summarized
in Table 1. Unless otherwise stated, these are the parameters
that we used for training.
Parameter        Value
Content Weight   1e5
Style Weight     1e10
Content Layers   2
Style Layers     1-4
Epochs           1
Batch Size       4
Learning Rate    1e-3
Optimizer        Adam
Table 1: Summary of Hyperparameters
2.3. Dataset
Our network was trained using the Common Objects in Context, or COCO 1, dataset. More specifically, we used the 2014 training images. This dataset contains 83,000 images of everyday objects such as food, people, and household furniture. While this dataset contains segmentation information,
it is not useful for this application. We needed a large corpus
of images so that our transformer network can successfully
apply the style in a variety of circumstances.
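As an illustration only, the snippet below shows how the COCO 2014 images could be loaded for training. The folder path, image size, and the use of torchvision's ImageFolder (which simply ignores the segmentation annotations) are assumptions of this sketch rather than a description of the official COCO tooling.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Resize and crop every training image to a fixed resolution.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

# "data/coco" is assumed to contain the extracted train2014/ folder;
# ImageFolder treats it as a single unlabeled pool of images.
train_set = datasets.ImageFolder("data/coco", transform=transform)
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)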
2.4. Transformer-Net
The transformer net is one of the main features of [2]. Its main objective is to act as a feed-forward network that quickly computes the stylized output that the original method obtained through iterative optimization. [3] showed that style transfer performed even
better when the batch normalization is replaced by instance
normalization. The detailed network is shown in Fig. 3. It is a typical encoder-decoder network, as shown in the representative diagram at the bottom left of the figure. First there are a number of
1 http://cocodataset.org/#home
Figure 2: This depicts a style image rendered for styles 1 and 2. The last column shows the content image rendered using
our method for combining multiple styles. This combination rendering captures some color information from both styles but
very little, if any, texture from either style.
Figure 3: The transformer net used in our pipeline. Top: all the layers. Bottom left: the representative ”hourglass” structure. Bottom right: the key.
downsampling steps, followed by a bottleneck, and further by the upsampling process. Most of the convolutions use standard 3x3 or 9x9 kernels with strides of either 1 or 2, depending on whether the stage downsamples or not. ReLU is used as the activation function. We use reflection padding in our network to minimize the edge artifacts caused by zero-padding. We now focus on a few important parts of the network, namely the instance norm and the residual (ResNet) blocks.
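The basic stage implied by this description could look like the following sketch: a reflection-padded convolution followed by instance normalization and ReLU, with an optional nearest-neighbour upsampling step on the decoder side. The class name and the upsampling choice are our assumptions for illustration; Fig. 3 fixes the actual layer sizes.

import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Reflection pad -> conv -> instance norm -> (optional) ReLU."""
    def __init__(self, c_in, c_out, kernel, stride, upsample=None, relu=True):
        super().__init__()
        self.upsample = upsample  # nearest-neighbour scale factor, decoder side only
        layers = [
            nn.ReflectionPad2d(kernel // 2),
            nn.Conv2d(c_in, c_out, kernel, stride),
            nn.InstanceNorm2d(c_out, affine=True),
        ]
        if relu:
            layers.append(nn.ReLU(inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        if self.upsample:
            x = F.interpolate(x, scale_factor=self.upsample, mode="nearest")
        return self.block(x)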
2.4.1 Instance Norm
The main objective of normalization is to reduce the so-called covariate shift by resetting the data to zero mean and unit variance every time it is fed to a new layer. This allows each layer to learn more independently and also allows for faster convergence. Batch normalization is computed over the whole batch of images, while instance normalization is computed only over a single instance in the input. This can be represented as,
y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}}, \qquad \mu_{ti} = \frac{1}{HW}\sum_{l=1}^{W}\sum_{m=1}^{H} x_{tilm}, \qquad \sigma_{ti}^2 = \frac{1}{HW}\sum_{l=1}^{W}\sum_{m=1}^{H}\left(x_{tilm} - \mu_{ti}\right)^2
where t, i, j, k are the batch (tensor), channel, abscissa, and ordinate indices respectively, σ²ti is the instance variance, µti is the instance mean, and y is the output. This choice of normalization removes the instance-specific mean and variance shift, simplifying the learning procedure. Instance norm is used both during training and during testing.
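The equations above translate directly into a few lines of tensor code. This is only to make the formula concrete; in the network itself we rely on nn.InstanceNorm2d.

import torch

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) feature map independently.

    x: tensor of shape (T, C, H, W); mean and variance are taken over the
    spatial dimensions only, matching the equations above.
    """
    mu = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mu) / torch.sqrt(var + eps)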
2.4.2 ResNet
ResNets [13] were introduced as one way to reduce the vanishing gradient problem, allowing much deeper networks to be trained. Residual blocks make it easy to learn the identity function and help ensure that gradients do not vanish over the length of the network. Our network uses 5 residual blocks inside the bottleneck region. This allows us to effectively train the 20-layer network [13].
2.5. Correlation of the features
2.6. Style Consistency
Figure 4: The modified Transform net used in our pipeline.
Fig. 4 shows the modified version of the transformer net on which we experimented with our style consistency approaches. This method tries to combine the novelties of both the U-Net [11] and the Fire module [7]. We resize and copy the frame given as input (which acts as the stylized image at time t) to the transformer network and merge it with the output of the transformer network. This is achieved with the 1x1 convolutions that have been so effective in Fire modules. The process is relatively fast since it adds few parameters, and the fusion only has to learn something close to an identity function, which, much like the ResNet skip connection, additionally helps in the case of vanishing gradients.
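Our reading of Fig. 4 can be sketched as a thin wrapper around the transformer net: the previous stylized frame is resized to the output resolution, concatenated with the current output along the channel dimension, and fused by a 1x1 convolution. Class and argument names are ours and the sketch is an interpretation, not our exact code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConsistentTransformerNet(nn.Module):
    """Transformer net plus a 'resize and copy' branch fused by a 1x1 conv."""
    def __init__(self, transformer):
        super().__init__()
        self.transformer = transformer
        # 3 channels from the current output + 3 channels copied from the
        # previous stylized frame, squeezed back to an RGB image.
        self.fuse = nn.Conv2d(6, 3, kernel_size=1)

    def forward(self, frame, prev_stylized):
        y = self.transformer(frame)
        prev = F.interpolate(prev_stylized, size=y.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([y, prev], dim=1))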
3. Results and Discussion
3.1. Combining Multiple Styles
The first experiment we performed was to combine the
style from two paintings that were visually very different.
The thought process behind this experiment was that the
output should show very different styles and the outcome
would be obvious. These two paintings were Weeping
Woman by Picasso (Style 1) and Woman With a Parasol by
Monet (Style 2). The stylized images are shown in Fig. 2.
The resultant stylization was barely able to capture the color
information of the two styles. The style loss dominated the
total loss during training for combining both styles (Fig. 5),
which was indicative of the inability to combine the high-
level perceptual features from the VGG network. For ex-
Figure 5: Training losses for one style (Picasso) and both
styles (Picasso and Monet). The style loss dominated the
total loss for combining both styles. This change can be
attributed to the failure to rectify the high-level features for
both styles.
ample, the style loss when training on one style was around the same magnitude as the content loss (Fig. 5).
Next, we experimented with the style weighting to see if
we could resolve the high style loss issue. We thought that if we increased the style weighting, the network would learn to better resolve the discrepancies between the style features of both paintings. The stylizations with several different style weightings are shown in Fig. 7 (default
style weight = 1e10). We were not able to achieve a styl-
ization of the content image that successfully combined the
color and texture from both styles. In fact, when the style
weight was increased, triangular-shaped artifacts started to
appear within the frame (style weight = 1e11). Therefore,
we decided to test our method on two paintings that were
visually more similar.
The original experiment was rerun with two new paintings. The two paintings were The Scream by Munch (Style 1) and Starry Night by Van Gogh (Style 2). The stylization
of these images is shown in Fig. 8. The resulting styliza-
tions successfully combined the colors of the two paintings
as before, but also included small textures such as the ”sun-
like” objects from Starry Night. This improvement was re-
flected in the lower style loss for ”both styles” in Fig. 6 as
compared to Fig. 5.
We also tested the ability of our network to interpolate
between styles using Eq. 6. Our results for style inter-
Figure 6: Convergence of total, style, and content losses
during training for similar styles (Scream and Starry Night).
For the default style weight (1e10), the style loss was much lower than for the two perceptually very different paintings (Fig. 5). This indicated that the network was able to better combine high-level features from both paintings.
polation are displayed in Fig. 9. It is clear that α = 0.7
incorporates more color from style 1 while α = 0.3 incor-
porates more color from style 2. Further, on the door frame
of α = 0.7, there are waves reminiscent of the textures from
style 1. Similarly, for α = 0.3, there are more ”sun-like”
features that are reminiscent of style 2.
While our method for combining multiple styles was
successful in capturing details from two perceptually sim-
ilar paintings, it failed to rectify differences between two
dissimilar paintings. This was likely due to the inability of the high-level features of the VGG network to describe the ”style” of these two paintings in a meaningful way. This provides motivation to use methods described in the related work section, where a network was created to semantically describe paintings in a higher-dimensional space [10]. Further, these papers were able to train their networks on many styles, rather than just two as in our case. Any of
these methods may serve as an inspiration to improve upon
our method.
3.2. Style Consistency
In order to check our network for style consistency, we fed the network a stream of identical images after training. Any change in the style elements of the output would be considered an inconsistent result. Fig. 10 shows how our architecture holds up. Fig. 10a shows the output for the original image, used as the baseline; Fig. 10b shows the output without any form of correlation performed; and Fig. 10c shows the result of the correlated network. Each image shows a zoomed-in portion of its top-left area in order to display and compare the artifacts properly. It can be seen that Fig. 10a and Fig. 10c show quite similar results, while the artifacts in Fig. 10b deviate considerably. From this we conclude that the outputs of the first and third networks are consistent and do not vary with time. We can loosely claim that the style produced by our network depends entirely on the ”local content” it is focused on: if the local content remains unchanged, so does the style. This can further be taken as a shallow demonstration of the time independence of our network. However, it must be stated that the network is quite inconsistent overall. Some of the outputs we received were not even images, just random numbers; sometimes the output was plain white with no content. The result shown here was one of the very few correct outputs that we received.
4. Future Work
Another method we thought of that may be able to combine style and content information is based upon the work of [6]. This ”distillation” method was intended to transfer the information learned by one large network, or an ensemble of networks (the ”learned” networks), into one, possibly smaller, ”new” network. Within this paradigm, the ”learned” networks would be the transformer networks trained on fixed individual styles, and the ”new” network would be the transformer network that combines several styles.
For the style consistency problem, more research is needed to figure out why the outputs are not uniform. Additionally, implementing an LSTM or a CRF-RNN network to better preserve consistency may be considered as a long-term goal.
4.1. Original Contributions
Our first contribution was the idea to combine multiple
styles in one training. Secondly, we developed a method
to interpolate between styles. Concretely, we coded from scratch the transformer network and the function that returns the VGG network activations. The transformer network structure we used was the same as Johnson et al [2] 2. Their sup-
plementary material was a useful resource to determine the
exact structure of their transformer network. The training
script used to train our transformer network was a heavily
modified version of the script from the PyTorch examples for real-time style transfer 3. In particular, we altered the
script to mesh with our implementation of the VGG loss
2 https://github.com/jcjohnson/fast-neural-style
3 https://github.com/pytorch/examples/tree/master/fast_neural_style
Figure 7: This illustrates stylizations similar to Fig. 2, but with different style loss weightings. There is a clear trend that as
the style weight increases, the content image is rendered more abstractly. Further, for style loss weighting 1e11, triangular-
shaped artifacts start appearing.
Figure 8: This depicts a style image rendered for styles 1 and 2. The last column shows the content image rendered using
our method for combining multiple styles. This combination rendering captures color information from both styles as well
as some textures. For example, there are ”sun-like” patterns that mimic Style 2.
Figure 9: This demonstrates the ability to interpolate between styles based upon different weighted contributions to the total
style loss as described in Eq. 6. There is a clear transition from more orange on the left to more blue on the right; mimicking
the style transition from Scream (style 1) to Starry Night (style 2).
Figure 10: Image Consistency results.
and transformer network. We also altered the script to al-
low for our multiple styles method and style interpolation
method. Additionally, we altered it to save training loss,
plot training loss (during training), and save a log file of the
hyperparameters used. We created bash utility scripts to al-
low us to train several networks and stylize several images
all at once. We created python scripts using OpenCV to
stylize images from a video file and render images from a
webcam in real time using our trained models. Every styl-
ized image presented in the results section is from a model
we trained using our code.
References
[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.
A neural algorithm of artistic style. CoRR, abs/1508.06576,
2015. 1, 3
[2] Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Percep-
tual losses for real-time style transfer and super-resolution.
CoRR, abs/1603.08155, 2016. 1, 2, 3, 6
[3] Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky.
Instance normalization: The missing ingredient for fast styl-
ization. CoRR, abs/1607.08022, 2016. 1, 3
[4] Xun Huang and Serge J. Belongie. Arbitrary style trans-
fer in real-time with adaptive instance normalization. CoRR,
abs/1703.06868, 2017. 1, 2
[5] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-
Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang
Huang, and Philip HS Torr. Conditional random fields as
recurrent neural networks. In Proceedings of the IEEE inter-
national conference on computer vision, pages 1529–1537,
2015. 1, 2
[6] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distill-
ing the knowledge in a neural network. arXiv preprint
arXiv:1503.02531, 2015. 2, 6
[7] Forrest N Iandola, Song Han, Matthew W Moskewicz,
Khalid Ashraf, William J Dally, and Kurt Keutzer.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360, 2016. 2, 5
[8] Chai Wah Wu. Prodsumnet: reducing model parameters in
deep neural networks via product-of-sums matrix decompo-
sitions. arXiv preprint arXiv:1809.02209, 2018. 2
[9] Vincent Dumoulin, Jonathon Shlens, and Manjunath Kud-
lur. A learned representation for artistic style. CoRR,
abs/1610.07629, 2016. 2
[10] Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent
Dumoulin, and Jonathon Shlens. Exploring the structure of a
real-time, arbitrary neural artistic stylization network. CoRR,
abs/1705.06830, 2017. 2, 6
[11] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-
net: Convolutional networks for biomedical image segmen-
tation. In International Conference on Medical image com-
puting and computer-assisted intervention, pages 234–241.
Springer, 2015. 2, 5
[12] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997. 2
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Deep residual learning for image recognition. In Proceed-
ings of the IEEE conference on computer vision and pattern
recognition, pages 770–778, 2016. 5

More Related Content

What's hot

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
taeseon ryu
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
dannyijwest
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
IOSR Journals
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MLAI2
 
Bj31416421
Bj31416421Bj31416421
Bj31416421
IJERA Editor
 
CNN
CNNCNN
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
SungminYou
 
Ijcatr04021016
Ijcatr04021016Ijcatr04021016
Ijcatr04021016
Editor IJCATR
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
SungminYou
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
ijcsit
 
Performance analysis of Hybrid Transform, Hybrid Wavelet and Multi-Resolutio...
Performance analysis of Hybrid Transform, Hybrid Wavelet and  Multi-Resolutio...Performance analysis of Hybrid Transform, Hybrid Wavelet and  Multi-Resolutio...
Performance analysis of Hybrid Transform, Hybrid Wavelet and Multi-Resolutio...
IJMER
 
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
CSCJournals
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
Gioele Ciaparrone
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
IOSR Journals
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
MLAI2
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
JaeJun Yoo
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
Jeremy Nixon
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
SungminYou
 
A genetic algorithm to solve the
A genetic algorithm to solve theA genetic algorithm to solve the
A genetic algorithm to solve the
IJCNCJournal
 
Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)
seungwoo kim
 

What's hot (20)

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
Bj31416421
Bj31416421Bj31416421
Bj31416421
 
CNN
CNNCNN
CNN
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
 
Ijcatr04021016
Ijcatr04021016Ijcatr04021016
Ijcatr04021016
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
 
Performance analysis of Hybrid Transform, Hybrid Wavelet and Multi-Resolutio...
Performance analysis of Hybrid Transform, Hybrid Wavelet and  Multi-Resolutio...Performance analysis of Hybrid Transform, Hybrid Wavelet and  Multi-Resolutio...
Performance analysis of Hybrid Transform, Hybrid Wavelet and Multi-Resolutio...
 
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
Comparison Between Levenberg-Marquardt And Scaled Conjugate Gradient Training...
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
 
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual LearningSequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
Sequential Reptile_Inter-Task Gradient Alignment for Multilingual Learning
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
A genetic algorithm to solve the
A genetic algorithm to solve theA genetic algorithm to solve the
A genetic algorithm to solve the
 
Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)
 

Similar to Multiple Style-Transfer in Real-Time

Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
MohamedAmineHACHICHA1
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in Practice
KhalilBergaoui
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Sitakanta Mishra
 
M1803016973
M1803016973M1803016973
M1803016973
IOSR Journals
 
Video to Video Translation CGAN
Video to Video Translation CGANVideo to Video Translation CGAN
Video to Video Translation CGAN
Alessandro Calmanovici
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
Tahsin Mayeesha
 
A Learned Representation for Artistic Style
A Learned Representation for Artistic StyleA Learned Representation for Artistic Style
A Learned Representation for Artistic Style
Mayank Agarwal
 
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkRunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
Putra Wanda
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptx
Ishaq Khan
 
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev...
IRJET-  	  Concepts, Methods and Applications of Neural Style Transfer: A Rev...IRJET-  	  Concepts, Methods and Applications of Neural Style Transfer: A Rev...
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev...
IRJET Journal
 
RGB Image Compression using Two-dimensional Discrete Cosine Transform
RGB Image Compression using Two-dimensional Discrete Cosine TransformRGB Image Compression using Two-dimensional Discrete Cosine Transform
RGB Image Compression using Two-dimensional Discrete Cosine Transform
Harshal Ladhe
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
ijtsrd
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
International Journal of Science and Research (IJSR)
 
Todtree
TodtreeTodtree
Todtree
Manasa Prasad
 
Optimizer algorithms and convolutional neural networks for text classification
Optimizer algorithms and convolutional neural networks for text classificationOptimizer algorithms and convolutional neural networks for text classification
Optimizer algorithms and convolutional neural networks for text classification
IAESIJAI
 
A Testbed for Image Synthesis
A Testbed for Image SynthesisA Testbed for Image Synthesis
A Testbed for Image Synthesis
Ben Trumbore
 
Learning a Recurrent Visual Representation for Image Caption G
Learning a Recurrent Visual Representation for Image Caption GLearning a Recurrent Visual Representation for Image Caption G
Learning a Recurrent Visual Representation for Image Caption G
JospehStull43
 
Learning a Recurrent Visual Representation for Image Caption G.docx
Learning a Recurrent Visual Representation for Image Caption G.docxLearning a Recurrent Visual Representation for Image Caption G.docx
Learning a Recurrent Visual Representation for Image Caption G.docx
croysierkathey
 
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
sipij
 
U4408108113
U4408108113U4408108113
U4408108113
IJERA Editor
 

Similar to Multiple Style-Transfer in Real-Time (20)

Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in Practice
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
M1803016973
M1803016973M1803016973
M1803016973
 
Video to Video Translation CGAN
Video to Video Translation CGANVideo to Video Translation CGAN
Video to Video Translation CGAN
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
 
A Learned Representation for Artistic Style
A Learned Representation for Artistic StyleA Learned Representation for Artistic Style
A Learned Representation for Artistic Style
 
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkRunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptx
 
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev...
IRJET-  	  Concepts, Methods and Applications of Neural Style Transfer: A Rev...IRJET-  	  Concepts, Methods and Applications of Neural Style Transfer: A Rev...
IRJET- Concepts, Methods and Applications of Neural Style Transfer: A Rev...
 
RGB Image Compression using Two-dimensional Discrete Cosine Transform
RGB Image Compression using Two-dimensional Discrete Cosine TransformRGB Image Compression using Two-dimensional Discrete Cosine Transform
RGB Image Compression using Two-dimensional Discrete Cosine Transform
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
 
Todtree
TodtreeTodtree
Todtree
 
Optimizer algorithms and convolutional neural networks for text classification
Optimizer algorithms and convolutional neural networks for text classificationOptimizer algorithms and convolutional neural networks for text classification
Optimizer algorithms and convolutional neural networks for text classification
 
A Testbed for Image Synthesis
A Testbed for Image SynthesisA Testbed for Image Synthesis
A Testbed for Image Synthesis
 
Learning a Recurrent Visual Representation for Image Caption G
Learning a Recurrent Visual Representation for Image Caption GLearning a Recurrent Visual Representation for Image Caption G
Learning a Recurrent Visual Representation for Image Caption G
 
Learning a Recurrent Visual Representation for Image Caption G.docx
Learning a Recurrent Visual Representation for Image Caption G.docxLearning a Recurrent Visual Representation for Image Caption G.docx
Learning a Recurrent Visual Representation for Image Caption G.docx
 
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
 
U4408108113
U4408108113U4408108113
U4408108113
 

Recently uploaded

Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ssuser36d3051
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
Divyam548318
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 

Recently uploaded (20)

Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 

Multiple Style-Transfer in Real-Time

  • 1. Multiple Style-Transfer in Real-Time Michael Maring University of Michigan Ann Arbor, MI mykl@umich.edu Kaustav Chakraborty University of Michigan Ann Arbor, MI vatsuak@umich.edu Abstract Style transfer aims to combine the content of one im- age with the artistic style of another. It was discovered that lower levels of convolutional networks captured style infor- mation, while higher levels captures content information. The original style transfer formulation used a weighted combination of VGG-16 layer activations to achieve this goal. Later, this was accomplished in real-time using a feed-forward network to learn the optimal combination of style and content features from the respective images. The first aim of our project was to introduce a framework for capturing the style from several images at once. We propose a method that extends the original real-time style transfer formulation by combining the features of several style im- ages. This method successfully captures color information from the separate style images. The other aim of our project was to improve the temporal style continuity from frame to frame. Accordingly, we have experimented with the tem- poral stability of the output images and discussed the var- ious available techniques that could be employed as alter- natives. 1. Introduction 1.1. Style Transfer Background Style transfer was introduced by Gatys et al [1] as a gen- erative method to combine the loss from different layers of a convolutional neural network trained on image recogni- tion. The authors noticed that the filters contained in the lower layers of the VGG-16 network captured information that closely represented the style of an image. Similarly, the filters contained in higher layers captured more abstact in- formation that could be interpreted as image content. Their original idea was to create a new image that minimized the combination of style and content loss that they obtained from the VGG-16 network. The process in [1] was iterative and so could not be ac- complished in real-time. Johnson et al [2] accomplished real-time style transfer be introducing a feed forward con- volutional neural network that learned the optimal combi- nation of style and content. Once training was complete, the feed-forward network could be used to directly styl- ize a new image. This method was improved by Ulyanov et al [3], who noticed that replacing the transformer net- work’s batch normalization layers with instance normaliza- tion layers improved the visual quality of the style transfer. The intuition of why instance normalization improved upon batch normalization was explained well in Huang et al [4]. They determined that, ”Each single sample [content image], however, may still have different styles.” Each content im- age within the training batch has its own inherent style, and batch normalization of several content images would create some incoherent mixture of these during training. 1.2. Multiple Style Transfer Motivation The real-time style transfer implementation introduced in Johnson et al [2] can only be trained for one style at a time. This was our motivation behind trying to combine the information from multiple style images. Concretely, the first aim of this paper was to simultaneously minimize the loss function of the content image and several style images. 1.3. Style Continuity Background The effect of running the transformer-net as a stan- dalone feed forward network(with no grad-computation) af- ter training provided fast computation results [2]. 
ever, this process produced temporal inconsistencies. A similar problem arises in image segmentation, where the output must be penalized for pixel labels that are inconsistent with their neighbours. [5] was one of the first papers to incorporate a conditional random field (CRF) that could be trained end-to-end in the form of a recurrent neural network (RNN). It yielded better results than post-processing the image labels with a CRF that used the network's predictions as a prior. We saw this as a possible way to reduce the temporal inconsistencies in our output images.
Yet another way to reduce computational complexity is to reduce the number of trainable parameters, either by compressing a bulky network into a more portable form that is better suited for deployment [6], or by designing networks that perform convolution with fewer tunable weights [7, 8]. These networks perform similarly to their less compact counterparts, but the computing cost is greatly reduced; as a result the training time can be substantially shortened and, we speculate, evaluation becomes faster as well.

1.4. Style Continuity Motivation

As with most CNN algorithms, the community constantly strives to make the process computationally inexpensive, ideally real-time. This sometimes takes a toll on the network's accuracy or visual quality. We wanted the computation of the results to be fast and, additionally, stable, with less jittery outputs. This called for some form of constraint between the stylings of consecutive images in a sequence.

1.5. Related Work

1.5.1 Arbitrary Style Transfer

The most significant limitation of the real-time style transfer implementation in Johnson et al [2] was that a new transformer network had to be trained for each style. Several real-time style transfer methods aim to create a high-dimensional, learned representation of style, which allows them to incorporate this information into a transformer network and output a content image in any desired ("arbitrary") style [9, 4, 10]. Dumoulin et al [9] discovered that styles share many artistic qualities and that this similarity can be represented in a high-dimensional embedding space. They created an embedding space learned from 32 distinct artistic styles and demonstrated the possibility of combining artistic styles by combining their embedding features in an intelligent way. Huang et al [4] discovered that they were able to render an image in an arbitrary style by aligning the mean and variance of the content and style feature maps using what they called Adaptive Instance Normalization (AdaIN) layers. Lastly, Ghiasi et al [10] trained a network to create an embedding space for describing style images using approximately 80,000 paintings. They showed that their embedding space grouped together similar styles and allowed them to render content images in unique styles that were not used during training.

1.5.2 Style Continuity

One of the notable methods that has been used to preserve the consistency of images after their transformation is the CRF. Conditional random fields apply a pairwise loss in addition to the per-pixel loss. The objective that a CRF minimizes is written as a Gibbs energy over the pixel assignment,

E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

which comprises a unary component \psi_u(x_i), accounting for the cost of assigning a style (in our case) x_i to pixel i, and a pairwise energy given by the second term. [5] took this one step further and was able to model the CRF as an RNN, allowing its parameters to be learned during training. This was done using the mean-field iteration technique, which is essentially a more tractable way of minimizing the KL divergence.
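For reference, in the fully connected CRF formulation that [5] builds on, the pairwise potential is typically written as a label-compatibility function weighting a sum of Gaussian kernels over pixel features; this specific form comes from that line of work, not from our own formulation:

\psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{M} w^{(m)} \, k^{(m)}(\mathbf{f}_i, \mathbf{f}_j)

where \mu(x_i, x_j) penalizes assigning different labels to similar pixels, \mathbf{f}_i collects the position and color features of pixel i, and the kernel weights w^{(m)} are learned.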
Another inspiring model is the U-Net [11]. It is a further example of how techniques from the image segmentation domain can cross over to our application. The image is downsampled to its essential representative features and then upsampled/reconstructed back again. This is quite similar to the approach taken in our transformer network, albeit with a number of modifications. An interesting feature of the U-Net is its "crop and copy" functionality, shown in Fig. 1, where the outputs from earlier layers are concatenated with the outputs of later layers. We considered this a striking feature that could allow us to extend the concept of temporal pixel relationships a bit further.

Figure 1: Original U-Net formulation. The arrows going across the network show the crop-and-copy concept.

Another method that has been popular for modeling temporal dependencies is Long Short-Term Memory (LSTM) [12]. This technique was invented for handling time-series data, particularly to address the vanishing and exploding gradients that were a common issue when training RNNs. Finally, a reduction in computation can be traced to a reduction in the number of parameters that must be trained.
Iandola et al [7] proposed SqueezeNet, which uses Fire modules to reduce the number of weights that must be trained. The authors achieve AlexNet-level accuracy while using about 50 times fewer parameters; the Fire modules replace many of the 3x3 convolutions with 1x1 convolutions. In our work we borrowed ideas from the Fire module and the U-Net architecture to reduce the training time of our network.

2. Methods

2.1. Combining Style and Content

The original formulation of style transfer proposed by Gatys et al [1] described the loss function as a weighted combination of content loss and style loss. That is,

L_{total} = \delta_{content} \cdot L_{content} + \delta_{style} \cdot L_{style} \quad (1)

The VGG-16 network used to define the perceptual losses is composed of five convolutional blocks separated by maxpool layers. The activations used to define the content and style losses were taken from the ReLU layer immediately preceding each maxpool layer, because they represent the activations at each spatial scale of the VGG network. The content loss is given by

L_c = \| \phi_l(\hat{y}) - \phi_l(y_c) \|^2 \quad (2)

where \phi_l(\cdot) is the activation of the ReLU layer from the l-th convolutional block of the VGG-16 network (used to define the perceptual losses), \hat{y} is the output image of our transformer network, and y_c is the content image. This loss is unchanged in our implementation. We used the output of the second convolutional block (i.e. l = 2) to define our content loss, as done in [2]. The style loss is given by

L_s = \sum_l \| G_l(\hat{y}) - G_l(y_s) \|_F^2 \quad (3)

where G_l(\cdot) is the Gram matrix as in [1, 2] and y_s is the style image. The Gram matrix is the inner product of the filters for a particular style layer,

G_{l,(i,j)} = F_{l,i} \cdot F_{l,j} \quad (4)

that is, the (i, j) entry of the Gram matrix for layer l is the dot product between the i-th filter, F_{l,i}, and the j-th filter, F_{l,j}. If there are several style layers, the overall style loss is the sum of the individual style-layer losses. We used the activations from convolutional blocks one through four, l = {1, 2, 3, 4}, to minimize the style loss across several spatial scales, as done in [2].

2.2. Combining Multiple Styles

To combine several styles, the natural extension of the original method is to incorporate a weighted sum of the losses from multiple style images:

L_s = \frac{1}{n} (L_{s_1} + L_{s_2} + \dots + L_{s_n}) \quad (5)

where L_{s_1} is the loss described in Eq. 3 for style image one. This is the new objective function that we used to combine the features from several style images. Further, it is possible to weight each style image individually. For example, with two style images,

L_s = \frac{1}{n} (\alpha \cdot L_{s_1} + (1 - \alpha) \cdot L_{s_2}) \quad (6)

This allows us to vary the contribution of each style image to the overall transfer, effectively interpolating between styles. The hyperparameters used for training are summarized in Table 1. Unless otherwise stated, these are the parameters we used for training.

Parameter        Value
Content weight   1e5
Style weight     1e10
Content layers   2
Style layers     1-4
Epochs           1
Batch size       4
Learning rate    1e-3
Optimizer        Adam

Table 1: Summary of hyperparameters.
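As an illustration, the losses in Eqs. 1-6 could be sketched in PyTorch along the following lines. This is a minimal sketch rather than our exact training code; in particular, the feature lists stand in for an assumed helper that returns the ReLU activations of VGG-16 blocks one through four, and the Gram normalization is one common convention.

    import torch

    def gram_matrix(feat):
        # feat: (B, C, H, W) activations from one VGG block (Eq. 4, normalized by layer size)
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

    def content_loss(yhat_feats, yc_feats, block=1):
        # Eq. 2: squared error between block-2 activations (index 1 when blocks are 0-indexed)
        return torch.mean((yhat_feats[block] - yc_feats[block]) ** 2)

    def style_loss(yhat_feats, style_grams):
        # Eq. 3: sum over blocks 1-4 of the squared Frobenius norm of the Gram difference
        return sum(torch.sum((gram_matrix(f) - g) ** 2)
                   for f, g in zip(yhat_feats, style_grams))

    def multi_style_loss(yhat_feats, grams_per_style, weights=None):
        # Eqs. 5-6: average (or weighted average) of the per-style losses
        n = len(grams_per_style)
        weights = weights if weights is not None else [1.0] * n
        return sum(w * style_loss(yhat_feats, g)
                   for w, g in zip(weights, grams_per_style)) / n

    # total = 1e5 * content_loss(...) + 1e10 * multi_style_loss(...)   # weights from Table 1

For style interpolation (Eq. 6), the weights would simply be [alpha, 1 - alpha].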
2.3. Dataset

Our network was trained on the Common Objects in Context, or COCO, dataset.1 More specifically, we used the 2014 training images. This dataset contains 83,000 images of everyday objects such as food, people, and household furniture. While the dataset also contains segmentation information, that is not useful for this application. We needed a large corpus of images so that our transformer network could successfully apply the style in a variety of circumstances.

1 http://cocodataset.org/#home
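For concreteness, loading these images for training could look roughly like the following. This is a sketch only: the directory path is a placeholder, and it assumes the COCO images sit in a single subfolder under the dataset root, as torchvision's ImageFolder expects.

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Resize(256),                    # shorter side to 256
        transforms.CenterCrop(256),                # square crops for batching
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x.mul(255)),   # work in the 0-255 range
    ])
    train_set = datasets.ImageFolder('data/coco', transform=transform)
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)   # batch size from Table 1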
Figure 2: A style image rendered for styles 1 and 2. The last column shows the content image rendered using our method for combining multiple styles. This combined rendering captures some color information from both styles but very little, if any, texture from either style.

Figure 3: The transformer net used in our pipeline. Top: all the layers. Bottom left: the representative "hourglass" structure. Bottom right: the key.

2.4. Transformer-Net

The transformer net is one of the main features of [2]. Its main objective is to act as the feed-forward network that can quickly compute the optimization for the style transfer. [3] showed that style transfer performs even better when batch normalization is replaced by instance normalization. The detailed network is shown in Fig. 3. It is a typical encoder-decoder network, as shown by the representative "hourglass" structure at the bottom left. First there are a number of downsampling steps, followed by a bottleneck and then the upsampling process. Most of the convolutions use standard 3x3 or 9x9 kernels with strides of either 1 or 2, depending on whether the layer downsamples or upsamples, and ReLU is used as the activation function. We used reflection padding in our network to minimize the edge artifacts caused by zero-padding. We now focus on a few important parts of the network, namely the instance norm and the ResNet blocks.

2.4.1 Instance Norm

The main objective of normalization is to reduce the so-called covariate shift by resetting the data to zero mean and unit variance each time it is fed to a new layer. This allows each layer to learn more independently and also allows for faster convergence. Batch normalization is computed over the whole batch, while instance normalization is computed over a single instance of the input:

y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}}, \qquad
\mu_{ti} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad
\sigma_{ti}^2 = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} (x_{tilm} - \mu_{ti})^2

where t, i, j, k index the image in the batch, the channel, the abscissa, and the ordinate respectively; \mu_{ti} and \sigma_{ti}^2 are the instance mean and variance, and y is the output. This choice of normalization removes the instance-specific mean and variance shift, simplifying the learning procedure. Instance norm is used both during training and during testing.
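A minimal sketch of this computation in PyTorch, which matches torch.nn.InstanceNorm2d with its default settings (no learnable affine parameters):

    import torch

    def instance_norm(x, eps=1e-5):
        # x: (T, C, H, W); normalize each (image, channel) feature map independently
        mu = x.mean(dim=(2, 3), keepdim=True)                  # per-instance mean
        var = x.var(dim=(2, 3), unbiased=False, keepdim=True)  # per-instance variance
        return (x - mu) / torch.sqrt(var + eps)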
2.4.2 ResNet

ResNets [13] were introduced to reduce the problem of vanishing gradients, allowing much deeper networks to be trained. They make it easy to learn the identity function and ensure that the loss signal does not saturate over the length of the network. Our network uses 5 ResNet blocks inside the bottleneck region, which allows us to effectively train the 20-layer network [13].

2.5. Correlation of the Features

2.6. Style Consistency

Figure 4: The modified transformer net used in our pipeline.

Fig. 4 shows the modified version of the transformer net on which we experimented with our style-consistency approaches. It tries to combine the novelties of both the U-Net [11] and the Fire module [7]. We simply resize and copy the frame given as input to the transformer network (which acts as the stylized image at time t) and merge it with the output of the transformer network. This is achieved with the 1x1 convolutions that have been so effective in Fire modules. The process is relatively fast, since it does not add many parameters, and it only has to learn an identity function, which, as with the ResNet blocks, also helps in the case of vanishing gradients.
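A minimal sketch of this merge step; the module and tensor names here are illustrative assumptions rather than our exact implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConsistencyMerge(nn.Module):
        """Fuse the previous stylized frame with the transformer output via a 1x1 convolution."""

        def __init__(self, channels=3):
            super().__init__()
            # 1x1 convolution over the concatenated frames, Fire-module style
            self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, current_out, prev_stylized):
            # Resize the previous stylized frame to match the current output, then merge
            prev = F.interpolate(prev_stylized, size=current_out.shape[-2:],
                                 mode='bilinear', align_corners=False)
            return self.mix(torch.cat([current_out, prev], dim=1))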
3. Results and Discussion

3.1. Combining Multiple Styles

The first experiment we performed was to combine the styles of two paintings that were visually very different; the reasoning was that the output should then show clearly distinct styles and the outcome would be obvious. The two paintings were Weeping Woman by Picasso (Style 1) and Woman with a Parasol by Monet (Style 2). The stylized images are shown in Fig. 2. The resulting stylization was barely able to capture the color information of the two styles. The style loss dominated the total loss during training when combining both styles (Fig. 5), which was indicative of an inability to combine the high-level perceptual features from the VGG network. For example, when training on one style, the style loss was around the same magnitude as the content loss (Fig. 5).

Figure 5: Training losses for one style (Picasso) and both styles (Picasso and Monet). The style loss dominated the total loss when combining both styles, which can be attributed to the failure to reconcile the high-level features of the two styles.

Next, we experimented with the style weighting to see if we could resolve the high style loss. We thought that a larger style weight might force the network to better resolve the discrepancies between the style features of the two paintings. Stylizations with several different style weightings are shown in Fig. 7 (default style weight = 1e10). We were not able to achieve a stylization of the content image that successfully combined the color and texture of both styles. In fact, when the style weight was increased, triangular-shaped artifacts started to appear within the frame (style weight = 1e11). We therefore decided to test our method on two paintings that were visually more similar.

The original experiment was rerun with two new paintings: The Scream by Munch (Style 1) and Starry Night by Van Gogh (Style 2). The stylization of these images is shown in Fig. 8. The resulting stylizations again combined the colors of the two paintings, but also included small textures such as the "sun-like" objects from Starry Night. This improvement was reflected in the lower style loss for "both styles" in Fig. 6 as compared to Fig. 5.
Figure 6: Convergence of the total, style, and content losses during training for similar styles (The Scream and Starry Night). For the default style weight (1e10), the style loss was much lower than for two perceptually very different paintings (Fig. 5), indicating that the network was better able to combine the high-level features of both paintings.

We also tested the ability of our network to interpolate between styles using Eq. 6. Our results for style interpolation are displayed in Fig. 9. It is clear that alpha = 0.7 incorporates more color from Style 1, while alpha = 0.3 incorporates more color from Style 2. Further, on the door frame for alpha = 0.7 there are waves reminiscent of the textures of Style 1. Similarly, for alpha = 0.3 there are more "sun-like" features reminiscent of Style 2.

While our method for combining multiple styles successfully captured details from two perceptually similar paintings, it failed to reconcile the differences between two dissimilar paintings. This was likely due to the inability of the high-level VGG features to describe the "style" of those two paintings in a meaningful way. This motivates the methods described in the related work, where a network was trained to semantically describe paintings in a higher-dimensional space [10]. Further, those papers were able to train their networks on many styles, rather than just two as in our case. Any of these methods may serve as inspiration for improving our method.

3.2. Style Consistency

To check our network for style consistency, we fed it a stream of identical images after training. Any change in the style elements of the output would be considered an inconsistent result. Fig. 10 shows how our architecture holds up: Fig. 10a shows the output for the original image, used as the baseline; Fig. 10b shows the output without any form of correlation; and Fig. 10c shows the result of the correlated network. Each image shows a zoomed-in portion of its top-left area so that the artifacts can be displayed and compared properly. Fig. 10a and Fig. 10c show quite similar results, while the artifacts in Fig. 10b deviate considerably. From this we say that the outputs of the first and third networks are consistent and do not vary with time. We can loosely claim that the style produced by our network depends entirely on the "local content" it is focused on: if the local content remains unchanged, so does the style. This can further be taken as a shallow example of the time independence of our network. However, it must be stated that the network is really quite inconsistent. Some of the outputs we received were not even images, just random numbers; sometimes the output was plain white with no content. The result shown was one of the very few correct outputs that we received.

4. Future Work

Another method we thought might be able to combine style and content information is based on the work of [6]. This "distillation" method was intended to combine the information learned by one large network, or an ensemble of networks (the "learned" networks), into one, possibly smaller, "new" network. Within this paradigm, the "learned" networks would be the transformer networks trained on fixed individual styles, and the "new" network would be the transformer network that combines several styles. For the style consistencies, more research is needed to figure out why the outputs are not uniform. Additionally, implementing an LSTM may be considered as a long-term goal, or a CRF-RNN network could be used to better preserve consistency.
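Purely as a speculative sketch of what that distillation setup could look like: teacher_a, teacher_b, and student are hypothetical single-style and multi-style transformer networks, and matching the teacher outputs with a pixel-wise loss is one possible choice that we have not validated.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student, teacher_a, teacher_b, content_batch, w_a=0.5):
        # Match the student's stylization to the outputs of two fixed-style teachers
        with torch.no_grad():
            target_a = teacher_a(content_batch)
            target_b = teacher_b(content_batch)
        out = student(content_batch)
        return w_a * F.mse_loss(out, target_a) + (1.0 - w_a) * F.mse_loss(out, target_b)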
4.1. Original Contributions

Our first contribution was the idea of combining multiple styles in one training run. Secondly, we developed a method to interpolate between styles. Concretely, we coded from scratch the transformer network and the function that returns the VGG network activations. The transformer network structure we used was the same as in Johnson et al [2] 2; their supplementary material was a useful resource for determining the exact structure of their transformer network. The training script used to train our transformer network was a heavily modified version of the script from the PyTorch examples for real-time style transfer 3. In particular, we altered the script to mesh with our implementation of the VGG loss and transformer network, and to support our multiple-styles and style-interpolation methods. We also altered it to save the training loss, plot it during training, and save a log file of the hyperparameters used. We created bash utility scripts that let us train several networks and stylize several images at once, and Python scripts using OpenCV to stylize images from a video file and to render webcam frames in real time using our trained models. Every stylized image presented in the results section is from a model we trained with our code.

2 https://github.com/jcjohnson/fast-neural-style
3 https://github.com/pytorch/examples/tree/master/fast_neural_style
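A minimal sketch of such a real-time webcam loop, assuming a trained transformer model is already loaded; the 0-255 input range mirrors the PyTorch example above and is an assumption about the model's preprocessing:

    import cv2
    import torch

    def stylize_webcam(model, device='cuda'):
        model = model.to(device).eval()
        cap = cv2.VideoCapture(0)                      # default webcam
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # BGR uint8 frame -> RGB float tensor of shape (1, 3, H, W) in [0, 255]
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).float().to(device)
            with torch.no_grad():
                y = model(x).clamp(0, 255)
            out = y.squeeze(0).permute(1, 2, 0).cpu().byte().numpy()
            cv2.imshow('stylized', cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
            if cv2.waitKey(1) & 0xFF == ord('q'):      # press q to quit
                break
        cap.release()
        cv2.destroyAllWindows()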
Figure 7: This illustrates stylizations similar to Fig. 2, but with different style loss weightings. There is a clear trend that as the style weight increases, the content image is rendered more abstractly. Further, for a style loss weighting of 1e11, triangular-shaped artifacts start appearing.

Figure 8: This depicts a style image rendered for styles 1 and 2. The last column shows the content image rendered using our method for combining multiple styles. This combined rendering captures color information from both styles as well as some textures; for example, there are "sun-like" patterns that mimic Style 2.
Figure 9: This demonstrates the ability to interpolate between styles based upon different weighted contributions to the total style loss, as described in Eq. 6. There is a clear transition from more orange on the left to more blue on the right, mimicking the style transition from The Scream (Style 1) to Starry Night (Style 2).

Figure 10: Image consistency results.
References

[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015.
[2] Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
[3] Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky. Instance normalization: The missing ingredient for fast stylization. CoRR, abs/1607.08022, 2016.
[4] Xun Huang and Serge J. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. CoRR, abs/1703.06868, 2017.
[5] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip H. S. Torr. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1529-1537, 2015.
[6] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[7] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360, 2016.
[8] Chai Wah Wu. ProdSumNet: Reducing model parameters in deep neural networks via product-of-sums matrix decompositions. arXiv preprint arXiv:1809.02209, 2018.
[9] Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. CoRR, abs/1610.07629, 2016.
[10] Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, and Jonathon Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. CoRR, abs/1705.06830, 2017.
[11] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234-241. Springer, 2015.
[12] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.