
Stockholm AI study group #1 - Style Transfer

Style Transfer - Lars Lowe Sjösund


  1. STYLE TRANSFER. Lars Lowe Sjösund, AI Research Engineer at Peltarion
  2. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  3. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  4. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  5. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  6. HOW DOES A CNN WORK? Image courtesy: Matthieu Cord, Deep CNN and Weak Supervision Learning for visual recognition, https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
  7. CONVOLUTION LAYER: a 32x32x3 image (width 32, height 32, depth 3). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  8. CONVOLUTION LAYER: a 5x5x3 filter and a 32x32x3 image. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  9. CONVOLUTION LAYER: a 5x5x3 filter and a 32x32x3 image. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Filters always extend the full depth of the input volume. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  10. CONVOLUTION LAYER: each output is 1 number, the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  11. CONVOLUTION LAYER: convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image yields a 28x28x1 activation map. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  12. CONVOLUTION LAYER: consider a second, green filter; it yields a second 28x28 activation map. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  13. CONVOLUTION LAYER: for example, if we had 6 5x5 filters, we get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6 (see the sketch below). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
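  A minimal NumPy sketch of the convolution arithmetic above (NumPy and the random data are assumptions for illustration; the slides name no framework): six 5x5x3 filters slide over a 32x32x3 image, each position computing a 75-dimensional dot product plus a bias, and the six 28x28 maps stack into a 28x28x6 volume:

      import numpy as np

      image = np.random.randn(32, 32, 3)        # 32x32 RGB input image
      filters = np.random.randn(6, 5, 5, 3)     # six 5x5x3 filters
      biases = np.zeros(6)                      # one bias per filter
      activations = np.zeros((28, 28, 6))       # 32 - 5 + 1 = 28 valid positions per side
      for f in range(6):
          for i in range(28):
              for j in range(28):
                  # Dot product between filter f and a 5x5x3 chunk of the image
                  chunk = image[i:i+5, j:j+5, :]
                  activations[i, j, f] = np.sum(chunk * filters[f]) + biases[f]
      print(activations.shape)                  # (28, 28, 6)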
  14. Image courtesy: http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
  15. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  16. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  17. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  18. Image courtesy: https://github.com/jcjohnson/neural-style
  19. RECONSTRUCTING CONTENT ➤ Given an image, how can we find a new one with the same content? ➤ Define a content distance measure between images ➤ Start from a random noise image ➤ Minimize the distance through iteration. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  20. CONTENT DISTANCE MEASURE: 1. Load a pre-trained CNN (e.g. VGG19) 2. Pass image #1 through the net 3. Save the activation maps from the conv layers 4. Pass image #2 through the net 5. Save the activation maps from the conv layers 6. Calculate the Euclidean distance between the activation maps of images #1 and #2 and sum over all layers: L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (A^l(x) - A^l(\hat{x}))^2. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
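  A minimal NumPy sketch of this loss, under stated assumptions: acts_x and acts_x_hat are lists of conv-layer activation volumes A^l(x) and A^l(x̂) already extracted from the pre-trained CNN (the extraction step is omitted), and weights holds the layer weights w_l:

      import numpy as np

      def content_loss(acts_x, acts_x_hat, weights):
          # 1/2 * sum_l w_l * ||A^l(x) - A^l(x_hat)||^2
          return 0.5 * sum(
              w * np.sum((a - a_hat) ** 2)
              for w, a, a_hat in zip(weights, acts_x, acts_x_hat)
          )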
  21. RECONSTRUCTING CONTENT ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{content}}{\partial \hat{x}}, where L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (A^l(x) - A^l(\hat{x}))^2. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  22. RECONSTRUCTING CONTENT ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{content}}{\partial \hat{x}} (loop sketched below).
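  A sketch of this update rule; grad_fn is a hypothetical helper returning ∂L_content/∂x̂, which in practice comes from backpropagating through the frozen CNN. The same loop, fed ∂L_style/∂x̂ instead, drives the style reconstructions on slides 30-31:

      def reconstruct(x_hat, grad_fn, eps=1.0, steps=500):
          # Iterate x_{t+1} = x_t - eps * dL/dx, starting from a random image x_hat
          for _ in range(steps):
              x_hat = x_hat - eps * grad_fn(x_hat)
          return x_hat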
  23. FEATURE INVERSION: reconstructions from intermediate layers. Higher layers are less sensitive to changes in color, texture, and shape. Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015. Slide courtesy: Johnson, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  24. FEATURE INVERSION: reconstructions from the representation after the last pooling layer (immediately before the first fully connected layer). Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015. Slide courtesy: Johnson, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  25. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  26. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  27. Style = Texture / Local structure. Ignores global semantic content.
  28. STYLE DISTANCE MEASURE ➤ Represent style by the Gram matrix, the pairwise covariance of the activation maps ➤ Just the uncentered covariance matrix between the vectorized activation maps: G^l_{ij}(x) = \vec{A}^l_i(x) \cdot \vec{A}^l_j(x), i.e. G^l = \begin{pmatrix} G(A_1,A_1) & \cdots & G(A_1,A_n) \\ \vdots & \ddots & \vdots \\ G(A_n,A_1) & \cdots & G(A_n,A_n) \end{pmatrix} (see the sketch below). Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
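  A minimal NumPy sketch of the Gram matrix: with each of a layer's activation maps vectorized into one row of a matrix A, the product A Aᵀ holds every pairwise (uncentered) dot product G^l_{ij}:

      import numpy as np

      def gram_matrix(acts):
          # acts: one layer's activation volume, shape (channels, height, width)
          c, h, w = acts.shape
          A = acts.reshape(c, h * w)   # one vectorized activation map per row
          return A @ A.T               # (c, c) matrix of pairwise dot products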
  29. STYLE DISTANCE MEASURE ➤ Style loss: the Euclidean distance between the Gram matrices of the two images: L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
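  The style loss then mirrors the content loss but compares Gram matrices instead of raw activation maps; a sketch reusing gram_matrix from above:

      def style_loss(acts_x, acts_x_hat, weights):
          # 1/2 * sum_l w_l * ||G^l(x) - G^l(x_hat)||^2
          return 0.5 * sum(
              w * np.sum((gram_matrix(a) - gram_matrix(a_hat)) ** 2)
              for w, a, a_hat in zip(weights, acts_x, acts_x_hat)
          )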
  30. RECONSTRUCTING STYLE ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{style}}{\partial \hat{x}}, where L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  31. RECONSTRUCTING STYLE ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{style}}{\partial \hat{x}}.
  32. RECONSTRUCTING STYLE. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
  33. MATHEMATICAL SIDE NOTE: the style loss L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2 is a special case of the squared Maximum Mean Discrepancy (MMD) with the kernel k(x, \hat{x}) = (x^T \hat{x})^2: L^l_{style} = \frac{1}{Z^l_k} \mathrm{MMD}^2(A^l(x), A^l(\hat{x})) = \frac{1}{Z^l_k} \sum_{i=1}^{M_l} \sum_{j=1}^{M_l} \left( k(A^l_{:,i}, A^l_{:,j}) + k(\hat{A}^l_{:,i}, \hat{A}^l_{:,j}) - 2 k(A^l_{:,i}, \hat{A}^l_{:,j}) \right), where \mathrm{MMD}^2 = \| \mathbb{E}[\phi(A^l(x))] - \mathbb{E}[\phi(A^l(\hat{x}))] \|^2. Further reading: Demystifying Neural Style Transfer, Li et al.
  34. STYLE TRANSFER: Content + Style = Desired output. Combine both losses: L_{total}(x, \hat{x}) = \alpha L_{content}(x, \hat{x}) + \beta L_{style}(x, \hat{x}). Image courtesy: https://github.com/jcjohnson/neural-style
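  A sketch of the combined objective, reusing the content_loss and style_loss sketches above; the alpha and beta defaults here are illustrative only, since the useful ratio depends on the image pair and the layers chosen:

      def total_loss(acts_c, acts_s, acts_x_hat, w_c, w_s, alpha=1.0, beta=1e3):
          # alpha weights content fidelity, beta weights style fidelity
          return (alpha * content_loss(acts_c, acts_x_hat, w_c)
                  + beta * style_loss(acts_s, acts_x_hat, w_s))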
  35. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  36. TOTAL VARIATION LOSS
  37. TOTAL VARIATION LOSS: L_{TV} = \sum_{i,j} \left( (v_{i+1,j} - v_{i,j})^2 + (v_{i,j+1} - v_{i,j})^2 \right)
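  A minimal NumPy sketch of this loss: it sums squared differences between vertically and horizontally adjacent pixels, penalizing the high-frequency noise that pure content/style optimization tends to produce:

      import numpy as np

      def tv_loss(v):
          # v: (height, width) image channel; apply per channel for RGB
          vertical = np.sum((v[1:, :] - v[:-1, :]) ** 2)    # (v_{i+1,j} - v_{i,j})^2
          horizontal = np.sum((v[:, 1:] - v[:, :-1]) ** 2)  # (v_{i,j+1} - v_{i,j})^2
          return vertical + horizontal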
  38. PERCEPTUAL LOSSES FOR REAL-TIME STYLE TRANSFER AND SUPER-RESOLUTION ➤ Train a feed-forward network to perform the optimization ➤ + Fast ➤ - One network per style ➤ - Quantitatively slightly worse. Image courtesy: Johnson et al., Perceptual Losses for Real-Time Style Transfer and Super-Resolution, https://arxiv.org/abs/1603.08155
  39. ARBITRARY STYLE TRANSFER IN REAL-TIME WITH ADAPTIVE INSTANCE NORMALIZATION ➤ Align the mean and variance of the content activation maps with those of the style: AdaIN(x_c, x_s) = \sigma(x_s) \left( \frac{x_c - \mu(x_c)}{\sigma(x_c)} \right) + \mu(x_s) ➤ + Fast (15 fps at 512x512 px) ➤ + One net, arbitrary styles ➤ - Quantitatively slightly worse. Image courtesy: Huang et al., Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization
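  A minimal NumPy sketch of the AdaIN formula, assuming x_c and x_s are content and style activation volumes of shape (channels, height, width) with mean and standard deviation taken per channel over the spatial dimensions; the eps term is an added assumption for numerical stability:

      import numpy as np

      def adain(x_c, x_s, eps=1e-5):
          mu_c = x_c.mean(axis=(1, 2), keepdims=True)   # per-channel content mean
          mu_s = x_s.mean(axis=(1, 2), keepdims=True)   # per-channel style mean
          sd_c = x_c.std(axis=(1, 2), keepdims=True)    # per-channel content std
          sd_s = x_s.std(axis=(1, 2), keepdims=True)    # per-channel style std
          # Normalize away the content statistics, then impose the style statistics
          return sd_s * (x_c - mu_c) / (sd_c + eps) + mu_s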
  40. QUESTIONS & DISCUSSION
  41. THANK YOU! Email: lars@peltarion.com Twitter: sjosund
