
Stockholm AI study group #1 - Style Transfer

Style Transfer - Lars Lowe Sjösund


  1. STYLE TRANSFER. Lars Lowe Sjösund, AI Research Engineer at Peltarion
  2. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  3. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  4. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  5. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  6. HOW DOES A CNN WORK? Image courtesy: Matthieu Cord, Deep CNN and Weak Supervision Learning for visual recognition, https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
  7. CONVOLUTION LAYER: a 32x32x3 image (width 32, height 32, depth 3). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  8. CONVOLUTION LAYER: a 5x5x3 filter and a 32x32x3 image. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  9. CONVOLUTION LAYER: a 5x5x3 filter and a 32x32x3 image. Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Filters always extend the full depth of the input volume. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  10. CONVOLUTION LAYER: each output is 1 number, the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  11. CONVOLUTION LAYER: convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image yields a 28x28x1 activation map. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  12. CONVOLUTION LAYER: consider a second, green filter; it yields a second 28x28 activation map. Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  13. CONVOLUTION LAYER: for example, if we had 6 5x5 filters, we get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6 (see the sketch below). Slide courtesy: Johnson, CS231n Lecture 7, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
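  A minimal NumPy sketch of the convolution arithmetic above (NumPy and the random data are assumptions for illustration; the slides name no framework): six 5x5x3 filters slide over a 32x32x3 image, each position computing a 75-dimensional dot product plus a bias, and the six 28x28 maps stack into a 28x28x6 volume:

      import numpy as np

      image = np.random.randn(32, 32, 3)        # 32x32 RGB input image
      filters = np.random.randn(6, 5, 5, 3)     # six 5x5x3 filters
      biases = np.zeros(6)                      # one bias per filter
      activations = np.zeros((28, 28, 6))       # 32 - 5 + 1 = 28 valid positions per side
      for f in range(6):
          for i in range(28):
              for j in range(28):
                  # Dot product between filter f and a 5x5x3 chunk of the image
                  chunk = image[i:i+5, j:j+5, :]
                  activations[i, j, f] = np.sum(chunk * filters[f]) + biases[f]
      print(activations.shape)                  # (28, 28, 6)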
  14. Image courtesy: http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
  15. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  16. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  17. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  18. Image courtesy: https://github.com/jcjohnson/neural-style
  19. RECONSTRUCTING CONTENT ➤ Given an image, how can we find a new one with the same content? ➤ Define a content distance measure between images ➤ Start from a random noise image ➤ Minimize the distance through iteration. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  20. CONTENT DISTANCE MEASURE: 1. Load a pre-trained CNN (e.g. VGG19) 2. Pass image #1 through the net 3. Save the activation maps from the conv layers 4. Pass image #2 through the net 5. Save the activation maps from the conv layers 6. Calculate the Euclidean distance between the activation maps of images #1 and #2 and sum over all layers: L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (A^l(x) - A^l(\hat{x}))^2. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
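  A minimal NumPy sketch of this loss, under stated assumptions: acts_x and acts_x_hat are lists of conv-layer activation volumes A^l(x) and A^l(x̂) already extracted from the pre-trained CNN (the extraction step is omitted), and weights holds the layer weights w_l:

      import numpy as np

      def content_loss(acts_x, acts_x_hat, weights):
          # 1/2 * sum_l w_l * ||A^l(x) - A^l(x_hat)||^2
          return 0.5 * sum(
              w * np.sum((a - a_hat) ** 2)
              for w, a, a_hat in zip(weights, acts_x, acts_x_hat)
          )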
  21. RECONSTRUCTING CONTENT ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{content}}{\partial \hat{x}}, where L_{content}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (A^l(x) - A^l(\hat{x}))^2. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  22. RECONSTRUCTING CONTENT ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{content}}{\partial \hat{x}} (loop sketched below).
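  A sketch of this update rule; grad_fn is a hypothetical helper returning ∂L_content/∂x̂, which in practice comes from backpropagating through the frozen CNN. The same loop, fed ∂L_style/∂x̂ instead, drives the style reconstructions on slides 30-31:

      def reconstruct(x_hat, grad_fn, eps=1.0, steps=500):
          # Iterate x_{t+1} = x_t - eps * dL/dx, starting from a random image x_hat
          for _ in range(steps):
              x_hat = x_hat - eps * grad_fn(x_hat)
          return x_hat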
  23. FEATURE INVERSION: reconstructions from intermediate layers. Higher layers are less sensitive to changes in color, texture, and shape. Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015. Slide courtesy: Johnson, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  24. FEATURE INVERSION: reconstructions from the representation after the last pooling layer (immediately before the first fully connected layer). Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015. Slide courtesy: Johnson, http://web.stanford.edu/class/cs20si/lectures/slides_06.pdf
  25. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  26. STYLE TRANSFER: Content + Style = Desired output. Image courtesy: https://github.com/jcjohnson/neural-style
  27. Style = Texture / Local structure. Ignores global semantic content.
  28. STYLE DISTANCE MEASURE ➤ Represent style by the Gram matrix, the pairwise covariance of the activation maps ➤ Just the uncentered covariance matrix between the vectorized activation maps: G^l_{ij}(x) = \vec{A}^l_i(x) \cdot \vec{A}^l_j(x), i.e. G^l = \begin{pmatrix} G(A_1,A_1) & \cdots & G(A_1,A_n) \\ \vdots & \ddots & \vdots \\ G(A_n,A_1) & \cdots & G(A_n,A_n) \end{pmatrix} (see the sketch below). Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
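  A minimal NumPy sketch of the Gram matrix: with each of a layer's activation maps vectorized into one row of a matrix A, the product A Aᵀ holds every pairwise (uncentered) dot product G^l_{ij}:

      import numpy as np

      def gram_matrix(acts):
          # acts: one layer's activation volume, shape (channels, height, width)
          c, h, w = acts.shape
          A = acts.reshape(c, h * w)   # one vectorized activation map per row
          return A @ A.T               # (c, c) matrix of pairwise dot products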
  29. STYLE DISTANCE MEASURE ➤ Style loss: the Euclidean distance between the Gram matrices of the two images: L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
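  The style loss then mirrors the content loss but compares Gram matrices instead of raw activation maps; a sketch reusing gram_matrix from above:

      def style_loss(acts_x, acts_x_hat, weights):
          # 1/2 * sum_l w_l * ||G^l(x) - G^l(x_hat)||^2
          return 0.5 * sum(
              w * np.sum((gram_matrix(a) - gram_matrix(a_hat)) ** 2)
              for w, a, a_hat in zip(weights, acts_x, acts_x_hat)
          )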
  30. RECONSTRUCTING STYLE ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{style}}{\partial \hat{x}}, where L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2. Image courtesy: D. Ulyanov, https://bayesgroup.github.io/bmml_sem/2016/style.pdf
  31. RECONSTRUCTING STYLE ➤ Start from random image ➤ Update it using gradient descent: \hat{x}_{t+1} = \hat{x}_t - \varepsilon \frac{\partial L_{style}}{\partial \hat{x}}.
  32. RECONSTRUCTING STYLE. Image courtesy: Gatys et al., Texture Synthesis Using Convolutional Neural Networks, https://arxiv.org/pdf/1505.07376.pdf
  33. MATHEMATICAL SIDE NOTE: the style loss L_{style}(x, \hat{x}) = \frac{1}{2} \sum_l w_l (G^l(x) - G^l(\hat{x}))^2 is a special case of the squared Maximum Mean Discrepancy (MMD) with the kernel k(x, \hat{x}) = (x^T \hat{x})^2: L^l_{style} = \frac{1}{Z^l_k} \mathrm{MMD}^2(A^l(x), A^l(\hat{x})) = \frac{1}{Z^l_k} \sum_{i=1}^{M_l} \sum_{j=1}^{M_l} \left( k(A^l_{:,i}, A^l_{:,j}) + k(\hat{A}^l_{:,i}, \hat{A}^l_{:,j}) - 2 k(A^l_{:,i}, \hat{A}^l_{:,j}) \right), where \mathrm{MMD}^2 = \| \mathbb{E}[\phi(A^l(x))] - \mathbb{E}[\phi(A^l(\hat{x}))] \|^2. Further reading: Demystifying Neural Style Transfer, Li et al.
  34. STYLE TRANSFER: Content + Style = Desired output. Combine both losses: L_{total}(x, \hat{x}) = \alpha L_{content}(x, \hat{x}) + \beta L_{style}(x, \hat{x}). Image courtesy: https://github.com/jcjohnson/neural-style
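  A sketch of the combined objective, reusing the content_loss and style_loss sketches above; the alpha and beta defaults here are illustrative only, since the useful ratio depends on the image pair and the layers chosen:

      def total_loss(acts_c, acts_s, acts_x_hat, w_c, w_s, alpha=1.0, beta=1e3):
          # alpha weights content fidelity, beta weights style fidelity
          return (alpha * content_loss(acts_c, acts_x_hat, w_c)
                  + beta * style_loss(acts_s, acts_x_hat, w_s))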
  35. OVERVIEW: 1. Intro style transfer 2. Convolutional Neural Networks 3. Gatys - A Neural Algorithm of Artistic Style 4. Improvements
  36. TOTAL VARIATION LOSS
  37. TOTAL VARIATION LOSS: L_{TV} = \sum_{i,j} \left( (v_{i+1,j} - v_{i,j})^2 + (v_{i,j+1} - v_{i,j})^2 \right)
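  A minimal NumPy sketch of this loss: it sums squared differences between vertically and horizontally adjacent pixels, penalizing the high-frequency noise that pure content/style optimization tends to produce:

      import numpy as np

      def tv_loss(v):
          # v: (height, width) image channel; apply per channel for RGB
          vertical = np.sum((v[1:, :] - v[:-1, :]) ** 2)    # (v_{i+1,j} - v_{i,j})^2
          horizontal = np.sum((v[:, 1:] - v[:, :-1]) ** 2)  # (v_{i,j+1} - v_{i,j})^2
          return vertical + horizontal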
  38. PERCEPTUAL LOSSES FOR REAL-TIME STYLE TRANSFER AND SUPER-RESOLUTION ➤ Train a feed-forward network to perform the optimization ➤ + Fast ➤ - One network per style ➤ - Quantitatively slightly worse. Image courtesy: Johnson et al., Perceptual Losses for Real-Time Style Transfer and Super-Resolution, https://arxiv.org/abs/1603.08155
  39. ARBITRARY STYLE TRANSFER IN REAL-TIME WITH ADAPTIVE INSTANCE NORMALIZATION ➤ Align the mean and variance of the content activation maps with those of the style: AdaIN(x_c, x_s) = \sigma(x_s) \left( \frac{x_c - \mu(x_c)}{\sigma(x_c)} \right) + \mu(x_s) ➤ + Fast (15 fps at 512x512 px) ➤ + One net, arbitrary styles ➤ - Quantitatively slightly worse. Image courtesy: Huang et al., Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization
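  A minimal NumPy sketch of the AdaIN formula, assuming x_c and x_s are content and style activation volumes of shape (channels, height, width) with mean and standard deviation taken per channel over the spatial dimensions; the eps term is an added assumption for numerical stability:

      import numpy as np

      def adain(x_c, x_s, eps=1e-5):
          mu_c = x_c.mean(axis=(1, 2), keepdims=True)   # per-channel content mean
          mu_s = x_s.mean(axis=(1, 2), keepdims=True)   # per-channel style mean
          sd_c = x_c.std(axis=(1, 2), keepdims=True)    # per-channel content std
          sd_s = x_s.std(axis=(1, 2), keepdims=True)    # per-channel style std
          # Normalize away the content statistics, then impose the style statistics
          return sd_s * (x_c - mu_c) / (sd_c + eps) + mu_s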
  40. QUESTIONS & DISCUSSION
  41. THANK YOU! Email: lars@peltarion.com Twitter: sjosund
