Perception and Intelligence Laboratory
Seoul National University
Seminar - Understanding Neural Representation
Junho Cho
16/02/05
• Texture Synthesis Using Convolutional Neural Networks
• NIPS2015
• Understanding Deep Image Representations by Inverting Them
• CVPR2015
• A Neural Algorithm of Artistic Style
Perception and Intelligence Lab., Copyright © 2015 2
Introduction
Mainly focusing on
• Texture Synthesis Using Convolutional Neural Networks
• NIPS2015
• Novel method of texture generation.
+
A bit of
• Understanding Deep Image Representations by Inverting Them
• CVPR2015
• Reconstruct the input image from feature maps.
And
• A Neural Algorithm of Artistic Style
• Synthesize the content of a photo with the style of an artwork.
Perception and Intelligence Lab., Copyright © 2015 3
Two Preliminaries and Their Combination
• A picture of yours in the style of Van Gogh
Perception and Intelligence Lab., Copyright © 2015 4
AI painter
Texture Synthesis Using
Convolutional Neural Networks
-NIPS2015
Part 01.
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
• Infer a generating process from an example texture.
• Produce arbitrarily many new samples of that texture.
• Evaluation by human inspection: success if a human can’t tell the difference.
Perception and Intelligence Lab., Copyright © 2015 6
Visual texture synthesis
Perception and Intelligence Lab., Copyright © 2015 7
Results of this work
Two main approaches to finding a texture-generating process
1. Non-parametric texture models
• Generate a new texture by resampling pixels or whole patches of the original texture.
• Produce high-quality natural textures very efficiently.
• But they are mechanistic procedures that randomize a source texture without a perceptual model.
2. Parametric texture models
• Take statistical measurements of an image; the texture is defined by those measurements.
• Generate new samples of a texture with the same measurements as the original.
• A visual texture can be uniquely described by the Nth-order joint histograms of its pixels.
• B. Julesz. Visual Pattern Discrimination. IRE Transactions on Information Theory, 8(2), February 1962.
• Texture models inspired by the linear response properties of the mammalian early visual system.
• Statistical measurements on filter responses (e.g., Gabor filters) rather than raw pixels.
This work is a parametric texture model.
Perception and Intelligence Lab., Copyright © 2015 8
Introduction
Perception and Intelligence Lab., Copyright © 2015 9
Demo
The previous method fails on these textures; the current method succeeds.
Perception and Intelligence Lab., Copyright © 2015 10
Demo
• code
Perception and Intelligence Lab., Copyright © 2015 11
Demo
• Proposes a parametric texture model
• Not a model of the early visual system (link)
• Uses a CNN – a functional model for the entire ventral stream
• A texture model parameterized by spatially invariant representations,
built on the hierarchical processing architecture of a CNN
• Better qualitative results.
Perception and Intelligence Lab., Copyright © 2015 12
Contribution
"Convolutional Networks: Unleashing the Potential of Machine Learning for Robust
Perception Systems," a Presentation from Yann LeCun of Facebook and NYU
VGG-19 network
• 16 conv layers, 5 pooling layers.
• Fully connected layers are not used.
• Uses the network trained for object
recognition on ImageNet.
• A CNN trained for object recognition is
also capable of capturing texture.
• MAX → AVG pooling
• Better gradient flow
• Weight rescaling: the mean activation of each
filter over images and positions is equal to one.
Perception and Intelligence Lab., Copyright © 2015 13
CNN
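As a rough illustration (not the authors' code), the sketch below applies these two modifications to torchvision's pretrained VGG-19. The paper's weight rescaling relies on a specially normalized set of pretrained weights that torchvision does not ship, so that step is only noted in a comment.

```python
# Sketch, assuming PyTorch/torchvision: VGG-19 feature extractor with the
# paper's modifications (no fully connected layers, MAX -> AVG pooling).
# The paper also rescales the pretrained weights so each filter's mean
# activation over images and positions equals one; omitted here because
# the normalized weights are not part of torchvision.
import torch.nn as nn
from torchvision.models import vgg19

def texture_vgg19() -> nn.Sequential:
    features = vgg19(pretrained=True).features  # conv/pool layers only, no FC head
    layers = [
        nn.AvgPool2d(kernel_size=2, stride=2)   # MAX -> AVG: better gradient flow
        if isinstance(layer, nn.MaxPool2d) else layer
        for layer in features
    ]
    model = nn.Sequential(*layers)
    for p in model.parameters():
        p.requires_grad_(False)  # weights stay fixed; only the image is optimized
    return model.eval()
```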
• Compute spatial summary statistics on the feature responses to obtain a stationary description of the texture.
• Find a new image with the same stationary description by performing gradient descent.
• Discard spatial information in the feature maps by taking correlations between feature maps:
• the Gram matrix.
Perception and Intelligence Lab., Copyright © 2015 14
Texture model
Perception and Intelligence Lab., Copyright © 2015 15
Gram matrices as Texture feature
• Gram matrix: $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$
• $F^l \in \mathbb{R}^{N_l \times M_l}$: feature maps of layer $l$
$N_l$: number of feature maps
$M_l$: size of each feature map
$F^l_{jk}$: activation of the $j$-th filter at position $k$ in layer $l$
• In the example,
$l$: conv3_1
$M_l$: $56 \times 56$
$N_l$: 256
A set of Gram matrices $\{G^1, G^2, \dots, G^L\}$ for some layers $1, \dots, L$ in the network
in response to a given texture provides a description of the texture.
Only convolutional layers are used.
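A minimal sketch of the Gram-matrix computation for one layer (PyTorch assumed, not the authors' code): each feature map is flattened into a row of $F^l$, and $G^l = F^l (F^l)^T$ sums products of filter activations over all positions $k$, discarding spatial arrangement.

```python
import torch

def gram_matrix(feature_map: torch.Tensor) -> torch.Tensor:
    """feature_map: (N_l, H, W) activations of one layer for one image."""
    n_l = feature_map.size(0)
    F = feature_map.view(n_l, -1)  # (N_l, M_l) with M_l = H * W
    return F @ F.t()               # (N_l, N_l): G_ij = sum_k F_ik * F_jk
```

For the conv3_1 example above, 256 feature maps of size 56 × 56 give an $F$ of shape 256 × 3136 and a 256 × 256 Gram matrix.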
• Correlations of filter responses have been used as texture features.
• A visual texture can be uniquely described by the Nth-order joint histograms of its pixels.
• B. Julesz. Visual Pattern Discrimination. IRE Transactions on Information Theory, 8(2), February 1962.
• D. J. Heeger and J. R. Bergen. Pyramid-based Texture Analysis/Synthesis. In Proceedings of the 22nd Annual
Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’95, pages 229–238, New York, NY,
USA, 1995. ACM.
• Compared to previous methods:
• Texture features move from a
linear filter bank → the feature space of a deep CNN.
• Correlations between feature responses in each layer of the network.
• The texture model is agnostic to spatial information.
• To achieve this, compute correlations between the responses of feature maps.
Perception and Intelligence Lab., Copyright © 2015 16
Why does correlation define texture?
$x$: input image, $\hat{x}$: generated image
$G^l$, $\hat{G}^l$: their respective Gram matrices
To generate a new texture, use gradient
descent from a white-noise image to find
another image $\hat{x}$ that matches the
Gram-matrix representation of the original image, minimizing
$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - \hat{G}^l_{ij}\right)^2, \qquad \mathcal{L}(x, \hat{x}) = \sum_{l} w_l E_l$$
Perception and Intelligence Lab., Copyright © 2015 17
Texture generation
To be clear, we are not training the CNN;
its weights are fixed.
We are optimizing (training) the image $\hat{x}$ so that it has
texture (Gram matrices) similar to that of $x$.
• The derivative of $E_l$ with respect to the activations in layer $l$ can be computed analytically:
$$\frac{\partial E_l}{\partial \hat{F}^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2}\left[\left(\hat{F}^l\right)^{T}\left(\hat{G}^l - G^l\right)\right]_{ji} & \text{if } \hat{F}^l_{ij} > 0 \\ 0 & \text{if } \hat{F}^l_{ij} < 0 \end{cases}$$
Perception and Intelligence Lab., Copyright © 2015 18
Texture generation – Gradient descent
• The gradient of $E_l$ with respect
to the pixels $\hat{x}$ can be computed
using standard backpropagation.
• The gradient $\frac{\partial \mathcal{L}}{\partial \hat{x}}$ is the input to a
numerical optimization strategy:
• L-BFGS,
• suited to this high-dimensional
optimization problem.
• Uses the same forward-backward pass
as in CNN training.
• A very complex model,
• but computation is efficient
with GPUs.
Perception and Intelligence Lab., Copyright © 2015 19
Texture generation – Gradient descent
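Putting the pieces together, a hedged sketch of the synthesis loop under the assumptions of the earlier sketches (`texture_vgg19`, `gram_matrix`); the layer indices and weights `w` are illustrative, and the paper's $1/(4N_l^2M_l^2)$ normalization is folded into the weights here.

```python
import torch

def grams_of(model, image, layer_indices):
    """Forward pass collecting Gram matrices at the chosen layers."""
    grams, h = [], image
    for i, layer in enumerate(model):
        h = layer(h)
        if i in layer_indices:
            grams.append(gram_matrix(h.squeeze(0)))
    return grams

def synthesize(model, texture, layer_indices, w, steps=500):
    # Target Gram matrices are computed once; the CNN is never trained.
    targets = [g.detach() for g in grams_of(model, texture, layer_indices)]
    x = torch.randn_like(texture, requires_grad=True)  # start from white noise
    opt = torch.optim.LBFGS([x], max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = sum(wl * ((g - t) ** 2).sum()
                   for wl, g, t in zip(w, grams_of(model, x, layer_indices), targets))
        loss.backward()  # dL/dx via standard backpropagation
        return loss

    opt.step(closure)  # L-BFGS consumes the gradient wrt the pixels
    return x.detach()
```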
Perception and Intelligence Lab., Copyright © 2015 20
Result
• Each labeled layer includes all layers below it.
• E.g., pool4 includes pool1, pool2, pool3, and pool4.
• Constraining only low layers
→ results similar to noise.
• Increasing the number of layers
→ increasing degrees of naturalness.
• Above ‘pool4’, no further improvement.
• The last column shows that only local spatial
information is preserved,
• due to the receptive field size of the deep CNN.
More results: link
• Yellow box: artifact at the boundary
• This model discards global spatial information.
• Possible reasons:
• Some features encode information at the image boundary.
• Zero-padding?
Perception and Intelligence Lab., Copyright © 2015 21
More analysis
A: The number of parameters can be reduced.
B: Experiment on CaffeNet:
• not better than VGG;
• some grid artifacts appear:
• stride?
• larger filter size?
C: A randomly weighted VGG does not work
- shows the importance of the ImageNet-
pretrained model.
Perception and Intelligence Lab., Copyright © 2015 22
More analysis
• Test how well texture features capture object context.
• The Gram-matrix representation still predicts object identity.
• Texture still carries high-level information.
• Texture does not necessarily preserve the global structure of objects.
• Might provide insight into how CNNs encode object identity.
Perception and Intelligence Lab., Copyright © 2015 23
Texture representation for object recognition
• A new parametric texture model
• Computationally more expensive, but based on a CNN:
• any progress in deep CNNs is transferable to this texture synthesis method.
• Computing Gram matrices, i.e., transforming the CNN representation into a
stationary feature space, increases performance.
• Another use of stationary features: SPP-net
• improved object recognition and detection.
• A texture model inspired by biological vision
(a neuroscience point of view):
• hierarchical architecture and basic computational properties
similar to real neural systems.
• Original and synthesized textures are nearly indistinguishable.
• A compelling candidate model for studying visual information processing in the
brain.
Perception and Intelligence Lab., Copyright © 2015 24
Discussion
Understanding Deep Image
Representations by Inverting Them
-CVPR2015
Part 02.
Aravindh Mahendran, Andrea Vedaldi (VGG group)
Perception and Intelligence Lab., Copyright © 2015 26
Reconstruction from feature maps
Gradient descent
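A minimal sketch of such an inversion, under stated assumptions: the paper minimizes a normalized feature-reconstruction error plus natural-image priors (an $\alpha$-norm and total variation) with momentum gradient descent; the sketch keeps only a total-variation term and uses Adam for brevity, and `model`, `layer_idx`, and `tv_weight` are illustrative.

```python
import torch

def invert(model, image, layer_idx, steps=300, lr=0.05, tv_weight=1e-4):
    def features(z):  # run the truncated CNN up to the chosen layer
        h = z
        for i, layer in enumerate(model):
            h = layer(h)
            if i == layer_idx:
                break
        return h

    target = features(image).detach()
    x = torch.randn_like(image, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Feature-reconstruction error, normalized by the target's magnitude
        feat = ((features(x) - target) ** 2).sum() / (target ** 2).sum()
        # Total-variation prior keeps the reconstruction piecewise smooth
        tv = ((x[..., 1:, :] - x[..., :-1, :]) ** 2).sum() \
           + ((x[..., :, 1:] - x[..., :, :-1]) ** 2).sum()
        (feat + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```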
Perception and Intelligence Lab., Copyright © 2015 27
Reconstruction from each layer in the CNN
Perception and Intelligence Lab., Copyright © 2015 28
Receptive field
Deep layers still contain rich information
A Neural Algorithm of Artistic Style
Part 03.
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
• Fine art had been exclusive to humans:
• only humans were able to interplay between the content and style of an image;
• no artificial system had been capable of it so far.
• NeuralArt offers a path forward to an algorithmic understanding of how
humans create and perceive artistic imagery.
Perception and Intelligence Lab., Copyright © 2015 30
Introduction
Obtain content from
• Understanding Deep Image Representations by Inverting Them
Obtain a representation of the style from
• Texture Synthesis Using Convolutional Neural Networks
Mix two representations!
Perception and Intelligence Lab., Copyright © 2015 31
How?
• High-level content is captured in higher layers,
which do not constrain the exact pixel values in reconstruction.
• Obtain style using the representation originally
designed to capture texture information.
• This creates images that match the style of
a given image on an increasing
scale while discarding
information about the global
arrangement of the scene.
Perception and Intelligence Lab., Copyright © 2015 32
How?
Almost the same as the input at the pixel level. Still retains rich
information.
• The representations of content and style in a CNN are separable:
• both representations can be manipulated independently.
• Mix the content and style representations from two different source images.
• Images are synthesized by finding an image that simultaneously matches:
1. the content representation of the photo, and
2. the style representation of the artwork.
Thus, while the global arrangement of the original photo is preserved, the colors and local
structures come from the artwork.
Perception and Intelligence Lab., Copyright © 2015 33
Intuition
Perception and Intelligence Lab., Copyright © 2015 34
Method
$p$: original photo, $a$: original artwork
$\hat{x}$: the image that is generated
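A hedged sketch of the mixed objective, $\mathcal{L}_{total} = \alpha\,\mathcal{L}_{content} + \beta\,\mathcal{L}_{style}$, building on the earlier sketches; `features_at`, the layer choices, and the default $\alpha$, $\beta$ are illustrative assumptions, not the authors' code.

```python
import torch

def features_at(model, image, layer_idx):
    # Feature maps of one layer (assumed helper, as in the inversion sketch)
    h = image
    for i, layer in enumerate(model):
        h = layer(h)
        if i == layer_idx:
            break
    return h

def total_loss(model, photo, artwork, x, content_layer, style_layers,
               style_weights, alpha=1.0, beta=1e3):
    # Content: match raw feature maps of the photo at one higher layer
    p = features_at(model, photo, content_layer).detach()
    l_content = 0.5 * ((features_at(model, x, content_layer) - p) ** 2).sum()

    # Style: match Gram matrices of the artwork over several layers
    l_style = x.new_zeros(())
    for w, layer in zip(style_weights, style_layers):
        g_a = gram_matrix(features_at(model, artwork, layer).squeeze(0)).detach()
        g_x = gram_matrix(features_at(model, x, layer).squeeze(0))
        l_style = l_style + w * ((g_x - g_a) ** 2).sum()

    return alpha * l_content + beta * l_style
```

Optimizing $\hat{x}$ against this loss with L-BFGS, exactly as in the texture-synthesis loop, yields the stylized image; the ratio $\alpha/\beta$ controls the content/style trade-off shown on the results slide.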
• The gradient of $E_l$ with respect
to the pixels $\hat{x}$ can be computed
using standard backpropagation.
• The gradient $\frac{\partial \mathcal{L}}{\partial \hat{x}}$ is the input to a
numerical optimization strategy:
• L-BFGS,
• suited to this high-dimensional
optimization problem.
• Uses the same forward-backward pass
as in CNN training.
• A very complex model,
• but computation is efficient
with GPUs.
Perception and Intelligence Lab., Copyright © 2015 35
Previously on Texture synthesis….
Perception and Intelligence Lab., Copyright © 2015 36
Results
The deeper the layer, the larger the local image
structures captured by the style representation,
due to increasing receptive field sizes
and feature complexity along the hierarchy.
Increasing $\alpha/\beta$: more emphasis on content.
Thank you